While I am definitely a hardware guy, I also really like to write software, either as the firmware for the devices I build or as quick hacks to improve my daily routines. I’ve always been a Windows guy because, until recently, electronics development software was only available for that platform, and I have a love/hate relationship with Linux: I tried many times to use it as my main OS, but I always had to revert to Windows for one reason or another (mostly laziness, I must admit).
I had the chance to “contribute” to Linux kernel development twice. The first time, about 6 years ago, I couldn’t set a custom baudrate on a PL2303 USB-serial adapter. I found the problem in the device’s kernel driver: it compared the requested baudrate against a fixed list of baudrates and, if the value wasn’t among the predefined ones, replaced it with the closest entry. Only then did it calculate and set the clock divisors, using a generic formula that should work with “any rate between 75 bps to 12M bps”. That is pretty stupid, as it doesn’t solve anything: if a user selects a wrong baudrate by mistake, they won’t be able to communicate whether we calculate the divisors with the formula or force some other baudrate on them. So why bother with a table? At least try to calculate the values and see what happens!
That seemed pretty straightforward to me, so I naively tried to politely explain the problem to the driver maintainer. In the follow-up discussion the stubborn maintainer repeatedly refused to consider the proposal, saying that the driver had worked that way for the past 10 years; he probably didn’t look too closely at the issue in the first place, or he would have noticed the problem. At that point I was very frustrated and gave up, going back to Windows, but luckily Reinhard Max picked up my fight and finally convinced the maintainer that this was the way to go, resulting in this patch! The mailing list discussion is quite interesting, and I learnt a lot about the correct way to report issues:
- clearly describe the problem and the software versions you’re running
- compile and test any proposed patch (big problem if you’re a novice and have to learn how to compile the kernel from scratch)
- have A LOT of patience to deal with maintainers, who to be fair are probably overwhelmed by requests
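Going back to the baudrate problem for a moment, here is a minimal sketch of why snapping to a table hurts. It is not the driver's code: the table contents and the names `standard_rates`, `snap_to_table` and `rate_error_percent` are made up for illustration, and the divisor formula itself is left out on purpose.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical list of "supported" rates; NOT copied from the real driver. */
static const long standard_rates[] = {
    75, 150, 300, 600, 1200, 2400, 4800, 9600, 19200,
    38400, 57600, 115200, 230400, 460800, 921600
};

/* Return the table entry closest to the requested rate. */
static long snap_to_table(long requested)
{
    size_t n = sizeof standard_rates / sizeof standard_rates[0];
    long best = standard_rates[0];

    for (size_t i = 1; i < n; i++) {
        if (labs(standard_rates[i] - requested) < labs(best - requested))
            best = standard_rates[i];
    }
    return best;
}

/* Relative mismatch between what was asked for and what was set, in percent. */
static double rate_error_percent(long requested, long actual)
{
    return 100.0 * (double)labs(actual - requested) / (double)requested;
}
```

With this illustrative table, asking for 250000 baud gets silently turned into 230400, a mismatch of almost 8%. Asynchronous serial links usually tolerate only a few percent of clock error, so the link would most likely be unusable, even though a formula covering the whole range could have been tried with the requested value directly.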
Those lessons helped me when I found an issue in the DS1307 RTC driver while developing Prism. In the first version of my Linux board with that IC, I mistakenly connected the RTC supply to 3.3V instead of 5V. After enabling the RTC in the device tree and recompiling the image, I got an infinite loop at boot that completely locked me out of the system, since the board has a single-core CPU. The driver kept printing “SET TIME!” on the serial port, so I searched for that string in all the files in my build folder. And there it was, at line 1728 of rtc-ds1307.c! With a nice “goto” on the next line, pointing at a label 23 lines above, generating an infinite loop whenever a certain bit stays set.
This time I tried my best to make a good contribution: explaining the problem, making a patch and testing it (I had learnt how to compile kernels in the meantime, also thanks to OpenWrt’s great image builder). I basically checked the value of the problematic register, compared it to the 0xFF I was getting (a value that is impossible according to the datasheet, but plausible in case of IC failure) and exited the loop gracefully.
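The idea behind that check can be sketched roughly as follows. This is a simplified illustration, not my patch and not the kernel code: the chip is simulated by a small struct, every name (`fake_rtc`, `rtc_read`, `rtc_write`, `rtc_init_sketch`, `STATUS_FLAG`) is hypothetical, I assume the offending bit is bit 7 of the register, and the retry limit is an extra safety net that was not part of what I described above.

```c
#include <stdint.h>
#include <stdio.h>

#define STATUS_FLAG  0x80u  /* assumption: the bit that triggers the retry */
#define MAX_ATTEMPTS 4      /* extra safety net, not part of the original fix */

/* Simulated chip: a single register, optionally ignoring writes. */
typedef struct {
    uint8_t reg;    /* current register contents */
    int     stuck;  /* nonzero: writes have no effect (e.g. unpowered IC) */
} fake_rtc;

static uint8_t rtc_read(const fake_rtc *chip)
{
    return chip->reg;
}

static void rtc_write(fake_rtc *chip, uint8_t value)
{
    if (!chip->stuck)
        chip->reg = value;
}

/* Returns 0 when the register looks sane, -1 when the chip seems broken. */
static int rtc_init_sketch(fake_rtc *chip)
{
    for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
        uint8_t val = rtc_read(chip);

        /* All ones is not a valid reading: bail out instead of retrying. */
        if (val == 0xFF) {
            fprintf(stderr, "RTC register reads 0xFF, giving up\n");
            return -1;
        }

        /* Flag clear: nothing to fix, carry on with the probe. */
        if (!(val & STATUS_FLAG))
            return 0;

        /* Flag set: clear it and read back on the next iteration. */
        rtc_write(chip, (uint8_t)(val & ~STATUS_FLAG));
    }

    /* The flag never cleared: stop here rather than looping forever. */
    return -1;
}
```

A chip that answers 0xFF makes the function return -1 on the first read, while a chip that simply has the flag set gets it cleared and passes on the second read. In the unpatched flow, the first case is the one that never leaves the loop.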
And the maintainer basically put the blame on me, saying I should have used other, unspecified methods to read the time (I used the default stuff from OpenWrt) that wouldn’t cause a boot lock despite the infinite loop.
I politely replied again trying to explain the problem, and that a driver shouldn’t cause an infinite loop anyway, and he actually listened to me, admitting that he was too quick to dismiss the issue! That made me super proud, but he didn’t actually accept my patch and wrote a way better solution to fix the problem.
What’s the lesson here? I was pretty disappointed to find these kinds of bugs in Linux; I always imagined it to be better than that. It’s running on trillions of computers, isn’t it? But it’s made by humans who most of the time don’t get paid for the work they do, and their code varies in quality depending on their expertise, right? Well, I was reporting bugs that I found during my “job”, and the solutions I offered focused on solving my specific problem as efficiently as possible and weren’t designed to fit the big picture: instead of spending days refactoring the code to fix the “goto” bad practice altogether (while possibly introducing more bugs), I just added a check for a specific type of failure and called it a day.
So I guess it’s quite the opposite: a lot of the problems in Linux are due to (mostly paid) people pushing stuff to fix shit quickly and cheaply, and the good code is written by passionate people taking the time to understand how every piece fits together… And anything in between. Also, a lot of time and money is invested in writing great code and maintaining the vital and most used parts; it’s at the edges of the forest, in the less explored areas like drivers for less common hardware, that you find the wildest hacks. I’ll do better next time, I promise!
I still have a lot to learn about Linux, but I was very lucky to meet a lot of great friends at hacker camps who can help me fill the gaps; among them is one of the best developers I know, so expert in privacy that they don’t even want to be named (LOL), who is helping me with the development of Prism. Now I can sleep better knowing that the Linux part will be handled by a real Pro :)