Linux System Programming

Cool Tidbits from “Linux System Programming”

Recently I’ve undertaken a period of deep study related to Linux in all its aspects, but especially for embedded programming. While I’ve used Linux off and on professionally and privately for at least 18 years, my knowledge has always been just enough to get by.

Starting in about 1999 I’ve set up and managed Linux firewall / routers for my home and office use, including DNS, email, email list, and web servers.  I continue this to this day.

In 2005, I created a C++ cross-platform framework for using sensors and motion controllers on mobile robots, which I got to work under Linux and Windows with various built-in or external I2C or serial port devices as the attachment points.

From 2013-2017, my team at EM used various off-the-shelf single board computers such as the Pandaboard or Dragonboard as test platforms.  We ran the Dragonboard with Debian and wrote a lot of utilities and test software for the ASIC we were developing on it.  While I didn’t write the core C framework, I did need to, at times, troubleshoot bugs deep within it.

But programming-wise, I have not otherwise had the need to do much beyond simple POSIX-compliant C programs under Linux, and some occasional PHP, Python, and Java work. This was primarily as a result of depending on Windows for office and embedded toolchain support.  In my career I’ve spent a lot of time deep into smaller embedded systems using PIC, AVR, ARM, ARC, and MSP430 processors, with or without an RTOS, which kept me away from digging deeper into Linux.

The first Linux book I finished reading is the O’Reilly book, Linux System Programming by Robert Love. Overall, I found it well written and easy to read.  I like that, even though it is a Linux book, he points out which of the various POSIX, System V, BSD, and Linux APIs are portable or not, and which are best avoided.

There were no major surprises in the content for me, as I was familiar with most of the concepts, but certainly there were interesting APIs and command line tools as well as some higher-level concepts which I was not aware of. Here is a list of some things that stood out.

  1. system call interface: on the i386, user code puts the system call number in eax and uses registers ebx, ecx, and so on to pass parameters, then invokes int 0x80 to cause a trap into the kernel; very similar to the now ancient DOS int 21h scheme
  2. standards: the history of POSIX (Portable Operating System Interface), SUS (Single UNIX Specification), and LSB (Linux Standard Base); I remember reading about the Unix Wars, OSF, and X/Open, but didn’t recall the merger that formed the Open Group, which then released SUS
  3. inodes and how hard links work: two or more directory entries point to the same inode; a link count is maintained to ensure the contents are not removed until all hard links are; cannot span filesystems; the stat() family of functions returns a stat structure containing the inode number and hard link count (among other useful tidbits I was already familiar with)
  4. symbolic links: essentially a regular file containing the path name of the linked-to file, which can be on any filesystem
  5. processes: process id 1 is always the init process, while process id 0 is the idle process
  6. process groups: represent a parent process and its children, such as happens when a shell starts up a pipeline (e.g., ls | less), and provides a way then to send signals to or get info from an entire pipeline or all children thereof
  7. forking: if a parent process terminates before its child, the child is reparented to the init process
  8. zombies: a terminated process is a zombie until it has been waited on; the init process will clean up the zombies as it becomes their parent (if the parent terminated first) and thus can wait on them; this zombie state is necessary so parent processes can obtain information about why a child terminated, such as its return value
  9. waiting on processes: just like with many of these topics, there are many different functions providing varying levels of control for waiting on a process and obtaining info from it: wait(), waitpid(), waitid(), wait3(), and wait4(); the latter two provide lots of resource usage statistics such as memory use, page faults / swaps, block I/O operations, messages sent/received, signals received, and context switch counts
  10. open(): O_ASYNC requests that a signal be sent when the FIFO, pipe, socket, or terminal becomes readable or writeable; O_DIRECT requests direct I/O; O_NONBLOCK requests non-blocking I/O (usually for FIFOs); O_SYNC requests synchronous writes
  11. pread() and pwrite(): positional equivalents of lseek() + read() or write(), which ignore the file position and leave it alone
  12. multiplexed I/O: select(), pselect(), poll(), ppoll(), and the epoll family (epoll_create1(), epoll_ctl(), epoll_wait()) all allow a process to block on a set of file descriptors until one of them is ready to be read or written, with varying control over how signals are handled; epoll appears to be the superior interface to use
  13. buffered I/O: besides the familiar fopen(), fread(), etc., there are _unlocked() equivalents which are not thread-safe but give a sizable performance improvement compared to the standard locking functions
  14. scatter/gather I/O: reads or writes contiguously from or to a file using one or more segments of memory, where each segment can reside at a different location and have a different size; interestingly, this does not provide a way to modify the position within the file between segments, so to me, it is not truly scatter/gather (perhaps my opinion is colored by the old SCSI bus scatter/gather concept, which kind of does that); functions are readv() and writev()
  15. memory mapped files: while I’m familiar with the concept, the details of what can be done on Linux are interesting; for example, you have control over protection (read, write, exec), you can make the mapping private or shared (with other processes that map the same file), and mappings are managed in whole pages, so the file offset must be a multiple of the MMU page size and the length is effectively rounded up to whole pages; you can give the kernel hints about how the mapping will be used, so it can optimize its read-ahead strategy, using madvise(); mmap() is the main function involved;
    by setting the MAP_ANONYMOUS flag, one can create a mapping not backed by a file; the kernel backs such a mapping copy-on-write with an already zeroed page, so the mapped memory returned will already be cleared; passing NULL as the starting address simply lets the kernel choose where to place the mapping
  16. normal file I/O: a similar posix_fadvise() allows you to help the kernel optimize read ahead for an unmapped file too
  17. copy-on-write: this MMU optimization strategy prevents wasting time copying memory from a parent to a forked child unless the child modifies it; if never written, they share the same pages of physical RAM; but if written, new pages are allocated, the original contents are copied there, and then the process can write cleanly and uniquely there
  18. user and group ids: a process actually has four user IDs and four group IDs associated with it — real, effective, saved, and filesystem; and there are APIs to read and modify them (though root privileges are needed for many modifications, as one would expect)
  19. sessions and session leaders: this is associated with a login shell and a controlling terminal; a session is a collection of one or more process groups
  20. daemons: a daemon is simply a process with no controlling terminal, running as a child of init; a process can become one by calling fork() and exiting from the parent, then, in the child, calling setsid() to create a new session and process group, and cleaning up various file descriptors (such as 0, 1, 2 = stdin, stdout, stderr); or, the process can simply call daemon()
  21. processor affinity: this provides the ability to control which processor in a multicore system the process will run on; this is known as hard affinity, and can help when a given process is very sensitive to the CPU cache; if not set, the kernel uses soft affinity, which tries to keep a process running on the same CPU each time its timeslice occurs, but this is not guaranteed; calls include sched_setaffinity() and sched_getaffinity()
  22. real time support: besides setting the nice() value or the process priority, the scheduling policy can be set to either FIFO, Round Robin, or Other (default); FIFO and Round Robin help ensure that response latency for a process that handles an external signal is predictable; calls include sched_setscheduler() and sched_getscheduler(); pro tip from the book: “while developing a real time process, keep a terminal open, running as a real-time process with a higher priority than the process in development”; this ensures you can kill your process if it runs amok; the util-linux package of tools includes the chrt utility, which helps you set real-time attributes on other processes
  23. extended file attributes: while I was familiar with EAs in NTFS on Windows, Linux provides a relatively generic file-system-agnostic mechanism to associate key-value pairs with files (though not all Linux filesystems support this); often this information is stored in unused portions of a file’s inode; namespaces are provided for system, security, trusted, and user; functions include setxattr() and getxattr() to set and get a specific key’s value, removexattr() to delete a key, and listxattr() to get a list of all keys; it makes me wonder what information is commonly hidden there!
  24. special device nodes: besides the commonly known /dev/null, there are also /dev/zero which, when read, returns an infinite stream of zeros or accepts and discards writes as /dev/null does, and /dev/full, which reads like /dev/zero but writes fail immediately with ENOSPC; these are useful for testing purposes
  25. monitoring file events: as Windows does, Linux provides a mechanism to watch changes of various kinds to specified file or directory paths; a single notifier can handle multiple files, and behaves like a file, so reading notifications is done using a normal read() call, and the file descriptor can be waited on with any of the multiplexed I/O mechanisms; functions include inotify_init1(), inotify_add_watch(), inotify_rm_watch(), and close(); there are a lot of options for controlling what events you are interested in watching
  26. memory locking: high performance programs can benefit by locking important regions of their memory against being swapped out by the kernel, using mlock() to lock a specific range of addresses or mlockall() to lock a process’s entire address space in physical memory; there are of course equivalent functions for unlocking: munlock() and munlockall()
  27. signals: the book touches on some important weaknesses regarding signals in Linux which must be understood to avoid serious problems when using them; for example, a signal can interrupt a process anywhere, including in the middle of a system call, so signal handlers need to stick to reentrant, signal-safe library functions; a process that needs to manage multiple signals can combine them in a signal set using functions like sigaddset(), sigismember(), etc.; sigprocmask() can block specific signals to protect critical regions; in addition to the simple signal() function, sigaction() provides a much more powerful way to handle signals, including the ability to block specific signals while inside your signal handlers, and gives the handler a lot of information about what was going on when the signal occurred; sigqueue() provides a way to send a payload together with a signal, which is passed to a handler installed via sigaction() with the SA_SIGINFO flag
  28. time: besides the familiar time_t, struct tm, and time() functions, Linux supports 5 different POSIX clocks, which include CLOCK_REALTIME (the normal system time), CLOCK_MONOTONIC (never jumps backwards when the system time is adjusted, for example), CLOCK_PROCESS_CPUTIME_ID, and CLOCK_THREAD_CPUTIME_ID, the latter two of which measure CPU time consumed and, on x86, can draw on the high resolution timestamp counter (TSC); clock_getres() tells you what resolution the specified clock has, and clock_gettime() obtains time from that clock; clock_nanosleep() lets you sleep using one of these clocks for relative or absolute times, and returns the amount of time remaining if a relative sleep was interrupted by a signal; rather than sleeping, timers can be set up; of note are the advanced timers using timer_create(), timer_settime(), and timer_delete(); such timers can use any of the POSIX clocks, can either send a signal or spawn a thread to execute the specified handler function, and can return to you the amount of time a timer might have overrun with timer_getoverrun()
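
Several of the file-related items above (hard links, pread()/pwrite(), and readv()/writev()) are easy to poke at from Python, whose os module is a thin wrapper over the same system calls. What follows is only a minimal sketch, not anything from the book: the function names and file names are made up for illustration, and it assumes a Linux filesystem that supports hard links.

```python
import os
import tempfile


def hard_link_demo(directory):
    """Create a file and a hard link to it; return (same_inode, link_count)."""
    original = os.path.join(directory, "original.txt")  # hypothetical names
    alias = os.path.join(directory, "alias.txt")
    with open(original, "w") as f:
        f.write("hello")
    os.link(original, alias)  # second directory entry pointing at the same inode
    st_original, st_alias = os.stat(original), os.stat(alias)
    return st_original.st_ino == st_alias.st_ino, st_original.st_nlink


def positional_io_demo(path):
    """Show that pread()/pwrite() leave the file position untouched."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, b"0123456789")               # file position is now 10
        before = os.lseek(fd, 0, os.SEEK_CUR)
        os.pwrite(fd, b"AB", 2)                   # overwrite bytes 2..3
        chunk = os.pread(fd, 4, 1)                # 4 bytes starting at offset 1
        after = os.lseek(fd, 0, os.SEEK_CUR)
        return chunk, before == after
    finally:
        os.close(fd)


def gather_scatter_demo(path):
    """Write three buffers with one writev(), read into two with one readv()."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.writev(fd, [b"head", b"-", b"tail"])   # gathered into "head-tail"
        os.lseek(fd, 0, os.SEEK_SET)
        first, second = bytearray(4), bytearray(5)
        count = os.readv(fd, [first, second])     # scattered into two buffers
        return count, bytes(first), bytes(second)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print(hard_link_demo(scratch))
        print(positional_io_demo(os.path.join(scratch, "positional.bin")))
        print(gather_scatter_demo(os.path.join(scratch, "vectored.bin")))
```

Note how readv() fills the buffers strictly in file order, which is exactly the limitation mentioned in item 14: there is no way to skip around in the file between segments.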

Clearly a lot of powerful features are available in Linux beyond the normal libc functions I’ve used for years.  These will be very useful as I embark on digging deeper into embedded Linux.
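
As a small taste of how approachable some of these features are, here is a hedged Python sketch touching three of the items from the list: reaping a child with wait4() (items 8 and 9), the zero-filled anonymous mapping (item 15), and the monotonic POSIX clock (item 28). The function names are invented for illustration, and it assumes a Linux system where fork() is permitted.

```python
import mmap
import os
import time


def reap_child(exit_code):
    """Fork a child that exits at once, then collect it with wait4()."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child: leave immediately, skipping parent cleanup
    # Parent: once the child has exited it stays a zombie until it is waited on.
    _, status, usage = os.wait4(pid, 0)
    return os.WEXITSTATUS(status), usage


def zeroed_anonymous_page():
    """Map one anonymous page and check that it arrives already cleared."""
    with mmap.mmap(-1, mmap.PAGESIZE) as region:  # fd of -1 means no backing file
        return region[:] == bytes(mmap.PAGESIZE)


def monotonic_elapsed(seconds):
    """Return (clock resolution, measured sleep time) using CLOCK_MONOTONIC."""
    resolution = time.clock_getres(time.CLOCK_MONOTONIC)
    start = time.clock_gettime(time.CLOCK_MONOTONIC)
    time.sleep(seconds)
    return resolution, time.clock_gettime(time.CLOCK_MONOTONIC) - start


if __name__ == "__main__":
    code, usage = reap_child(7)
    print("child exit code:", code)
    print("voluntary context switches:", usage.ru_nvcsw)
    print("anonymous page zeroed:", zeroed_anonymous_page())
    print("resolution, elapsed:", monotonic_elapsed(0.01))
```

The usage value returned by wait4() is the same rusage structure the list mentions, so fields such as ru_maxrss, ru_majflt, and ru_nivcsw are available for the child that was just reaped.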

After reading this book, I now want to revisit that robot framework I mentioned and take advantage of the many new (2.6.x kernel and later) Linux mechanisms described above.

Next up: the excellent Packt Publishing book Mastering Embedded Linux Programming by Chris Simmonds.



The Rust Programming Language: Saving Us From Digital Oblivion?

I’ve programmed embedded systems for most of the last 40 years. And of those 40, 30 were using C. (Oh My God…) C is like an old friend to me… but I’m growing restless and uneasy. I’m sorry C, but we’re growing apart. “What have you done for me lately?”

Restless? Because that’s a long time to do anything in one language professionally. Sure, I learned C++ when it first came out, and have used it off and on for various projects. I learned Java when it came out, and again have used it off and on… and then JavaScript… Python… PHP (shudder)… a tiny bit of Objective C (egads, what a mess)… and just enough C# to muddle by on a contract.

But none of these languages can replace C in the kinds of embedded environments I make a living in: close to the metal, with no OS or a simple one. Sure, you can avoid the bad parts of C++ and use it, but to work well and reliably in an embedded system, you need to throw a lot of it out, and so what you’re left with are classes (which can be done using C structures and function pointers), and default parameter values (which can be helpful). OK, maybe a few other minor improvements. But it won’t be more reliable.

Uneasy? Because it’s REALLY HARD to write bug-free, secure, reliable C code. So many embedded systems are hackable or fail in the field due to coding errors. So many more take much longer to get to market than they should. And even more simply are flaky and unsatisfying to users and customers. How much of the backbone of this society depends on devices whose firmware is written in C? A frightening percentage does.

Sure, we can add more discipline to reduce the problem: better training, static code analysis, code reviews, coding standards, unit tests… but what if the language reduced this burden? What if it helped, rather than was part of the problem?

From what I’ve read, Rust could be an answer; there is a growing community adapting it to embedded systems. So, I’m currently reading Rust Essentials, and will play around with the Teensy port as a starting point.

I have hope that Rust could be a way out for the embedded world to improve productivity, reliability, and security, and usher in the use of modern language constructs in the embedded world. We shall see.

Bathroom Space Heater Repair

Our upstairs bathroom has no heat — no ducts run there — so we have a small electric wall-mounted space heater for cold winter months. We already replaced it once, as the Cadet Company in Washington State issued a recall on it. But that was 20 years ago.

In general it was still working fine, but during the most recent winter it started to come on by itself in the middle of the night, first on colder nights, and as the winter went on, even when it was not so cold. I became concerned that this thing would end up on continuously when we were out of town, overheat, and burn the place down.

Note that the heater does not really have an off switch per se. All it has is a temperature dial, and up until now, turning it to 0 was basically the same as off. But not anymore.

At first, I thought about just replacing it. I spent a lot of time searching around and so on, but couldn’t find one that I liked better. Giving up for now, I thought I’d pull the thing out and see if it was just some dirty contacts on the thermostat or something.

So, I turned off the breaker, pulled it from the wall, and expected a filthy dust-ridden nightmare.

Turns out, not so much. There was some dust, but the airflow must keep it pretty clean. I cleaned up what I could, then moved to the issue at hand.

First, how does this work? The only electrical parts in this thing are:

  • heating coil
  • fan
  • temperature control
  • over-temperature thermal fuse

The fan still runs. The coil still heats. The over-temperature thermal fuse is a one time, non-resettable device that blows when a severe over-temperature condition exists. Since the heater does still make heat, I figure it must still be OK.

The temperature control looks pretty simple:

I tried to get a picture from the side, to show how it works. This is not that picture, alas:

Basically, this thing is a micro-switch, a knob, a bimetallic strip, and a set screw. When the temperature goes down, the strip bends and pushes on the switch, turning on the heater. The set point is determined overall by the knob, which deflects the strip in or out.

But the key to my problem is the set screw, it turns out. This puts a minimum mechanical bias on the strip, so depending on its position the heater is either on constantly or, as you turn the screw further, requires the knob to be turned further and further from the off position before the heater will come on.

In the end, I could have fixed this without removing the unit from the wall. Oh well. At least now I know!

Bosch Dishwasher Repair

Unfortunately I didn’t take any photos… all I have left of the repair are fading memories, an email for the replacement part in January 2015, and a working dishwasher.

One day our 20 year old dishwasher ran dry. Literally. No water was entering, though it pretended to go through the motions.

I figured there must be some inlet valve that broke. Time to take a look.

I removed the one remaining loose screw holding the washer into the cabinet, and slid it forward. Clearly, all the good stuff is on the back and bottom sides, so I had to pull it completely out, and actually flip it over on its top.

I traced the route of water into the machine and found the inlet valve. I unplugged the spade plugs going to it, disconnected the water inlet and outlet connections, and took it down to the workbench.

I tried applying 120 V AC to the terminals, and nothing happened: no clicking, nada. So, time to get a replacement.

Unfortunately, my usual go-to site for replacement parts did not have it. But, I found it on Sears Parts Direct.

After ordering the part and waiting a few days, it was easy enough to connect it in place of the old one, flip the dishwasher back over, and slide it into the cabinet. I took this as an opportunity to adjust the leg height and install two new mounting screws, so it’s no longer a bit wobbly. A quick test showed that the unit worked again. Hurray for our team!

Zenn Instrument Cluster Repair

I have a 2007 Zenn NEV (neighborhood electric vehicle) by Feel Good Cars / ZENN Motor Company. This small car was one of the very first mass-produced electric vehicles available in the US that was not truly a souped-up golf cart. However, having shipped in low volumes from a company that eventually went out of business, the car is not without some major flaws. One of the most common parts to fail, and expensive to repair, is the speedometer / odometer / battery level display, what Zenn calls the Instrument Cluster. When mine started to fail, it at first seemed to be related to moisture. My car has another common flaw, a leaky roof due to a failure of the glue that holds down the roof panel. After using a dehumidifier to dry out the car, I noticed the display started working again. However, this cure did not last for long, and later attempts to fix the display failures did not succeed.

Come the fall of 2014, I decided to finally sell the car. But what to do? The display didn’t work, and a new one would be $800. I had considered trying to take the module out of the car to troubleshoot it in the past, but it was clear it was a major undertaking. First, one was instructed in the service manual to remove the dash. Yikes! So that held me back for a bit. Eventually I decided to go for it. It turns out, one does not need to completely remove the dash. I did the following to remove the module (this goes with the instructions on pages 32-33 of the service manual):

  1. did not remove the steering wheel (really did not want to do that!)
  2. removed the side view mirrors, as the panels inside the pillars block the dashboard from being lifted away
  3. removed the panel where the heating controls are, but I don’t think this was necessary
  4. did not disconnect the speakers (the instructions say to remove them, but since I was trying to keep it simple and not remove the whole dash, I left them in)
  5. removed the lower black plastic retaining rivet from the instrument panel (the lower part, by the light)
  6. removed the 2 left and right black plastic retaining rivets from the instrument panel
  7. removed the 2 left and right square head screws below the HVAC vents
  8. did not disconnect the heating throttle switch on the housing diffuser (looked difficult and unnecessary)
  9. removed the screws that hold the instrument cluster in
  10. started pulling the dashboard loose, just enough on the passenger side to get the instrument cluster out (I had to disconnect the passenger side fan duct)
  11. (the hard part) reached in through the hole in the dash where the instrument cluster mounts, around the back side of the now-loose instrument cluster, and disconnected the two cables
  12. pulled out the instrument cluster through the hole (by loosening the dashboard, there is enough room to maneuver the instrument cluster to get it out)

Dash loosened, cluster removed

Next, I took the cluster to my workbench, and inspected the circuit board very thoroughly.  Nothing appeared the slightest bit burned or damaged; none of the components or copper traces were dark, warped, or scorched.  Time to power it up.

Using the wiring diagrams in the service manual, I figured out how.  I luckily had on hand a number of unused pins for similar Molex-style connectors, and stuck 4 in the proper holes in connector J4.  In the photo below, you can see I wrote the pin numbers with a Sharpie on the end with pins 22 and 11.  Pin 22 needs to be connected to ground, while pins 9, 10, and 11 need to be powered with +12 volts DC.  I connected 9, 10, and 11 together with solder, but wire would work just as well.

Applying power to the cluster

The display now powered up, but, as expected, had random segments dark while others were powered.  It’s hard to make out the speed or distance if all of them are not on when they are supposed to be!  Poking around at the board, I discovered the flex cable that connects the display panel to the circuit board was the culprit.  Pressing it down against the glass caused some of the segments to come on.  Pressing down the whole width of the cable against the glass made the whole display appear to be correct.

Unfortunately, such a flex cable connection is hard to repair.  They are somehow soldered and glued at the factory, and redoing that was not something I felt comfortable attempting.  I did try adding a bit of glue between the cable and the glass, but this was ineffective.

I decided the solution was to make a clamp out of brass that could just stay there forever, keeping pressure on the flex cable.  I put some tape between the brass and the flex cable so it didn’t damage the cable.  I used a strip of brass from the hardware store that was as wide as the flex cable, and cut and bent it to reach around the side of the instrument cluster and press against the circuit board connector on the opposite side.  It’s kind of a squared-off U shape.

Brass clamp holding flex cable against the glass display

Here’s a close up of the side pressing against the glass.  Unfortunately I didn’t take any pictures of what’s underneath — the flex cable glued to the glass.

Brass clamp and tape pressing against flex cable and glass

Here’s the other side, where the clamp grabs on.  It’s important to shape the brass such that it cannot possibly short out the flex cable connector pins.

Brass clamp grabbing on to the connector

Here’s another view.

Another view

Here’s proof the repair worked.  When you first power up the cluster, it tests the display by turning on all segments.  I captured this photo, showing all the critical segments are now lit.  There might be a couple on the lower left number that are still not fixed, but I’m not even sure what that section of the display is for.

All important segments are now lit

After a second or two, the normal display appears.

Normal display

A nice aspect of this repair is that, once the back cover is screwed on the instrument cluster, it just touches the brass clamp, ensuring it can’t come loose.

Rear cover attached

All that was left at this point was mounting it back in the dash and reassembling everything.

The hardest part of this was reaching in the hole on the dash and getting the cables reconnected.  Reattaching the passenger side HVAC vent was also a bit difficult, but doable.

Cables inside dash

Nice to have the display working again.  Too bad I got too busy at work after I did this repair in November to sell the car… hopefully I will do so soon.



Nest Smart Thermostat Troubles

A happy Nest is a warm nest

This Christmas I received a Nest thermostat, a pretty awesome step up in technology compared to the 1994-era Honeywell thermostat we were still using!

Ancient Honeywell thermostat

The instructions were very straightforward. Our old thermostat used 3 wires from the furnace — G, W, and R. G controls the fan, W controls the gas valve, and R is power. Missing, however, was a C or common connection — though the Nest is supposed to support systems that don’t have one.

All seemed to be well the first evening and night. My son and I configured it to connect to our wireless network, set the Nest’s various settings for heating type and so on, and installed apps on our phones and iPad.

The next morning, the heat came on for a few minutes, and then mysteriously stopped. I heard it try to turn on shortly thereafter, which resulted in a disconcerting series of repeating aborted heating cycles of a few seconds each (whoosh! click! whoosh! click! whoosh! click!). Time to investigate!

Searching their support forums turned up a likely culprit — the lack of a C connection.

Unlike the old Honeywell, which ran from a few AA cells for a few years at a time, the Nest has a non-replaceable internal battery, which it needs to keep charged somehow. If your thermostat wiring provides R and C (which basically tie to two sides of a 24 volt AC transformer in the furnace), then it uses those to keep the battery charged. Without C, the Nest has clever circuitry which is supposed to fake one using G and W as current return paths. Unfortunately, this bit of cleverness failed in my case, and the Nest did not work well.

Luckily, being pack-rats, we never throw anything away, including the manual for the furnace. A cursory glance turned up no C connection per se. However, after studying it for a while, it became clear that chassis ground and Earth ground were both tied to one side of the 24 volt transformer, and the R wire went to the other side. Bingo!

Furnace Schematic

Schematic for our old furnace

I turned off furnace power and took off the covers of the furnace, then traced the wires. This unveiled a ready-to-be-used blue wire running to chassis ground, right by the other wires connected to the thermostat cable. Luckily, that cable had a few unused wires, so I picked the brown one to run the new C line. I then connected it to the C terminal on the Nest, closed up the furnace, and turned furnace power back on. A quick inspection of the Nest wiring screen confirmed that the Nest automatically detected the new C line.

Inside the furnace

I tested the furnace a bunch of times that day, and the problem was solved. Phew.

GCC Mayhem

I tend to use gcc and gcc-derived tools quite often in my work; examples include the Arduino environment, LPCXpresso, and WinAVR. While not the best compiler in the world performance-wise, it wins hands down when you factor in its portability, ubiquity, and stability — as well as cost.

Sometimes weird stuff happens though. Recently I upgraded my main desktop machine from Windows XP to Windows 7. Since then, I have been unable to run any of the above toolchains successfully. What gives?

Turns out the problem was caused by a recent installation of a gcc toolchain, Ride7, from Raisonance, for the EM Micro 6819 processor family. That’s right, one gcc toolchain killed 3 others!

The solution (at least to make the first 3 work again) was to remove the GCC_EXEC_PREFIX environment variable that Ride7 created. Turns out that is a variable used in older gcc implementations that now causes havoc with newer ones. And Ride7’s gcc is OLD! 2.80! Current gcc versions are in the 4.3 range or higher.

Despite my initial fear that the trouble was Windows 7 itself, it turned out to be a combination of things: I had not tried those other tools again after installing Ride7, so I never connected the failures to it.
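
If removing the variable system-wide is not an option (say, the older toolchain still needs it), a gentler alternative is to hide it only from the processes that choke on it. The following is a minimal sketch of that idea, not the fix described above: the helper names are made up, and it assumes the newer compiler is the one being launched.

```python
import os
import subprocess

STALE_VARIABLE = "GCC_EXEC_PREFIX"


def environment_without(name, env=None):
    """Return a copy of env (default: the current environment) minus `name`."""
    cleaned = dict(os.environ if env is None else env)
    cleaned.pop(name, None)  # no error if the variable was never set
    return cleaned


def run_compiler(argv):
    """Run a compiler command with GCC_EXEC_PREFIX hidden from it only."""
    return subprocess.run(
        argv,
        env=environment_without(STALE_VARIABLE),
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    if STALE_VARIABLE in os.environ:
        print(STALE_VARIABLE, "is set to", os.environ[STALE_VARIABLE])
    else:
        print(STALE_VARIABLE, "is not set")
    # Example call, assuming some gcc is on the PATH:
    # print(run_compiler(["gcc", "--version"]).stdout)
```

The copy leaves the caller's own environment alone, so whichever tool set the variable keeps seeing it.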

Things I’d Like to Write About

In no particular order:

  • All about Pleo; design, compromises, missed opportunities
  • Experiences at previous employers
  • What it’s like to be a consultant
  • Favorite programming tools
  • Sound localization experiments
  • Motion control with low resolution feedback
  • Sensor processing
  • Things I’ve built with the kids
  • Things I’ve built on my own
  • Automatic interface switching techniques (Unicoder serial vs. I2C on same control lines)
  • Technical books that I’ve gotten a lot out of
  • Experiences coaching FLL (Lego Robotics) teams
  • People that inspired me growing up
  • Lessons learned in making my own products (design, manufacturing, sales, distribution, marketing)
  • Debugging techniques