Too Busy To Post
Posted on April 5, 2017
Lately I've been seemingly too busy to post updates to this blog. Work has been a bit hectic with the upcoming introduction of a cross-platform plotting library for Simply Fortran and the implementation of automated builds (finally!) for said product. Additionally, home improvements have gotten in the way. But I've had some time...
Fixing Open Watcom POSIX Threads
Last year I submitted a major pull request to Open Watcom that at least nominally introduced POSIX threads to the Linux runtime library. This improvement was a long time coming, and it allowed compiling still more Linux software with that aging Open Watcom toolset. I originally undertook this task to get Python on Linux to support multithreading as part of the Lightning Python project. While nominal testing showed that the threading implementation did work, the Python interpreter did not.
Additionally, I received word from the Open Watcom maintainer that I had missed a good portion of work, namely thread local storage, that the Open Watcom runtime relies upon. While I did implement pthread TLS in the form of keys, I did not support it for Open Watcom's internal threading, upon which the pthread implementation was built. I happily said I would add it, but I quickly ran into big issues.
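For anyone unfamiliar with pthread TLS "in the form of keys," here is a minimal sketch of what that API looks like from the caller's side. It is ordinary POSIX code, not anything from the Open Watcom runtime, and the names slot_key, worker, and tls_key_demo are made up for illustration. Each thread stores a pointer under the same key and should read back only its own value.

```c
#include <assert.h>
#include <pthread.h>

enum { NTHREADS = 4 };

/* Hypothetical name: one key shared by all threads, one value per thread. */
static pthread_key_t slot_key;

static void *worker(void *arg)
{
    /* Store this thread's private value under the shared key. */
    if (pthread_setspecific(slot_key, arg) != 0)
        return NULL;
    /* Reading the key back must yield this thread's value only. */
    return pthread_getspecific(slot_key);
}

/* Returns 0 when every thread read back exactly what it stored. */
int tls_key_demo(void)
{
    pthread_t threads[NTHREADS];
    int ids[NTHREADS];
    int started = 0, mismatches = 0, i;

    if (pthread_key_create(&slot_key, NULL) != 0)
        return -1;

    for (i = 0; i < NTHREADS; i++) {
        ids[i] = i;
        if (pthread_create(&threads[i], NULL, worker, &ids[i]) != 0)
            break;
        started++;
    }

    for (i = 0; i < started; i++) {
        void *seen = NULL;
        int rc = pthread_join(threads[i], &seen);
        assert(rc == 0);
        (void)rc;
        if (seen != &ids[i])
            mismatches++;
    }

    pthread_key_delete(slot_key);
    /* Threads that never started count as failures too. */
    return mismatches + (NTHREADS - started);
}
```

Calling tls_key_demo() from main and checking for a zero return is enough to exercise key creation, per-thread storage, and key deletion.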
I had a single pthread test program that would lock up or fail fabulously on Open Watcom, and I had planned to use it for testing that the runtime library's TLS was implemented at least nominally. That test program revealed so many problems with my exceptionally naive pthread implementation. Initially, the POSIX semaphores that are used for all manner of locking wouldn't work properly because I had been quite sloppy with Linux futex system calls. Fixing that problem took weeks, but so many other locking mechanisms were just broken and overly complicated. Mutexes were hitting internal deadlocks, conditions were failing to broadcast, and thread structure bookkeeping was triggering segmentation faults.
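My actual test program isn't reproduced here, but a rough sketch of the kind of smoke test that exercises these pieces (a mutex, a condition broadcast, and a POSIX semaphore) might look like the following. All of the names, including broadcast_smoke_test, are hypothetical. A broken broadcast or a deadlocked mutex shows up as a hang, and a bookkeeping problem shows up as a wrong count.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

enum { WAITERS = 4 };

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t start_cond = PTHREAD_COND_INITIALIZER;
static sem_t done;      /* posted once by each waiter after it wakes */
static int go;          /* predicate guarded by lock */
static int woken;       /* number of waiters released, guarded by lock */

static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    while (!go)                         /* loop guards against spurious wakeups */
        pthread_cond_wait(&start_cond, &lock);
    woken++;
    pthread_mutex_unlock(&lock);
    sem_post(&done);
    return NULL;
}

/* Returns the number of threads released by a single broadcast. */
int broadcast_smoke_test(void)
{
    pthread_t threads[WAITERS];
    int started = 0, i;

    go = 0;
    woken = 0;
    if (sem_init(&done, 0, 0) != 0)
        return -1;

    for (i = 0; i < WAITERS; i++) {
        if (pthread_create(&threads[i], NULL, waiter, NULL) != 0)
            break;
        started++;
    }

    /* Set the predicate and wake every waiter at once. */
    pthread_mutex_lock(&lock);
    go = 1;
    pthread_cond_broadcast(&start_cond);
    pthread_mutex_unlock(&lock);

    /* Each waiter posts exactly once; retry only if a signal interrupts us. */
    for (i = 0; i < started; i++) {
        while (sem_wait(&done) != 0 && errno == EINTR)
            ;
    }
    for (i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    sem_destroy(&done);
    return woken;
}
```

On a working pthread implementation this should return WAITERS. Because the predicate is set under the mutex, the result does not depend on whether the waiters reach pthread_cond_wait before or after the broadcast.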
Finally, after fixing all of the above through the addition of more atomic x86 assembly calls and adding some "slop" to futex handling, I could pass my test program. Additionally, the Python interpreter, version 3.5.2 at least, would start with threading enabled. The pull request was accepted March 11, probably four months after the problems with the initial implementation were revealed.
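To give a flavor of how atomics and futexes fit together, here is a toy counting semaphore built directly on the Linux futex system call. This is not the Open Watcom code: it uses C11 atomics rather than hand-written x86 assembly, it assumes a Linux target where atomic_int has the same layout as a plain int, and every name in it (toy_sem, futex_call, futex_sem_demo) is invented for the example. The important detail is that a return from FUTEX_WAIT never proves the count is positive, so the waiter always loops back and re-checks.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

enum { SEM_WAITERS = 4 };

typedef struct { atomic_int count; } toy_sem;   /* hypothetical type */

static toy_sem gate;
static atomic_int passed;

/* Thin wrapper; glibc does not provide a futex() function of its own. */
static long futex_call(atomic_int *addr, int op, int val)
{
    return syscall(SYS_futex, (int *)addr, op, val, NULL, NULL, 0);
}

static void toy_sem_wait(toy_sem *s)
{
    for (;;) {
        int v = atomic_load(&s->count);
        while (v > 0) {
            /* On failure, v is refreshed with the current count. */
            if (atomic_compare_exchange_weak(&s->count, &v, v - 1))
                return;
        }
        /* Sleep only if the count is still 0. Any return (wakeup,
           EAGAIN, EINTR) simply means "go around and check again". */
        futex_call(&s->count, FUTEX_WAIT, 0);
    }
}

static void toy_sem_post(toy_sem *s)
{
    atomic_fetch_add(&s->count, 1);
    futex_call(&s->count, FUTEX_WAKE, 1);   /* wake at most one sleeper */
}

static void *sem_waiter(void *arg)
{
    (void)arg;
    toy_sem_wait(&gate);
    atomic_fetch_add(&passed, 1);
    return NULL;
}

/* Returns how many waiters got through after the same number of posts. */
int futex_sem_demo(void)
{
    pthread_t threads[SEM_WAITERS];
    int started = 0, i;

    atomic_store(&gate.count, 0);
    atomic_store(&passed, 0);

    for (i = 0; i < SEM_WAITERS; i++) {
        if (pthread_create(&threads[i], NULL, sem_waiter, NULL) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        toy_sem_post(&gate);
    for (i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    return atomic_load(&passed);
}
```

This version issues a wake system call on every post, which is wasteful; a real implementation would track whether anyone is actually sleeping. It does show why being careless about the re-check after FUTEX_WAIT leads to lockups.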
Try it out yourself by grabbing an Open Watcom binary from the Open Watcom Buildbot that delivers nightly builds of the entire toolchain.
Hardware and Software Troubleshooting
I have been spending plenty of time on the Rainbow itself. I was running into some hardware issues with this server over the last few months. Basically, it would lock up, and power cycling didn't fix the issue. Taking the machine apart and performing some debugging led to its working again, but I hadn't actually done anything.
The third time the above happened, I decided the culprit had to be the power supply. I originally thought it could be a memory board issue, but I wasn't sure. Replacing the Rainbow's power supply with a spare seems to have solved stability issues.
The software problem stems from the ridiculous chain of machines for serving this blog. Basically, though, it seems that the VPN between a Raspberry Pi, which is directly connected to the Rainbow, and a virtual private server would randomly fail and not properly restart. I had implemented a cron job to restart the VPN, but, checking today, the Rainbow hadn't had a connection since last week! While I don't think this blog is particularly important, the hackers seem to (lots of WordPress-related requests), so I found that a bit odd.
As of today, I think the cron job should now be working. Basically, cron will bring down and reset the VPN once an hour, which will limit offline time to an hour at most in the future. The modern software is the problem here; the Rainbow's TCP/IP stack has been solid for about a month now.