So: not fast, not good, and not cheap, when you consider the effort put into a custom, one-of-a-kind system. But it keeps me in practice coding and designing. And, because it runs on Linux, I can keep the security patches current: many purchased plug-and-play “appliances” have their code burned in at time of manufacture, and may be designed around already obsolete and buggy software. My little system has undergone several major upgrades of the Debian Linux distribution core system (Linux kernel 4.9.35, patched 30 June 2017; the latest release is 4.12) and gets regular security patches and bug fixes. That’s even newer than my primary laptop (kernel 3.13.0, patched 26 June 2017). Considering all the little Raspberry Pi machines scattered around the house, it may be prudent to work on configuring them for diskless boot, in order to preserve the on-board flash memory chips.
A while back, we wrote about rebuilding a crashed Raspberry Pi system. In the course of reinstalling the system (on a new chip–the old SD card that held the operating system had “worn out”), we made a fatal slip. This system happens to be our gateway, i.e., connected directly to the Internet to give us access to our files and some web services while out of the office. Unfortunately, that also gives the world-wide hacker community the opportunity to try to break in.
Normally, we have safeguards in place, like restricting which network ports are open to the outside and which machines and accounts are allowed login access. However, in our haste to get the new system up and running as quickly as possible, we connected the device to the Internet to download upgrades before the configuration was complete, meaning the system was exposed without full protection for several hours to several days.
Now, our security logs record, on a normal day, hundreds of break-in attempts (see screenshot above). We aren’t the Democratic National Committee or Sony, just a small one-man semi-retired consulting business. But the hackers use automation: they don’t just seek out high-value targets, they scan the entire Internet, looking for any machine that isn’t fully protected. If they can’t steal data or personal information, they will use your machine to hack other machines. If you use Microsoft Windows, you are undoubtedly familiar with all sorts of malware, as there are many tens of thousands of viruses, trojan horses, adware, ransomware, and other malevolent programs that invade, corrupt, and otherwise take over or cripple your machine. Unix systems are less susceptible to these common attacks, but if an account can be compromised, or a bug in the login process exploited, a persistent attacker can eventually gain system privileges and install a “rootkit”: a software package that replaces the common monitoring and logging tools, redirecting calls through itself so that its existence and activities are hidden from the reporting tools, even the directory-listing utilities.
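Those hundreds of daily attempts are easy to tally yourself. A minimal sketch, assuming a Debian-style syslog writing SSH failures to /var/log/auth.log (the path and message format vary by distribution):

```shell
# Count failed SSH password attempts by source address; the awk field
# offset matches the stock "Failed password for USER from ADDR port N ssh2"
# message OpenSSH writes to a Debian-style auth.log
awk '/sshd.*Failed password/ { print $(NF-3) }' /var/log/auth.log |
  sort | uniq -c | sort -rn | head
```

The busiest addresses at the top are almost always automated scanners, not a person at a keyboard.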
Once an attacker takes over a Unix or Linux machine, there is no limit to the damage they can do on the Internet: Unix/Linux is the basis of most of the servers on the Internet, and a compromised machine can become a spam server, web-spoofer, or hacker-bot itself. (Microsoft Windows Server accounts for most of the rest, nearly half, and those are even easier to break into.) I began to suspect this might have happened when normal system functions failed to terminate or run correctly. We have a lot of custom software built on this machine, run by a scheduler. The machine got slower and slower, and it became apparent that the scheduled jobs were never exiting, filling up the process table with jobs that weren’t doing anything except taking up space, which is always at a premium. Clearing out all the jobs, restarting the machine, and starting the processes manually worked–until about 2:00pm. Very suspicious: a rootkit, once installed, can repair or re-install itself even if the administrator restores many of the co-opted command files, whether through normal upgrades or a conscious attempt to recover from the intrusion.
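When scheduled jobs stop exiting, the symptom shows up directly in the process table. A quick way to spot it, sorted so the longest-running processes come first (GNU ps options assumed):

```shell
# Show the longest-running processes first; cron-launched jobs should be
# short-lived, so shell children with large elapsed times (ELAPSED) are
# a red flag that something is hanging instead of exiting
ps -eo pid,ppid,etime,comm --sort=-etime | head -15
```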
The main problem seemed to be with /bin/sh, the system shell, which on Debian systems is actually a symbolic link to /bin/dash, a minimal POSIX shell. Cron, the scheduler, uses dash to run its jobs, where the normal user login shell is /bin/bash, a larger shell with similar functionality. A rootkit is generally constructed as a filter wrapped around the co-opted commands, and it would be easier to link to the *real* /bin/dash in an undetectable manner from the filter program than it would be to wrap /bin/bash. In this case, assuming an intrusion was the cause, something went wrong, rendering dash non-functional. Perhaps the intrusion was not compiled for the ARM processor used by the Pi, though most of a rootkit would be scripted to be portable among different CPU architectures and Unix/Linux versions.
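The relationship between the two shells is easy to verify: on Debian-family systems, /bin/sh is just a symbolic link, and readlink shows where it really points. A quick sanity check after an intrusion scare:

```shell
# Resolve /bin/sh to the real shell binary behind it; on Debian and
# Raspbian this is normally /bin/dash, while interactive logins use bash
readlink -f /bin/sh

# Compare the two shells side by side -- dash is deliberately small and
# fast, which is why cron uses it for non-interactive jobs
ls -l /bin/dash /bin/bash 2>/dev/null || echo "one of the shells is missing"
```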
An analogy: it is easy to identify wolf or horse-thief tracks outside the barn when the door is barred, but if you left the door open and the horses have bolted, it is much harder to find out what happened–the traces are covered. I did install some intrusion-detection software, but running it after the tracks are covered over is usually a futile effort. However, there were enough questionable traces to warrant corrective action. Besides, even if the problem had been caused by some inadvertent misconfiguration on my part (unlikely, considering that the machine could be made to run for several hours before the problem reasserted itself), the solution was clear: reinstall everything.
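Dedicated intrusion-detection tools such as AIDE and Tripwire work by recording a baseline of file checksums while the system is known to be clean, then flagging anything that later differs–which is exactly why running them only after a suspected break-in is futile. The core idea is simple enough to sketch with sha256sum:

```shell
# Record a baseline of checksums for critical binaries while the system
# is known-good (real tools cover thousands of files and keep the
# baseline off-machine, where a rootkit can't rewrite it)
sha256sum /bin/sh /bin/ls > /tmp/baseline.sha256

# ...later, verify nothing has been replaced; any modified binary is
# reported as FAILED instead of OK
sha256sum -c /tmp/baseline.sha256
```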
The first step was to back up the data, including configurations. Now, this is not just an ordinary computer: Raspbian, the Debian-based operating system distribution designed for the Raspberry Pi, comes with a simple desktop intended to introduce new users to Linux. But this machine doesn’t use the desktop and isn’t even connected to a monitor most of the time: it is an Internet gateway, web server, and custom webcam driver, so it has a lot of “extras,” both loaded from software repositories and written especially for this installation. Backups are important, since much of the software exists only on this machine. And, since we have only one camera, fail-over isn’t possible without physically moving the camera from one machine to another–not a trivial exercise, as the connection is on the motherboard rather than an external connector.
Now comes the glitch: since its introduction, Raspbian has been based on Debian 7; but Debian 8 was recently released, and a new version of Raspbian is available to match. So the machine was rebuilt with Raspbian “Jessie,” replacing Raspbian “Wheezy” (like Apple OS X, Debian releases have names–taken from Toy Story characters–in addition to release numbers). Installation on a Raspberry Pi is not like other computers: since there is no external boot device, the operating system “live” image is loaded onto the SD card that serves as both boot device and operating system storage. Initial configuration is best done without a network connection, since the startup password is preset and well-known.
So, avoiding that mistake (booting on a network with the default passwords, the single most preventable source of hacker intrusions), we booted with the network cable disconnected and a monitor and keyboard attached, changed the password and expanded the system to fill the SD card, set up the other user accounts, then shut down the system, removed the SD card, and mounted it on the backup server to finish transferring vital data, like the security keys and system security configurations. In larger systems with a permanent internal boot drive, such “hacker-proof” installation is done on an isolated network, but, since the boot drive on a Pi is removable, it is easy enough to edit the configuration files on another system.
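With the card mounted on another machine, the SSH daemon can be locked down before the Pi ever touches a network, by editing its configuration file directly on the card. A minimal hardening fragment (the mount point and user name are illustrative; the directives are standard OpenSSH settings):

```
# /mnt/sdcard/etc/ssh/sshd_config -- edited offline, before first networked boot
PermitRootLogin no          # never allow direct root logins
PasswordAuthentication no   # keys only; passwords are what the scanners guess
AllowUsers alice            # hypothetical non-default administrative account
```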
So, with the system fairly well hardened by securing the system and user accounts, it was rebooted attached to the network, and the system upgrades and extra software packages (like the web server) were installed. So far, so good. But, since we upgraded the operating system, the server packages were also upgraded, most notably moving the web server, Apache, from version 2.2 to version 2.4. Apache has been the predominant web server software on the Internet for 20 years, so it is in a constant state of upgrade for security and feature enhancements. Between versions 2.2 and 2.4, many changes were made to the structure of the configuration files, so not only did the site configuration need to be restored manually, but there was a fairly steep learning curve in identifying the proper sequence and method for applying the changes.
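The best-known of those structural changes is access control: Apache 2.2’s Order/Allow/Deny directives were replaced by the Require directive in 2.4, so every old site block has to be rewritten. A typical before-and-after (the directory path is illustrative):

```
# Apache 2.2 style -- no longer honored in 2.4:
<Directory /var/www/site>
    Order allow,deny
    Allow from all
</Directory>

# Apache 2.4 equivalent:
<Directory /var/www/site>
    Require all granted
</Directory>
```

On Debian-family systems, 2.4 also expects site files to carry a .conf suffix before a2ensite will enable them, which accounts for some of the learning curve.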
Then, of course, there were the additional Python modules that needed to be installed to support the custom software, which meant downloading and compiling the latest versions, since Python 2 also moved from version 2.7.3 to 2.7.9 (we haven’t yet ported the applications to Python 3, which moved from version 3.2.3 to 3.4.2 between Debian 7 and Debian 8). Finally, there were other tweaks, like comparing system configuration files to update group memberships for access to the camera hardware, loading the camera drivers, and setting file ownership and permissions for data and program files.
We could have saved most of this by sticking with Raspbian Wheezy, but eventually, support for older systems goes away, and the newer systems are usually more robust and faster: open source software evolves rapidly, with new minor releases every six months and new major releases every two years for most distributions, and an average life span of five years for maintenance of major releases and a year for minor releases. As we said before, Linux is free, if your time is worth nothing. The price of keeping current is constant maintenance. Patch releases occur as they are available, with maintenance upgrades almost daily.
Finally, after a week of tweaking and fiddling, the webcam service is back up and running. And, the security logs show break-in attempts every few minutes, from multiple sites all over the world (one from Portugal recently, others from unassigned addresses–ones with assigned addresses undoubtedly come from computers that have been compromised and used as hacking robots, as hackers don’t want to be traced back to their own computers, ever).
So, the moral of this post: never expose a stock, unmodified computer system directly to the Internet (which is difficult to avoid, when all upgrades, new software, etc. are available only by download from the Internet–downloads that should be done only from behind a proven firewall). You can, however, set passwords and change system accounts before joining a network. If your computer is hacked, take it to a professional, and don’t grumble about the cost or time it takes to restore it. Pay for malware-scanning software, keep your subscription up to date, and schedule upgrades on a regular basis. And, if you are a professional, don’t take shortcuts: install and configure off-net or behind a firewall, keep good backups, install intrusion-detection software early, and check for security upgrades daily. Change the default passwords immediately, create a new, weirdly named administrative user, and deny external logins for all administrative users. Use two-factor authentication and public-key encryption on all authorized user accounts. They are out there, and they are coming for your computer, even if you don’t have data worth stealing: they can use your computer to spread spam or steal data from someone else.
The last Friday in July (today!) is the 16th annual system administrator appreciation day, an obscure celebration started in 2000 (by a system administrator, of course) as a response to an H-P ad showing users expressing gratitude to their sysadmin for installing the advertiser’s latest printer. To my knowledge, none of us have ever gotten flowers or even donuts on “our” day, but it does remind us in the profession that our job is to keep the users happy, mostly by keeping the machines happy, but also by attending to their needs in a prompt and professional manner.
I was reminded of the event not only by notices in the discussion forums and IT email lists, but by the fact that today, the replacement memory module for our network server came, and I installed it. A simple procedure, but one that takes a fair portion of the sysadmin’s bag of tricks and tools to accomplish. Bigger shops might have a service contract with the hardware vendor, but in many cases, the sysadmin is also the hardware mechanic.
For a few months, the server, a Dell T110, has been crashing every few weeks–fortunately not while we were on our two-month grand tour, but of concern nonetheless, especially because it is a virtual machine host and often has a half-dozen virtual machines running on it, which means that when the server goes down, half of our network goes with it. Virtualization is a great way to run different versions or distributions of operating systems when developing and testing software, so not many of the guests have production roles in the network, but it is still an inconvenience to have to restart all of them after a crash.
A red light appeared on the front panel of the server, indicating an internal hardware condition, so it was time to check it out. First, hardware designed for use as servers (the T110 is aimed at small offices like mine) is a lot more robust than the average tower workstation you might have on your desk. Note above the heavy-duty CPU heat sink (air baffles have been removed for access to the memory modules–the four horizontal strips above the CPU fins). In addition, big computers have little computers inside that keep track of the status of the various components, like memory, CPU, fans, and disk drives, and turn on the panel light that indicates the machine needs service. Server memory has error-correction circuitry, as do most server-quality disk arrays, but its capacity is limited: ECC memory can correct a single-bit error, while a second error in the same word can only be detected–and will bring the system down.
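On Linux, those error-correction counters are visible through the kernel’s EDAC interface, which is how monitoring tools know a module is starting to fail. A sketch of reading them directly (the sysfs paths assume EDAC support, which is absent on machines without ECC memory):

```shell
# Report corrected (CE) and uncorrectable (UE) memory error counts per
# memory controller; a climbing CE count on one module is the early
# warning that prompts scheduling a replacement
for mc in /sys/devices/system/edac/mc/mc*; do
  [ -d "$mc" ] || { echo "no EDAC memory controllers found"; break; }
  echo "$mc: CE=$(cat "$mc/ce_count") UE=$(cat "$mc/ue_count")"
done
```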
System administrators depend on these self-correcting circuits and error indications to schedule orderly shutdowns for maintenance, so that the machine doesn’t crash in the middle of the workday. For most offices, this means late evening or weekend work. For 24-hour operations, like web sites, it means shifting the load to one or more redundant systems while the ailing one is repaired, so no data is lost. Companies like Dell supply monitoring software to notify sysadmins of impending problems, which is vital to operations where there is a room full of noisy servers and the admins are in a nice quiet office in the back room. In our case, with just one server, we don’t use the monitoring software regularly, but it is useful for telling us which component the red light is for; then we can look up the location in the service manual and order the right part, and hope the system doesn’t crash before it arrives.
Normally, businesses that are thriving and need to keep competitive in the market replace their machines at least every three years. Others, like ours, that operate on a shoestring and buy whatever resources we need for a project when we need it, tend to run machines five years or more, sometimes until repair parts are no longer available: since we run Linux, we have machines eight years or older that are still useful for running some network services.
Our server is almost five years old, so when I order replacement parts, they don’t always look like the ones we took out, or have the same specifications. For that reason, I usually take a replacement as an opportunity to upgrade, replacing a whole group of components with a new set, which I did when a disk drive failed a couple of years ago. However, this time, since I’m semi-retired and don’t have a steady cash flow, I ordered only one memory module, to replace the failing one. Memory comes in pairs, so having slightly different configurations in a pair causes the machine to complain on startup, but it still runs. The “upgrade” alternative would have replaced all four modules, or at least the two paired ones, with the larger size, at a cost of $150 to $300, instead of just replacing a $50 module and putting up with having to manually restart the machine on reboot.
So, the sysadmin not only needs to keep the machines running, but running within budget, while making sure the operating systems and hardware capabilities can support the software users need to do their jobs. If the sysadmin is doing the job right, there won’t be any red lights in the server room, and it will look like they aren’t doing anything…
It’s been a while since our dour post on the folly of Congress and the personal impact of political maneuvering. After a quick flurry to catch up on projects, October ended, and with it a contract that we’d been on continuously–through three changes in contract management–since the summer of 2001, the last four years as an independent contractor. We did expect a follow-on contract from a new prime contractor: this is the way many government service contracts go–the management changes, but the workers get picked up by the new company, so there is continuity of service, and a bit of job security, despite the lack of “brand loyalty.” During the 1990s, I seemed to change jobs about every 18 months on average, moving to a new client and a different employer, so keeping the same client is a bonus.
Unfortunately, the chaos resulting from the shutdown meant that the contract transition did not go smoothly. We departed on a scheduled vacation trip to Kauai with nothing settled–though it was the first time since 1989 that we’d gone on vacation without the prospect of work interrupting. Nevertheless, midway through the week, the contract negotiations got underway in earnest, complete with a five-hour timezone shift; when we returned, there was a flurry of activity and then, back to the grind, continuing the ongoing projects that had been interrupted by both the shutdown and the contract turnover.
While we were gone, the usual Pacific Northwest November storms came early, knocking out power to our network, so there was much to do to get things running again. Several machines needed a rather lengthy disk maintenance check, and the backup system was full, as usual. So, we took advantage of 1) the possibility of future earnings due to contract renewal and 2) delays in getting the paperwork actually signed and work started, to do some system maintenance and planning, starting with acquiring a bigger backup disk.
Secondly, our office Linux workstation had suffered a bad update–an attempt to install some experimental software from a questionable repository–that crashed the video driver beyond recovery. Upgrading the system to a newer distribution didn’t help, as upgrades depend on a working configuration. Now, we’ve been using Ubuntu as our primary desktop workstation environment since 2007, through several major upgrades, one of our longest tenures with any distro (though we did use Solaris for a decade or more, along with various Linux versions). Ubuntu has one of the best repositories of useful software, and easy updates and add-ons for a lot of things we use from day to day. But in the last couple of years, the shift has been toward a simpler interface, targeting an audience of mostly web-surfers who use computers for entertainment and communication but little else. Consequently, development and productivity support has suffered. The new interfaces on personal workstations, like Ubuntu’s Unity and Microsoft’s latest fiasco, Windows 8, have turned desktop computers and laptops into giant smart phones, without the phone (unless you have Skype installed). One of the other casualties was the ability to build system install disks from a DVD download.
So, after failing to restore the system with a newer version of Ubuntu–on which it is getting more and more difficult to configure the older, busy Gnome desktop model that we’ve been using for the last decade or more–we decided to reinstall from scratch. Ubuntu had also somehow lost the ability to reliably create install disks: we tried several times to create a bootable CD, DVD, or memory stick, to no avail. So, since we primarily use Red Hat or its free cousin, CentOS, as the basis for the workhorse science servers at work and to drive our own virtual machine host, I installed CentOS version 6 on the workstation. All is well, except that CentOS (the Community ENTerprise Operating System) is really intended for servers and engineering workstations, so it has been a slow process of installing the productivity software for editing photos and movies and building up the less common programming tools, including a raft of custom modules. Since Red Hat and its spin-offs evolve a bit more slowly than Ubuntu’s six-month update cycle, there has been some version regression, and some things we’re used to using daily aren’t well supported anymore as we get closer to the next major release from Red Hat. Since restoring all my data from backup took most of a day and night, and adding software on the fly as needed has been tedious, I’m a bit reluctant to go back. Besides, I need to integrate a physical desktop system with the cluster of virtual machines I’m building on our big server for an upcoming project, so there we are.
The main development/travel machine, a quad-core laptop with a powerful GPU and lots of RAM, is still running Ubuntu 12.04 (with Gnome grafted on as the desktop manager), but has had its own issues with overheating. So, this morning I opened it up for a general checkup. Everything seemed to be working in the fan department, but I did get a lot of dust out of the radiator on the liquid cooling system, and the machine has been running a lot cooler today.
Because of the power outage, and promises of more to come as the winter progresses, we’ve been looking at a more robust solution to our network services and incoming gateway: up until now, we’ve been using old desktop machines retired from workstation status and revamping them as firewalls and network information servers, which does extend their useful life, but at the expense of being power hungry and a bit unstable. But, the proliferation of tiny hobby computers has made the prospect of low-power appliances very doable. So, we are now in the process of configuring a clutch of Raspberry Pi computers, about the size of a deck of playing cards, into the core network services. These can run for days on the type of battery packs that keep the big servers up for 10 minutes to give them time to shut down, and, if they do lose power, they are “instant on” when the power comes back. And, they run either Linux or FreeBSD, so the transition is relatively painless. The new backup disk is running fine, and the old one will soon be re-purposed for archiving data or holding system images for building virtual machines, or extending the backups even further.
So it goes: the system remains a work in progress, but there does finally seem to be progress. We even have caught up enough to do some actual billable work, following three really lean months of travel and contract lapses.
Yesterday, we were hit with a thermal shutdown on the big laptop. Installing psensor and the coretemp module helped get a handle on the issue, which centered on the Nvidia GeForce 540M GPU. Hardware drivers have always been an issue for Linux, since the Open Source software model conflicts with the need for peripheral vendors to keep the internals of their hardware secret, which they do by not releasing the source code for the software that links the hardware to the operating system. That’s fine for a closed, proprietary system like Microsoft Windows, the primary market. As Linux users, not in the business of redistributing systems, we would be happy with an add-on driver that works. But, since Linux is a small portion of the market, there is little incentive for hardware vendors to write Linux-specific driver software. And the software that is available is not always optimized for the Linux kernel, with the result that it is either buggy or skimps on reliability features.
Nvidia has, in the past year, incurred the ire of Linux founder Linus Torvalds for just these issues. The Ubuntu Linux distribution that we run on our systems comes with a more or less generic Nvidia driver. While users can download an updated driver from Nvidia, installation is a bit daunting, requiring the system to be reconfigured for text-mode login in order to rebuild the X11 graphics links. Those of us who have been around long enough to remember hand-tuning the X-Window system configuration files, fingers poised over the “kill” keyboard sequence, ready to shut down X11 in an instant to avoid burning the monitor if the settings were wrong, have grave misgivings about tinkering with the graphics. Plus, the forums on the ‘Net seemed to show that users were having overheating problems no matter what combinations of driver and distribution versions were used. Custom configurations seem to be less than desirable for production machines, so we elected to look for another solution.
A bit more searching on the ‘Net turned up the fwts (FirmWare Test Suite) utility package. Once installed, it ran and pointed out compatibility issues between the computer’s BIOS and the kernel/driver configuration. One of the automatic corrective actions performed by fwts was to switch the operating mode from “performance” to “normal,” which immediately lowered the operating temperature of all the components. The GPU still shows a temperature increase under load, but the fan hardly runs anymore, whereas earlier in the week it was running on high most of the time.
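psensor and fwts both read temperatures the kernel already exposes through sysfs, so a raw spot-check needs nothing but the shell (zone names and availability vary by machine):

```shell
# Print each thermal zone's type and current temperature; the kernel
# reports millidegrees Celsius, so 55000 means 55 C
for tz in /sys/class/thermal/thermal_zone*; do
  [ -e "$tz/temp" ] || { echo "no thermal zones exposed"; break; }
  echo "$(cat "$tz/type"): $(($(cat "$tz/temp") / 1000)) C"
done
```

fwts itself is in the Ubuntu and Debian repositories (apt-get install fwts) and writes its findings to results.log in the current directory.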
The take-away message here is that updating kernels and/or drivers can and sometimes will result in conflicts with your hardware. Linux has come a long way toward a plug-and-play, “out of the box” installation, but it still pays to test and evaluate hardware configurations, just like in the old days of Unix. Actually, in the “bad old days” of a few commercial Unix systems, the range of hardware combinations was often very limited, so compatibility issues had been carefully tuned out by the system vendor. But those systems were expensive. In the Open Source world of Linux, where the system is expected to run on any combination of hardware on the commodity PC market, some outliers are to be expected. For the average desktop Linux user, converting an old Windows machine to Linux will work just fine. But for “power users” and server applications, some engineering and testing may be required for optimum performance. Certainly, psensor and fwts will be an important part of our Linux toolkit from now on.