The infamous Windows Blue Screen of Death
My first job out of college in the high-tech industry was doing front-line phone support for a startup software company. The software ran on Windows boxes, which back in those days were pretty rudimentary. I mean, a 100MB hard drive was considered “lots of space”. It was a long time ago.
I got in big trouble with my boss’ boss after one frustrating phone call with a customer. That was the CEO of the company (and the guy whose arrogance, incidentally, drove the company into the ground when he could have sold it to a big Japanese firm and made us all a bunch of money). This was well before phone support people had “scripts”; we had a written list of common problems people faced, and beyond that we were required to solve their issues as best we could. And I was fresh out of school.
After spending a good half-hour trying to fix this guy’s problem, I finally had him reboot the system and the problem cleared up. When he demanded I tell him what might have gone wrong, I told him that sometimes, hardware just has the equivalent of brain farts, and you have to reboot to clear them. He wasn’t happy with this answer and called the CEO.
Here’s the thing, though: I’ve learned since then that computer software and hardware do sometimes just fail for no discernible reason. Sometimes the system has a brain fart, and you have to reboot it.
It’s a fact of life. And why? Because the entire edifice of hardware and software is a precariously balanced inverted pyramid that is kept working only by the tireless attention of millions of people. Let me explain:
There’s a reason folks didn’t build ’em this way
Down at the bottom, there? The point on which all computer systems rest is a little switch that is either on or off. I know you’ve almost certainly heard it before, but it’s important to remember: No matter how sophisticated a computer system is, down at the bottom, it’s just a bunch of switches turning on and off. That’s all. When you turn on your tablet and the screen lights up, it’s just billions of tiny switches inside turning on and off, telling the electronics what to do. All websites are, at bottom, controlled by millions of switches flipping on and off. Yes, it sounds insane, but it works.
Back in the day, the switches were actual physical devices (e.g. vacuum tubes), and people “programmed” the machines by literally setting switches and plugging in cables by hand. What’s happened over time is that the switches have gotten microscopic, and the systems that turn them on and off have gotten more sophisticated. But to repeat myself: The point of that inverted pyramid is just switches going on and off.
The next layer of the pyramid is “machine language”, which is a stream of ones (on) and zeros (off) sent directly to the switches. On top of that is a whole host of languages that convert pseudo-English into those ones and zeros. On top of that is a host of languages that talk to those languages. And on top of that are the programs you run every day on your laptop, phone, car, or whatever. We are adding layers to the pyramid all the time, and at this point it is huge, and worldwide. And still balanced on a point.
Sometimes, a switch fails, or a program has an error and sends the wrong signal to a switch and it gets turned on when it should be off. Or a user (that’s you, kids!) sends a command into this giant edifice that the programmers haven’t anticipated, and it causes an unusual situation. When that happens, one of the blocks of the pyramid gets knocked out, and the thing goes off balance. And when the whole thing is balanced on a point, it can go off balance pretty badly, and your phone or laptop or whatever hangs.
Now, computer scientists have shored this delicate system up by surrounding the pyramid with lots of other pyramids so they can lean on each other and not be quite so precarious. And then pyramids get stacked on top of those pyramids, reinforcing the structure (and incidentally making a bad failure more catastrophic, but never mind that for now). We have devoted countless hours to creating backup systems that back up other backup systems that are behind even more backup systems. We don’t want your devices to fail. Especially if that “device” is, say, an airplane. And this all works pretty well in the main.
But every once in a while something unexpected happens in the system, and your device just hangs, or gives you the famed “blue screen of death”, or “bricks” (i.e. stops working entirely). The vast majority of these problems can be relieved by rebooting the device, although sometimes even that doesn’t work and you’re hosed.
Today my son brought me his iPhone because it had done hung. The temperature in Austin is below freezing, we’re getting a lot of phone alerts, and somehow an alert had popped up on his phone at just the right microsecond (or maybe he happened to be holding down a button when it popped up). The message couldn’t be dismissed, because the screen also wouldn’t accept input. Worse, his phone couldn’t be rebooted in the ordinary way, with the ol’ hold-down-a-volume-button-and-the-power-button trick. So I had to go online and google for whatever backup plan C is for rebooting iPhones. And it worked, because despite this insane inverted pyramid structure, we now have a bunch of backups.
(And if I may add another analogy: Computers are a lot like the sailing ships of yore in that they accumulate the data equivalent of barnacles on their hull. Just as a sailing ship had to be careened and scraped on a regular basis, you need to reboot your phone or laptop every once in a while to get those electronic barnacles off.)
So the bottom line is: Sometimes the system brain farts and needs to be reset. That might have PO’d that long-ago customer, but it’s simply the nature of the beast.