Thumbnail History of RDBMSs

It is difficult to get a man to understand something when his salary depends upon his not understanding it.
—Upton Sinclair

Yesterday

The Problem

Mankind has always sought mechanical ways to store information. When the computer arrived on the scene it was quickly put to the task. We'll skip ahead to 1970, by which time large corporations had moved their accounting operations to mainframe computers, usually from IBM. [1]

IMS

As the volume of computer records increased, so too did demand for a way to organize them. Every problem is an opportunity, and computer vendors, then as now, were only too glad to provide solutions. The king of the hill was IBM's Information Management System (IMS). You may be surprised to learn (I was) that IMS is still alive and kicking in 2010. IDC is even happy to explain why IMS is a good thing [pdf].

IMS is the granddaddy of what were later known — in contrast to Relational — as “hierarchical” database management systems. Grand as it may sound, a hierarchical database is just nested key-value pairs, hardly more technically sophisticated than Berkeley DB [pdf] or, truth be told, the Fast File System [pdf]. If you gave Berkeley DB 40 years and IBM's resources, you'd have IMS.

So what's the problem, you ask? Ah, yes, the problem. Hierarchical databases leave a lot to be desired. To name two voids: They don't do very much to check the data for consistency, and they're notoriously inflexible.
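To make those two voids concrete, here is a minimal sketch in Python. It assumes a toy payroll hierarchy held in nested dictionaries; every name and value in it is invented for illustration, and a real IMS database is of course far more elaborate than this.

```python
# Hypothetical data: a hierarchical "database" as nested key-value pairs.
departments = {
    "accounting": {
        "employees": {
            "e1": {"name": "Alice", "salary": 5000},
            "e2": {"name": "Bob", "salary": 4000},
        }
    },
    "shipping": {
        "employees": {
            "e3": {"name": "Carol", "salary": 4500},
        }
    },
}


def salary_of(db, dept, emp_id):
    """Follow the one access path the hierarchy was designed for."""
    return db[dept]["employees"][emp_id]["salary"]


def find_department(db, emp_id):
    """Ask a question the hierarchy was not designed for:
    the only way to answer is to scan every branch."""
    for dept, record in db.items():
        if emp_id in record["employees"]:
            return dept
    return None


# Inflexible: department -> employee is cheap, employee -> department is a scan.
print(salary_of(departments, "accounting", "e1"))
print(find_department(departments, "e3"))

# Unchecked: nothing in the structure objects to a salary that is not a number.
departments["shipping"]["employees"]["e4"] = {"name": "Dave", "salary": "lots"}
```

The structure answers cheaply only the question it was laid out to answer, and it accepts whatever it is handed. Any checking has to be written, and remembered, by each application.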

The Promise

E. F. Codd

The shortcomings of pre-relational databases were well known at the time. Into the breach stepped E. F. Codd with his seminal paper, A Relational Model of Data for Large Shared Data Banks (1970) [pdf]. Codd provided something brand new: an algebra and a calculus for operating on data arranged in tables. It was the first and, to date, only mathematical system to describe database operations. Put another way: every other system of storing and retrieving data rests on the same thing most software does: intuition and testing. Codd brought mathematical rigor to databases.

Experts in the field understood the ramifications of what Codd had wrought. Listen to this recollection of Don Chamberlin, a contemporary:

… Codd had a bunch of queries that were fairly complicated queries and since I'd been studying CODASYL, I could imagine how those queries would have been represented in CODASYL by programs that were five pages long that would navigate through this labyrinth of pointers and stuff. Codd would sort of write them down as one-liners. These would be queries like, "Find the employees who earn more than their managers." [laughter] He just whacked them out and you could sort of read them, and they weren't complicated at all, and I said, "Wow." This was kind of a conversion experience for me, that I understood what the relational thing was about after that.

Would that we all had been there.
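For a rough sense of what Chamberlin saw, the query he mentions can be sketched in Python. This is not Codd's notation: a set comprehension stands in here for a relational calculus expression, and the Employee relation, its attribute names, and its rows are all assumptions made up for the example.

```python
from collections import namedtuple

# Hypothetical relation: each tuple is one employee; manager_id refers to id.
Employee = namedtuple("Employee", ["id", "name", "salary", "manager_id"])

employees = {
    Employee(1, "Ann", 9000, None),
    Employee(2, "Bob", 9500, 1),
    Employee(3, "Cy", 7000, 1),
    Employee(4, "Di", 8000, 3),
}

# "Find the employees who earn more than their managers."
# Pair every employee e with every employee m, keep the pairs where
# m is e's manager and e earns more, and project out e's name.
overpaid = {
    e.name
    for e in employees
    for m in employees
    if e.manager_id == m.id and e.salary > m.salary
}

print(sorted(overpaid))
```

The statement says what is wanted and nothing about how to find it: no pointers to follow, no loops over record chains. How the pairing is actually carried out is left to the system.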

It would take IBM 10 years to convert Codd’s ideas into a product. By the mid-1980s, the RDBMS market had all the names you might recognize: DB2, Sybase, Oracle, and Ingres. All were based on Relational Theory. The revolution had arrived, and it was not televised. It would be written in SQL.

The Delivered

SQL

Underpinned though they were by mathematics, the first products were pinned down by the computing power of the day. It was felt they could not support OLTP, that their main use would be for decision support, and that many users wouldn't be programmers at all, but non-technical people. These end users wouldn't be trained in mathematics and certainly weren't going to be up on relational calculus. A more (yes, here it comes) user friendly way would be needed for mere mortals to query the system. In keeping with the fashion of the day, it was (yet another) “fourth generation language”: SQL.

That's right: We came to be saddled with SQL because a real relational language, no matter how powerful and elegant, would be too hard. Thanks, Pops.

Perhaps they were right, though. Perhaps, were it not for SQL's approachability, so-called relational DBMSs would never have achieved their popularity. After all, it wasn't only end users who were unversed in relational calculus, and it's not as though every programmer, or even many programmers, can or will learn some math in order to do their jobs. Indeed, that remains the case today….

Further to the point, one vendor, Ingres, in fact did and does offer a non-SQL language, QUEL [pdf], based on relational calculus. If you're like most people you've never heard of it and until now never had reason to think there might be another, yea, better query language. Ingres quickly added SQL when it found it couldn't compete otherwise. (None other than Chris Date implemented Ingres's SQL interpreter, and he did it as a front end to QUEL!)

The market for database management software is no different from other IT markets: faddish, ignorant, profit-driven. The vendors' customers (in aggregate) aren't particularly well informed about relational theory. They face challenges and shop for solutions that vendors are only too happy to provide. If relational theory is hard to understand and explain — never mind hard to implement and/or seen by customers as an impediment — well, education isn't the vendor's business. The vendor swaps licenses for money, and the customer is happy.

Or, happyish.

Competitive Cacophony

Although Codd and later many others published papers on relational theory, the implementations were separate and disjoint, mutually incompatible.

Not only were their SQLs different, surely, but so was (and is) everything else, in particular

We are accustomed to the idea that we can use whatever email software we please, use any web browser we please. But when it comes to database servers, proprietary lock-in was and remains the rule of the day.

Today

Stagnation and the Triumph of Ignorance

Apart from competitive machinations, SQL DBMSs are not much changed from 1985. SQL has improved a bit. Machines are certainly faster, which has made the DBMS easier to administer. There are more graphical tools, which are certainly popular and sometimes convenient. But someone cryogenically frozen in 1985 could learn to use a 2010 system in an afternoon. The same cannot be said of, say, the COBOL or C programmer faced with Java or C#.

This is not a good thing.

We have met the enemy, and he is us.
Pogo

The IT people of the 1970s can be forgiven for demanding a simplistic solution to a complex problem, for being unaware of the then-new relational theory. They probably should be saluted for recognizing the value of the technology even without understanding its foundation or full potential. That excuse is not available to us in 2010. The lack of progress in this area reflects our collective failure to learn the theory and seize on its promise.

Relational theory has made incremental progress, but you'll have a very difficult time finding support for it in commercial (or noncommercial) offerings. Or, indeed, among the database experts you know.

SQL's shortcomings have been known from the start [pdf]. Its warts become obvious to anyone who encounters the GROUP BY clause. But instead of demanding something better, something correct, what demand there is is for magic, for ways to “make the SQL go away”.

SQL should go away, but not by concealing it behind an Object Relational Mapper or some dynamic language construct that melds it into Python or Ruby or Lua. SQL should go away because it's not serving its intended purpose or audience. BASIC died; every other 4GL died; every other Pollyannic end-user query idea died. Yet SQL lives on, a zombie of the 1970s, when television was wireless and telephones were wired.

Think of it: Why do so many programmers dislike SQL? After all, SQL is a tool designed for the non-technical, now used primarily by programmers. Someone spends years mastering C++ or pretty much any other modern language you care to name, and there, smack in the middle of his code, like sand in the gears, is that blast from the past, SQL. Could it be programmers think it's impeding their work, that it's stupid?

Lots of programmers, though, wish the database was stupid, would just do as it's told instead of rejecting their queries or data. They don't regard relational theory as their ally; they regard it as an obstacle. (Strange they don't regard the compiler the same way.) Ignorant of relational theory, they suppose the DBMS is just another heap of old code and outmoded ideas, and yearn for a chance to do and use something shiny and new.

Worst are those ignorant of history, because they're doomed to repeat it. They're pouring IMS wine in new XML or NoSQL bottles. These nonsystems — sometimes called post-relational, the horror! — lack even the most rudimentary DBMS features: consistency checking and transactions, to name two. Yet there they go, charging at Cloud9Db or somesuch, cheered on by vendors happy to pocket their money. It's good they're young, because it will take 40 years to catch up to IMS.

Potholes on the High Road

Well, here's another nice mess you've gotten me into.
Ollie

But what of the programmer who does take the time to develop at least a passing knowledge of relational theory? His is not a happy lot, for ignorance is bliss.

He will come to know the awful state of most databases, with their dog's breakfast of naming conventions and endless normalization failures, only some of which are acknowledged mistakes. A badly coded application lies dormant and invisible (perhaps in a version control system), but a badly designed table is out in the open for all to see and, worse, for all to use. Such tables are common and frequently important. And, because important, cannot be changed.

He will find his boss and his peers do not share his enthusiasm. Yes, they'll say, it's a fine theory, in theory. But not in practice.

And he will find, finally, that they are right, even if they lie in a bed of their own making. The databases he meets are not so much designed as accumulated, forcing his queries to be much more complicated than strictly necessary. The server will fail him, too: His finely wrought queries will, often as not, defeat the query optimizer's poor power to reduce them to an efficient plan.

If he finds himself wishing for a new shiny instead, who could blame him?

All Hail the Status Quo

The only thing necessary for evil to flourish is for good men to do nothing.
—Edmund Burke

The RDBMS market stabilized (or, rather, stagnated) in more or less its current form well over 10 years ago, arguably 25 years ago. Why? One need not look for a complicated explanation; a simple one will do.

Is There No Hope?

Men switch masters willingly, thinking to make matters better.
—Machiavelli

There is always hope. How much depends on how many.

In the RDBMS market, as in any market, sellers seek to satisfy buyers. The RDBMS market is different because the buyers are ignorant and that's good for the vendors. The surest way to perturb the stability would be smarter buyers demanding something useful. The history of IT amply demonstrates the odds of that are long, but it definitely would work. No part of $20 billion a year will be simply refused.

What's new since the ossification of SQL is the Internet and the emergence of free software as a viable corporate technology stack. Unlike in the 1970s, demand can create its own supply: free database projects could implement a better language, a better library (just one!), and agree on a single wire protocol. Any significant success would move the vendors. Look what baby-toy MySQL did: by creating a simple, hassle-free, usable RDBMS, it forced every major vendor to offer a scaled-down version of its flagship product at no cost. Now imagine a world in which there are several interoperable servers sporting a real query language in addition to poor old SQL. The proprietary vendors would have to pay users to take their wares. Or, innovate. What do you think they'd do? I wager they'd follow the money.

So, yes, there's hope. If enough people demand a true RDBMS, if enough people build one, we could start a revolution. Not on television. Probably on YouTube, and definitely not in SQL.