Unfortunately, simply saying something is so doesn’t make it so.
Client/server database systems, for example, are not necessarily distributed database systems. No matter what their vendors might say — and some vendors are trying to equate the two — they are different.
Spotting a client/server database system is pretty easy. A server machine runs database server software. Some machines networked to that server run client software that uses the data-management services of the server database software. End of basic story.
Now the logic of some vendors goes: Put a few such database servers on a network, possibly a WAN, and you’ve distributed your database servers. Distributed database servers equals distributed database system, right?
Wrong. The mere presence of two or more geographically distributed database servers does not create a distributed database system. The term “distributed database-management system” (DDBMS) has been around for many years and has a reasonably precise meaning.
Logic is everything
First, a true DDBMS must be able to work with logically related data that is spread across multiple databases on multiple machines. The “logically related” part of this definition is vital.
If one database server on a network, for example, holds employee information and another database server on that network contains unrelated data on items for sale, you don’t have a DDBMS. You just have two database servers on the same network.
But, if the database server software lets you work with related data on both servers — say, employee data on one machine and insured dependent data on the other — then you have a DDBMS.
Slightly more technically, if you can run single database transactions that manipulate data on multiple machines, and if the database servers give those transactions the same consistency and integrity support as single-server transactions, you have a DDBMS.
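To make that concrete, here is a toy sketch in Python. The server objects and the transaction API are invented for illustration; in a real DDBMS this coordination lives inside the database engine, not in application code. The point is the all-or-nothing guarantee: a write that fails on any machine undoes the work on every machine.

```python
class Server:
    """A toy database server holding key/value rows in memory."""
    def __init__(self):
        self.rows = {}

def run_distributed_transaction(ops):
    """Apply every (server, key, value) write, or none of them.

    Snapshots each server before writing so a failure anywhere
    rolls all servers back, mimicking single-server atomicity.
    """
    snapshots = [(srv, dict(srv.rows)) for srv, _, _ in ops]
    try:
        for srv, key, value in ops:
            if value is None:
                raise ValueError("refusing to store a missing value")
            srv.rows[key] = value
    except Exception:
        for srv, snap in snapshots:
            srv.rows = snap  # undo partial work on every machine
        return False
    return True
```

A transaction that touches the employee server and the dependent server either updates both or neither, which is exactly what a single-server transaction would guarantee.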
Distributed database systems face some problems that plain database servers don’t. Two such problems are particularly tough.
The first is deciding how to store data across multiple servers.
In the example we just gave, all the employee information was on one machine, and all the dependent data was on the other. No piece of data appeared in more than one place. That’s a “partitioned” database, and it’s conceptually easy: If you want any one piece of data you have to go to only one place to get it.
Partitioned distributed databases, however, can be expensive to use precisely because you always have to send requests to where the data is located. That process can take a lot of time over a network, and it can also turn any server that holds frequently used data into a bottleneck.
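The routing rule behind a partitioned database can be sketched in a few lines. The key scheme and the hash function here are invented for illustration; real systems use range or hash partitioning schemes maintained by the engine. What matters is that each key maps to exactly one server, so every request for that key lands on that one machine.

```python
def partition_index(key, n_servers):
    """Deterministically map a key to one of n_servers partitions.

    A toy hash: sum the key's bytes. Every lookup for the same key
    goes to the same single server -- the source of both the
    simplicity and the bottleneck described above.
    """
    return sum(key.encode()) % n_servers
```

If "emp:1" happens to be a hot record, the server that `partition_index` assigns it to absorbs all of that traffic, which is the bottleneck the column describes.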
To avoid those problems, you can duplicate parts of the database on different servers and create a distributed database system with “replicated” data. The cost here is that every time a data item changes anywhere, the database system must make sure every copy of that item changes.
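A minimal sketch of that trade-off, with invented names: reads can go to any copy, spreading the load, but a write is not done until every copy has it. Real replication protocols handle failures and ordering; this shows only the fan-out cost.

```python
import random

def replicated_write(replicas, key, value):
    """A write must reach every copy before it counts as done."""
    for store in replicas:
        store[key] = value

def replicated_read(replicas, key):
    """A read can use any copy, since all copies agree."""
    return random.choice(replicas).get(key)
```

Three replicas mean three writes for every logical update; that is the price paid for being able to read from whichever server is closest or least busy.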
The other especially difficult problem for distributed database systems is a related one: how to support multiserver updates and maintain database integrity.
To remove an employee entirely in our example, you must delete both the record of the employee and the records of all that employee’s dependents. If you delete only one or the other, the overall database becomes corrupt. (The right answer here is to support two-phase commit.)
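Two-phase commit can be sketched with toy participants (the class and method names are invented; real engines add durable logging and crash recovery). Phase one asks every server to vote on whether it can commit; phase two applies the coordinator's decision everywhere, so the employee record and the dependent records are deleted together or not at all.

```python
class Participant:
    """A toy server taking part in a distributed transaction."""
    def __init__(self, can_commit=True):
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        """Phase 1: vote yes only if the local work can be made durable."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, commit):
        """Phase 2: apply the coordinator's global decision."""
        self.state = "committed" if commit else "aborted"

def two_phase_commit(participants):
    """Commit everywhere only if every participant votes yes."""
    votes = [p.prepare() for p in participants]
    decision = all(votes)
    for p in participants:
        p.finish(decision)
    return decision
```

If the server holding the dependent records votes no, the server holding the employee record aborts too, and the database stays consistent.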
Plain client/server database systems don't have answers to these problems. If you buy the bogus claim that a client/server database always equals a distributed database, your problems won't get solved.
No matter what the vendors say.