June 1970

Codd's Relational Model Paper — Giving Databases a Mathematical Foundation

E. F. Codd of IBM's San Jose Research Lab published 'A Relational Model of Data for Large Shared Data Banks' in CACM. Grounded in set theory and first-order predicate logic, the paper proposed representing data as relations, introduced the operations that Codd soon developed into the relational algebra, and established 'data independence': the principle that applications need not depend on the physical layout of stored data. As a fundamental challenge to the then-dominant hierarchical and network DBMS (IMS, IDS), it became the theoretical origin of every RDBMS that followed, from Oracle and DB2 through SQL Server, MySQL, and PostgreSQL. Codd received the ACM Turing Award in 1981.

Aerial view of IBM Research Almaden (formerly IBM San Jose Research Laboratory)
Source: Dicklyon (Wikimedia Commons), CC BY-SA 4.0

In June 1970, Edgar Frank Codd, a researcher at IBM's San Jose Research Laboratory (today IBM Research Almaden), published "A Relational Model of Data for Large Shared Data Banks" in volume 13, issue 6 of Communications of the ACM. It went on to become one of the most cited papers in computer science.

With it, the young field of database management was given a mathematical foundation.

Databases in 1970 — A World of Trees and Pointers

Before the paper, two families of systems handled bulk data. One was the hierarchical DBMS, exemplified by IBM's IMS, which stored records as a tree of parent-child relationships. The other was the network DBMS descended from Charles Bachman's IDS and codified by CODASYL, in which records were joined by explicit pointers into a graph.

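Both shared a problem. To write a query, the application had to know how the data was physically stored: which tree node to start from, which pointer to follow. Change the storage layout and you rewrote the application. Codd's paper opens with exactly this concern, arguing that users of large data banks must be protected from having to know how the data is organised in the machine, that is, its internal representation.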

The Paper's Claim — Sets and Logic Are Enough

Codd's paper runs to eleven pages. Its argument collapses into two claims.

First claim: all data can be represented as 'relations' in the mathematical sense. A relation is a set of tuples (rows), each holding one value per attribute (column), with every value drawn from that attribute's domain. Because it is a set, a relation has no duplicate rows and no inherent row order. Not files, not trees, not pointer webs, but sets of tuples. That is the abstraction proposed.
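To make the abstraction concrete, here is a minimal Python sketch rather than Codd's own notation. It models a relation as a heading of attribute names plus a set of tuples. The names `Relation`, `make_relation`, and `employee`, and all sample data, are hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, NamedTuple, Tuple


class Relation(NamedTuple):
    """Hypothetical model of a relation: a heading plus a set of tuples."""
    heading: Tuple[str, ...]   # attribute (column) names
    body: FrozenSet[tuple]     # rows: unordered and free of duplicates


def make_relation(heading, rows):
    """Build a Relation, checking that every row matches the heading's width."""
    rows = [tuple(row) for row in rows]
    for row in rows:
        if len(row) != len(heading):
            raise ValueError(f"row {row!r} does not match heading {heading!r}")
    return Relation(tuple(heading), frozenset(rows))


employee = make_relation(
    ("emp_id", "name", "dept"),
    [(1, "Ada", "R&D"), (2, "Grace", "Ops"), (1, "Ada", "R&D")],  # last row repeats the first
)

# The same rows supplied in a different order describe the same relation.
reordered = make_relation(
    ("emp_id", "name", "dept"),
    [(2, "Grace", "Ops"), (1, "Ada", "R&D")],
)

print(len(employee.body))     # 2: the repeated row collapses
print(employee == reordered)  # True: row order carries no meaning
```

The point of the sketch is what it leaves out: nothing in `Relation` says where a row lives or how to reach it.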

Second claim: every meaningful operation on relations can be expressed using set operations and first-order predicate logic. The 1970 paper introduces operations such as projection, join, and restriction. In follow-up papers in the early 1970s Codd formalised the full relational algebra (selection, projection, join, union, difference, Cartesian product, division) and gave an equivalent declarative notation, the relational calculus. SQL would later draw its syntax mainly from the calculus branch.
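Three of the algebra's operations can be sketched in a few lines of Python. This is an illustrative toy, not a DBMS and not Codd's formalism. Each tuple is modelled as a frozenset of (attribute, value) pairs, and the helper names `relation`, `select`, `project`, and `natural_join`, along with the sample data, are hypothetical.

```python
def relation(rows):
    """Build a relation from dicts: a set of tuples, each a set of (attribute, value) pairs."""
    return frozenset(frozenset(row.items()) for row in rows)


def select(rel, predicate):
    """Selection: keep the tuples for which the predicate holds."""
    return frozenset(t for t in rel if predicate(dict(t)))


def project(rel, attrs):
    """Projection: keep only the named attributes; duplicates collapse in the result set."""
    return frozenset(frozenset((a, v) for a, v in t if a in attrs) for t in rel)


def natural_join(left, right):
    """Natural join: combine tuples that agree on every shared attribute."""
    out = set()
    for lt in left:
        ld = dict(lt)
        for rt in right:
            rd = dict(rt)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                out.add(frozenset({**ld, **rd}.items()))
    return frozenset(out)


employee = relation([
    {"name": "Ada", "dept": "R&D"},
    {"name": "Grace", "dept": "Ops"},
    {"name": "Edgar", "dept": "R&D"},
])
department = relation([
    {"dept": "R&D", "site": "San Jose"},
    {"dept": "Ops", "site": "Armonk"},
])

# "Names of employees who work at the San Jose site", as an algebra expression.
result = project(
    select(natural_join(employee, department), lambda t: t["site"] == "San Jose"),
    {"name"},
)
names = sorted(dict(t)["name"] for t in result)
print(names)  # ['Ada', 'Edgar']
```

Because relations here are plain Python sets, union and difference need no helper: `a | b` and `a - b` already behave as the algebra requires. Every operation takes relations and returns a relation, which is what lets expressions nest freely.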

The natural consequence of these two claims is data independence. The application states only what data it wants, in relational algebra or calculus; the DBMS decides how to translate that into physical access paths. Change the storage layout and the queries stay the same.
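Data independence can be observed in any modern relational engine. The sketch below uses SQLite through Python's standard `sqlite3` module, a present-day system chosen for convenience and not anything from the 1970 paper. The table, the index name `idx_employee_dept`, and the data are hypothetical. The same query text runs before and after an index is added; only the access path the engine reports may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employee (emp_id, name, dept) VALUES (?, ?, ?)",
    [(1, "Ada", "R&D"), (2, "Grace", "Ops"), (3, "Edgar", "R&D")],
)

# The application's query says what it wants, not how to fetch it.
QUERY = "SELECT name FROM employee WHERE dept = ? ORDER BY name"


def run(connection):
    """Return the query result and the engine's description of its access path."""
    rows = [r[0] for r in connection.execute(QUERY, ("R&D",))]
    # The last column of each EXPLAIN QUERY PLAN row is a human-readable detail string.
    plan = [r[-1] for r in connection.execute("EXPLAIN QUERY PLAN " + QUERY, ("R&D",))]
    return rows, plan


rows_before, plan_before = run(conn)

# Change the physical design: add an index. The query text is untouched.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")

rows_after, plan_after = run(conn)

print(rows_before == rows_after)  # True: same answer either way
print(plan_before)                # plan wording depends on the SQLite version
print(plan_after)
```

The exact plan text varies between SQLite versions, so the sketch prints it without assuming a particular wording. What matters is that `QUERY` never changed.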

Foreshadowing ACID

Codd's paper does not use the term ACID (Atomicity, Consistency, Isolation, Durability); the acronym was coined by Theo Härder and Andreas Reuter in 1983. Yet his framing, in which data integrity is stated declaratively through keys and foreign-key references (and, in his follow-up work on normalisation, functional dependencies), was the ground IBM's System R team (Jim Gray and others) would stand on when they worked out transaction processing in the late 1970s.
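The link between declared integrity and transactions can be seen in a small sketch. It uses SQLite via Python's standard `sqlite3` module, a modern engine and not System R, and the schema and data are hypothetical. A foreign-key constraint is declared once in the schema; when one statement in a transaction violates it, the whole transaction is rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign-key enforcement off by default
conn.execute("CREATE TABLE department (dept TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " dept TEXT NOT NULL REFERENCES department (dept))"
)

# "with conn:" commits on success and rolls back if an exception escapes the block.
with conn:
    conn.execute("INSERT INTO department VALUES ('R&D')")
    conn.execute("INSERT INTO employee VALUES (1, 'Ada', 'R&D')")

rejected = False
try:
    with conn:  # one transaction: both inserts take effect, or neither does
        conn.execute("INSERT INTO employee VALUES (2, 'Grace', 'R&D')")
        conn.execute("INSERT INTO employee VALUES (3, 'Edgar', 'Sales')")  # no such department
except sqlite3.IntegrityError:
    rejected = True

names = [r[0] for r in conn.execute("SELECT name FROM employee ORDER BY emp_id")]
print(rejected, names)  # True ['Ada']
```

The valid insert for Grace disappears along with the invalid one for Edgar: the constraint supplies the consistency rule, and the transaction supplies the atomicity.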

System R was one of the first serious implementations of a relational DBMS, alongside the Ingres project at UC Berkeley. Convinced by Codd's paper, an IBM research group began construction in 1974. From it emerged the SEQUEL language (later renamed SQL), two-phase locking, the recovery log, and the query optimiser. Broadly, the building blocks of every modern RDBMS trace their lineage to System R.

Why IBM Was Slow

There is an irony the history books record. IBM invented the relational model, but IBM was not the first to commercialise it.

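IBM had a large installed base of IMS, its hierarchical DBMS, to protect, and senior management was reluctant to ship a competing product line. The lead in the commercial market therefore went to a start-up founded in 1977 as Software Development Laboratories. By then renamed Relational Software, Inc., it shipped Oracle V2 in 1979 and later took the name of its product, becoming Oracle Corporation. IBM's own relational products came only in the early 1980s: SQL/DS arrived in 1981 and DB2 was announced in 1983.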

Codd himself spent the 1970s evangelising the relational model inside IBM. In 1985 he published the famous "Codd's 12 Rules" (thirteen in fact, numbered 0 to 12), openly accusing many products that called themselves "relational" of not being so in any rigorous sense.

The Turing Award and the Legacy

In 1981 Codd received the ACM Turing Award for his work on the relational model.

More than half a century later, the dominant systems in the world database market are all from the relational lineage—Oracle, Microsoft SQL Server, IBM Db2, MySQL, PostgreSQL, and on through the cloud-warehouse era of Snowflake and BigQuery. SQL, a language descended from the relational calculus, remains the lingua franca. Even the "NoSQL" movement of the 2000s was framed as a question of how to deal with relational concepts, not how to escape them.

Codd's eleven pages are a rare case in software engineering of mathematical rigour shaping an entire industry. By defining not how data is stored but what data is, he set a frame the field has worked inside for fifty years.
