Amazon Redshift Announced — The Cloud Data Warehouse Era Begins

Name: Amazon Redshift Announced — The Cloud Data Warehouse Era Begins
Start: 2012-11-28

On 28 November 2012, at the inaugural AWS re:Invent conference in Las Vegas, Andy Jassy, then AWS Senior Vice President and later Amazon's CEO, unveiled a new service: Amazon Redshift. It was a cloud-native data warehouse (DWH) promising petabyte-scale analytics at roughly one-tenth the cost of traditional data warehouses such as Teradata's.

It marked the cloud's serious entry into data analytics, a territory that on-premises hardware vendors had ruled for decades.

The DWH Industry in 2012 — An Oligopoly of Expensive Hardware

In the early 2010s the DWH market was dominated by a handful of hardware vendors. Teradata was the leader, followed by Netezza (acquired by IBM in 2010), Greenplum (acquired by EMC in 2010), Vertica (acquired by HP in 2011), and Oracle's Exadata.

What they shared technically was MPP (massively parallel processing), which several of them, Vertica most prominently, paired with columnar storage. MPP distributes query execution across many nodes. Columnar storage keeps data column by column rather than row by row, so an analytical query (GROUP BY, aggregation, range scans) reads only the columns it touches, which can cut I/O by an order of magnitude.
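The two ideas can be illustrated with a toy sketch in Python. It is not how any vendor's engine is implemented: the table, the byte sizes (8 bytes per number, 1 byte per character), and the function names are all assumptions made for illustration. The first half counts the bytes a row layout and a column layout would have to read for a query that sums amount by region; the second half mimics MPP by slicing the rows across "nodes", aggregating each slice, and merging the partial results.

```python
from collections import defaultdict

# Hypothetical table: (order_id, region, amount, comment). All values are made up.
ROWS = [
    (1, "EU", 120.0, "x" * 100),
    (2, "US", 80.0, "x" * 100),
    (3, "EU", 50.0, "x" * 100),
    (4, "APAC", 200.0, "x" * 100),
]

NUMBER_BYTES = 8  # assumed size of an integer or float on disk


def row_store_bytes_scanned(rows):
    """Row layout: every full row is read, even though the query uses two columns."""
    return sum(
        NUMBER_BYTES + len(region) + NUMBER_BYTES + len(comment)
        for _, region, _, comment in rows
    )


def column_store_bytes_scanned(rows):
    """Column layout: only the 'region' and 'amount' columns are read."""
    return sum(len(region) + NUMBER_BYTES for _, region, _, _ in rows)


def partial_sum(rows):
    """Work done by one node: sum 'amount' per region over its own slice."""
    totals = defaultdict(float)
    for _, region, amount, _ in rows:
        totals[region] += amount
    return dict(totals)


def mpp_sum_by_region(rows, node_count):
    """Scatter rows round-robin across nodes, aggregate per node, then merge."""
    slices = [rows[i::node_count] for i in range(node_count)]
    partials = [partial_sum(s) for s in slices]  # parallel on a real cluster
    merged = defaultdict(float)
    for partial in partials:
        for region, total in partial.items():
            merged[region] += total
    return dict(merged)


if __name__ == "__main__":
    print("row store bytes:   ", row_store_bytes_scanned(ROWS))
    print("column store bytes:", column_store_bytes_scanned(ROWS))
    print("sum by region:     ", mpp_sum_by_region(ROWS, node_count=2))
```

In this toy table the wide comment column dominates each row, which is why skipping it shrinks the scan so sharply; real savings depend on how many columns a query touches and how well each column compresses.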

But all of this was heavy, on-premises hardware: hundreds of thousands of dollars per rack, months of installation, multiple DBAs to operate it. "When you need more analytics capacity, order another rack and wait six months" was the prevailing assumption.

The ParAccel Licence and the Birth of Redshift

AWS had been considering an internal DWH service since around 2010. As a base, it chose the MPP and columnar technology of ParAccel, a California DWH vendor. ParAccel had forked PostgreSQL 8.0 and added column storage and a parallel execution engine, in a product called PADB.

In 2011 AWS paid ParAccel for a licence and made a small equity investment to bring the technology in. The ParAccel code was integrated into Amazon's S3/EC2/VPC infrastructure and wrapped as a managed service—Redshift. Because it spoke PostgreSQL 8.0-compatible SQL, existing psql clients, ODBC/JDBC drivers, and BI tools (Tableau, Looker) could connect unchanged.

28 November 2012 — Announcement at re:Invent

re:Invent 2012, the first edition of AWS's annual conference, drew about 6,000 attendees. The Redshift preview announced there carried a price tag that shook the industry: under US$1,000 per terabyte per year.

An on-premises DWH from Teradata or a comparable vendor in 2012 meant buying appliance hardware priced in the hundreds of thousands of dollars per rack, whatever the data volume, plus power, cooling, data-centre space, and DBA salaries. Counted end to end, Redshift came in at roughly one-tenth of the incumbents' cost.

On top of price came the cloud-native advantages: a cluster could be spun up in a few clicks, its node count scaled up or down later, and its backups written to S3, with no hardware to buy or install. General availability (GA) followed in February 2013, opening Redshift to all customers.

The Birth of "Cloud DWH" as a Category

More than as an individual product, Redshift mattered because it created the "cloud DWH" category. Google's BigQuery (the commercial form of Dremel, serverless with storage and compute separated) had been announced in 2010 and reached general availability in 2011. Microsoft followed with Azure SQL Data Warehouse (later Synapse Analytics), announced in 2015 and generally available in 2016.

Three-way cloud competition triggered seismic shifts in the on-premises DWH market. Teradata's stock fell roughly 70 per cent between 2014 and 2020. Netezza was discontinued by IBM in 2019; Greenplum and Vertica retreated into enterprise niches.

Redshift's Evolution from 2017

Early Redshift coupled storage and compute: to add capacity, you added nodes. This was exactly the weakness that Snowflake (generally available in 2015) would target with its storage-compute separation.
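The cost of that coupling can be sketched with a little arithmetic. The Python below is an illustration only: the function names and every number in the usage example (per-node capacity, compute "units") are hypothetical and are not actual Redshift node specifications. It shows that when each node carries both storage and compute, the cluster must be sized for whichever requirement is larger, so a storage-heavy workload ends up paying for compute it does not use.

```python
import math


def coupled_nodes(data_tb, compute_units, tb_per_node, units_per_node):
    """Coupled design: node count must cover the larger of storage and compute needs."""
    nodes_for_storage = math.ceil(data_tb / tb_per_node)
    nodes_for_compute = math.ceil(compute_units / units_per_node)
    return max(nodes_for_storage, nodes_for_compute)


def idle_compute_units(data_tb, compute_units, tb_per_node, units_per_node):
    """Compute capacity provisioned beyond what the workload actually needs."""
    nodes = coupled_nodes(data_tb, compute_units, tb_per_node, units_per_node)
    return nodes * units_per_node - compute_units


def separated_compute_nodes(compute_units, units_per_node):
    """Separated design: compute is sized on its own; storage is provisioned apart."""
    return math.ceil(compute_units / units_per_node)


if __name__ == "__main__":
    # Hypothetical workload: 40 TB of data but only 4 units of query compute.
    # Hypothetical node: 2 TB of storage and 1 unit of compute each.
    print("coupled nodes:  ", coupled_nodes(40, 4, 2, 1))
    print("idle compute:   ", idle_compute_units(40, 4, 2, 1))
    print("separated nodes:", separated_compute_nodes(4, 1))
```

Under these made-up numbers the coupled cluster needs five times as many nodes as the compute demand alone would justify, which is the gap a separated architecture is designed to close.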

AWS responded with Redshift Spectrum (2017), which queries S3 data directly; the RA3 node family (2019) with Managed Storage, effectively separating storage and compute; and Redshift Serverless (previewed in 2021, generally available in 2022). Redshift has absorbed Snowflake's design ideas while leaning on its own strength: tight integration with the rest of the AWS ecosystem.

As of 2024 the cloud DWH market is shared between Redshift, Snowflake, and BigQuery as the three majors. Teradata pursues VantageCloud to push into the cloud as well, but the centre of gravity has fully moved off premises.

What 28 November 2012 Means

The Redshift announcement was the inflection point from "buying a DWH" to "renting a DWH by the hour". An industry that had operated for decades on the "purchase dedicated hardware, wait six months, hire DBAs to run it" model shifted to one where you "register a credit card and start running queries in ten minutes".

Years later, on the same cloud-native foundation, Snowflake would catch up to Redshift with an architecture that fully separated storage and compute. The evolution of the cloud DWH began at the door Redshift opened.
