The Database Emperor Has No Clothes:
Hadoop's Inherent Advantages Over Relational Database Management Systems for Data Warehouse & Business Intelligence in the 'Big Data' Era
Relational Database Management Systems (RDBMS) are based on the relational model specified by IBM's E.F. Codd in 1970, and were first commercialized by Oracle Corporation (then Relational Software, Inc.) in 1979. Since that time, the vast majority of databases have been built using an RDBMS - either proprietary (Oracle, SQL Server, DB2, etc.) or open source (MySQL, PostgreSQL, etc.). This was entirely appropriate for transactional systems that dealt with structured data and benefited from that data being normalized. In the late 1980s, we began building Decision Support Systems (DSS) - also referred to as Business Intelligence, Warehousing and Analytics (BIWA). We used an RDBMS for these too, because it was the de facto standard and essentially the only choice. To help with performance, we denormalized the data to eliminate the need for most table joins, which are costly in both resources and time. We accepted this adaptation (some would say "misuse") of the relational model because there were no other options - until recently.
Relational databases are even less suitable for handling so-called "Big Data". Transactional systems were designed for just that - transactions: a point in time when a purchase occurred or an event happened. Big Data is largely a result of the electronic record we now have of the activity that precedes and follows that purchase or event. This data includes the path we took to a purchase - either physical (surveillance video, location service or GPS device) or virtual (server log files / "click-stream"). It also includes data on where we may have veered away from a purchase (product review article or comment, shopping cart removal or abandonment, jump to a competitor's site, etc.). And it certainly includes data about what we say or do as a result of a purchase or event (tweets, likes, blogs, yelps, reviews, customer service calls, returns, etc.). This data dwarfs transactional data in volume and, more importantly, it usually does not lend itself to the structure of tables and fields.
About the Speaker
David Teplow began using Oracle with version 2 in 1981, and has worked as a consultant for Database Technologies (1986-1999) and Integra Technology Consulting (2000-present). He can be reached at: DTeplow@IntegraTC.com