January 15, 2018

Observation: Amazon Redshift & Aurora

Many organizations are thinking about Amazon's Redshift & Aurora Databases. Followings are some observations and thoughts for anyone who plans to use these database technologies  -



What is Amazon Redshift?
  • Redshift is Amazon's Analytic Database
  • Redshift started as a 2005 fork of PostgreSQL (Version 8.0.2) by ParAccel
  • ParAccel was a small (15 Engineer) company that failed in the market and has been shut down
  • Amazon bought the ParAccel technology in 2011
  • Redshift is NOT Open Source
    • Amazon has contributed NOTHING back to PostgreSQL community

Amazon Redshift: Observations - 
  • No sophisticated query optimization or materialized views 
  • No integrated data mining or OLAP 
  • No table partitioning & constraints 
  • No cross-region disaster recovery 
  • Downtime required if node fails or cluster grows/shrinks 
  • Limited multiuser concurrency 
  • No rich data types (spatial, xml, comprehensive JSON) 
  • No table indexing, so no real-time tactical queries 
  • Table level locking – Similar to Oracle version 5 from 30 years ago 

What is Amazon Aurora? 
  • Aurora is Amazon's Relational OLTP Database
  • Very new, released less than 3 years ago
  • Started as a fork of MySQL 5.6
  • Amazon modified the MySQL engine to integrate with their cloud storage and networking
  • Amazon Aurora is NOT Open Source
  • Amazon has contributed NOTHING back to MySQL community

Amazon Aurora: Observations - 
  • Cannot scale to large data size - no parallel SQL 
  • Aurora "Clusters" are really replicas that are not consistent with the primary and are limited to simple OLTP reads, no updates. 
    • No load balancing across "Clusters" 
  • Cannot remote replicate encrypted database. 
  • Single threaded remote replication. 
  • Poor support for complex SQL and analytics. 
  • Poor stored procedures. 
  • Data loss from restore or cross-region failover. 
  • Scale-up of database CPU's requires downtime. 

Summary - 
  • NOT Compatible: Redshift & Aurora are not compatible with existing enterprise database applications. Which means: rewrite/re-architect the whole application stack. 
  • Less Functionality: They are missing decades of functionality. 
  • Move to Amazon Databases requires to - 
    • Throwing away decades of on-premises investments 
    • Plus expensive application rewrites and compromises. 
  • Lock-In: Cannot move to different cloud or bring back to on-premise once applications are deployed in Amazon's Redshift/Aurora. 

It may be best to use the original PostgreSQL & MySQL since they can be move in and out of any Cloud (No Lock-In), free to use and provides lots of functionalities.

2 comments:

  1. I am glad to read this post, it's a good one. I am always looking for quality posts and articles and this is what I found here, I hope you will be adding more in the future. Thanks for sharing. redshift etl

    ReplyDelete