Redshift versus Snowflake

We are going to compare Redshift and Snowflake across three categories (security, performance, and pricing) to help you decide which data warehouse is a better fit for your business.

How do these two cloud data warehouses compare to one another?
Redshift:

  • Deep discounts when your commitment is for a longer term
  • More unified offer package
  • Security and compliance enforced in a thorough manner for all users
  • A machine learning engine can be easily attached
  • Little more hands-on maintenance

Snowflake:

  • Pay separately for compute and storage
  • More robust support for JSON-based functions
  • Tier-based packages
  • Security and compliance options differ by tier
  • Unique architecture designed to scale on the web
  • More automated database maintenance features

Redshift is a solid, cost-efficient solution for enterprise-level implementations, while Snowflake is a good warehouse to start and grow with. If your business has less experienced resources, Snowflake might be a good starting point; if you have experienced resources in this area, Redshift would be a great warehouse for your company.

    Security: Choose your warehouse wisely
    While Redshift addresses security and compliance in a very thorough manner, Snowflake takes a more nuanced, tier-based approach.

    Redshift’s end-to-end encryption can be tailored to fit anyone’s security requirements. Redshift can also be isolated within the network by placing it in a virtual private cloud (VPC) and linking it to your existing infrastructure over a VPN. Another nice feature that can help your business meet its compliance and auditing requirements is integrating Redshift with AWS CloudTrail. The wealth of logs and analytics you receive will help you debug issues and shed light on performance problems in the long run.
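
    If auditing is part of your compliance requirements, Redshift’s audit logging can also be switched on programmatically. Below is a minimal sketch using boto3; the cluster identifier, bucket name, and key prefix are placeholders for illustration, not values from this post.

        import boto3

        # Placeholder names for illustration only; replace with your own cluster and bucket.
        redshift = boto3.client("redshift", region_name="us-east-1")

        # Enable audit logging so connection and user activity logs are delivered to S3.
        # These cluster logs complement the API-level audit trail you get from CloudTrail.
        response = redshift.enable_logging(
            ClusterIdentifier="my-analytics-cluster",
            BucketName="my-redshift-audit-logs",
            S3KeyPrefix="audit/",
        )

        print(response["LoggingEnabled"])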

    Snowflake handles end-to-end encryption automatically, encrypting data both in transit and at rest. You can isolate your Snowflake deployment with VPC/VPN options. A big difference in security and compliance between Snowflake and Redshift is that Snowflake’s options grow stronger depending on which edition you opt for. This is where you have to carefully consider which edition of Snowflake will cover your needs.

    Performance: New Redshift features compete with Snowflake

    Snowflake and Redshift both utilize columnar storage (data is stored by columns rather than rows) and parallel processing (the overall task is broken into separate parts that are computed simultaneously), which will save your analytical team a lot of time when processing very large jobs.

    Snowflake articulates that its performance is driven by an architecture that supports both structured and semi-structured data. It separates storage, compute, and cloud services so that each can be optimized and scaled independently.

    Both Redshift and Snowflake offer concurrency scaling (adding and removing computational capacity to handle ever-changing demand) and machine learning features that add real value to their warehouses. Both warehouses also offer free trials so companies can experience their value first hand.

    Pricing: Don’t stop at the sticker price but also consider long-term benefits
    Both warehouses offer on-demand pricing, but they bundle the associated features differently to set themselves apart from each other.

    The differences

  • Snowflake separates compute usage from storage in their pricing structures.
  • AWS Redshift offers users a dedicated daily amount of concurrency scaling, and once that usage is exceeded, Redshift charges by the second.
  • Snowflake automatically includes concurrency scaling.
  • Redshift touts the potential for deep discounts over the long term if you commit to a one- or three-year contract. Redshift also offers an option to pay an hourly rate.
  • Snowflake offers five editions, with additional features added at each increasing price level.

    When you are trying to make your final decision on which of the two warehouses to go with, make sure you look at what you specifically need in terms of data volume, processing power, and analytical requirements. Look for the warehouse that will improve the accuracy and speed of your data-driven decisions. Also look at the resources you have inside your business to ensure you will be able to support the warehouse you choose.

    Which warehouse makes sense for your business?

    Below are some additional comparisons to help guide you to picking the right solution.

  • Security: Redshift includes a deep bench of encryption solutions, while Snowflake’s security and compliance features depend on which of its five editions you choose.
  • Bundled features: Redshift bundles compute and storage to provide an immediate solution that can scale to an enterprise-level data warehouse. Snowflake gives a business the flexibility to purchase only the features it needs, with the ability to scale later.
  • JSON: Both warehouses store JSON, but Snowflake’s JSON support is a little stronger than Redshift’s. When you load JSON into Redshift you can use its built-in functions, but there are limitations, whereas Snowflake can store and query JSON natively (see the sketch below).
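
    To make the JSON difference concrete, here is a small, hypothetical sketch of what querying the same nested field could look like on each side. The table and column names are made up for illustration: Redshift parses a JSON string with its built-in json_extract_path_text function, while Snowflake navigates a VARIANT column natively with its colon syntax.

        # Hypothetical table "events" with a JSON payload column, for illustration only.

        # Redshift: the payload is stored as VARCHAR and parsed with a built-in function.
        redshift_query = """
            SELECT json_extract_path_text(payload, 'user', 'id') AS user_id
            FROM events;
        """

        # Snowflake: the payload is stored as a VARIANT and queried natively.
        snowflake_query = """
            SELECT payload:user:id::string AS user_id
            FROM events;
        """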

    AWS Layers: How to include dependencies with your AWS Lambda Python Function

    I wanted to write a quick how-to blog to show how to include dependencies with your Python AWS serverless function.

    The Use Case

    You have been assigned a project where you need to create a serverless Lambda function in Python that executes a stored procedure or a SQL statement against an RDS Postgres database. The function needs to run on a schedule and interact with the Postgres database.
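
    Before getting to layers, here is a rough sketch of what such a handler might look like. It assumes psycopg2 is available (that is exactly what the layer will provide) and that the connection details live in environment variables; the variable names and the stored procedure are placeholders, not part of this walkthrough.

        import os

        import psycopg2  # provided by the Lambda layer we build below


        def lambda_handler(event, context):
            # Connection details are assumed to be set as environment variables on the function.
            conn = psycopg2.connect(
                host=os.environ["DB_HOST"],
                port=os.environ.get("DB_PORT", "5432"),
                dbname=os.environ["DB_NAME"],
                user=os.environ["DB_USER"],
                password=os.environ["DB_PASSWORD"],
            )
            try:
                with conn.cursor() as cur:
                    # Placeholder stored procedure; swap in your own SQL statement.
                    cur.execute("CALL nightly_refresh();")
                conn.commit()
            finally:
                conn.close()
            return {"status": "ok"}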


    What are AWS Lambda layers?

    AWS has implemented some nice functionality called layers that makes it easy to include the dependencies your Python script needs. A layer is a ZIP archive that contains libraries, a custom runtime, or other dependencies. With layers, you can use libraries in your function without including them in your deployment package, which keeps the deployment package smaller.

    Prerequisite: You have gone through the steps of writing and saving your Lambda function in Python, and you now want to add the psycopg2 library for your code to use.

    Step one: Download psycopg2 from https://github.com/jkehler/awslambda-psycopg2

    Step two: Create a directory named python and put the psycopg2 folder inside the newly created python folder

    Step three: Zip up the python directory that you created in step two
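
    If you prefer to script steps two and three, the snippet below does the same thing in Python. The important detail is that the ZIP contains a top-level python/ directory, because that is where Lambda looks for layer packages; the output file name is an assumption.

        import shutil

        # Assumes the psycopg2 folder has already been copied into ./python/psycopg2
        # as described in step two. This produces psycopg2-layer.zip whose top-level
        # folder is "python/".
        shutil.make_archive("psycopg2-layer", "zip", root_dir=".", base_dir="python")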

    Step four: Save your function and then go to the Lambda home screen and click on “Layers”

    Step five: Click the “Create Layer” button located at the top right part of the AWS console

    Step six: Layer configuration. Give your layer a unique name and enter a description. Click the “Upload” button, browse to your python zip file, and select it. Choose the runtime version that you want, then click the “Create” button.
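
    If you would rather skip the console for steps four through six, the same layer can be published with boto3. A minimal sketch, assuming the ZIP from step three and a Python 3.9 runtime; the layer name and description are placeholders.

        import boto3

        lambda_client = boto3.client("lambda")

        # Publish the ZIP from step three as a new layer version.
        with open("psycopg2-layer.zip", "rb") as f:
            layer = lambda_client.publish_layer_version(
                LayerName="psycopg2",                      # any unique name you like
                Description="psycopg2 built for AWS Lambda",
                Content={"ZipFile": f.read()},
                CompatibleRuntimes=["python3.9"],          # match your function's runtime
            )

        print(layer["LayerVersionArn"])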

    Step seven: Now that your layer is created, it’s time to set it up in your function. Go back to the Lambda home screen and select Functions. Choose the function you created that has the psycopg2 dependency.

    Step eight: Select the Layers box right under the name of your function

    Step nine: Select the “Add Layer” button

    Step ten: Select the layer you created from the “Layer” dropdown. Select the version from the “Version” dropdown. Click the “Add” button.
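
    Steps seven through ten can also be done programmatically. A small sketch, assuming a placeholder function name and layer version ARN; note that update_function_configuration replaces the function’s layer list with whatever you pass, so include any existing layer ARNs you want to keep.

        import boto3

        lambda_client = boto3.client("lambda")

        # Placeholder function name and layer version ARN for illustration.
        lambda_client.update_function_configuration(
            FunctionName="my-postgres-job",
            Layers=["arn:aws:lambda:us-east-1:123456789012:layer:psycopg2:1"],
        )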

    You have now added a layer to your Lambda Python function, and you can go ahead and test it. Hope this helps.