AWS Lambda Layers: How to include dependencies with your AWS Lambda Python function

I wanted to write a quick how-to post showing how to include dependencies with a Python AWS Lambda (serverless) function.

The Use Case

You have been assigned a project that requires a serverless Lambda function, written in Python, that executes a stored procedure or a SQL statement against an Amazon RDS PostgreSQL database. The function has to run on a schedule and interact with the Postgres database. To talk to Postgres from Python you need a driver such as psycopg2, which is not included in the Lambda Python runtime, so you have to supply it yourself.
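For context, here is a minimal sketch of what such a handler might look like. It assumes the connection settings come from environment variables named DB_HOST, DB_NAME, DB_USER and DB_PASSWORD and that the stored procedure is called refresh_daily_summary; all of these names are hypothetical. psycopg2 is imported inside the handler only so that the module (and run_statement) can be loaded without the library installed; in Lambda the package would come from a layer. The schedule itself is configured outside the code, for example with an Amazon EventBridge rule, and the handler simply runs whenever it is invoked.

```python
import os

# Hypothetical statement: replace with your own stored procedure or SQL.
# CALL requires a PostgreSQL 11+ procedure; for a function use "SELECT my_function();".
STATEMENT = "CALL refresh_daily_summary();"


def run_statement(conn, statement=STATEMENT):
    """Execute one SQL statement on an open DB-API connection and commit it."""
    with conn.cursor() as cur:  # psycopg2 closes the cursor when the block exits
        cur.execute(statement)
    conn.commit()


def lambda_handler(event, context):
    # Imported here so the module loads even where psycopg2 is absent;
    # in Lambda the package is provided by the layer.
    import psycopg2

    conn = psycopg2.connect(
        host=os.environ["DB_HOST"],          # hypothetical environment variable names
        dbname=os.environ["DB_NAME"],
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASSWORD"],
        connect_timeout=10,
    )
    try:
        run_statement(conn)
    finally:
        conn.close()
    return {"status": "ok"}
```

Keeping the SQL execution in its own function makes it easy to exercise with a stand-in connection before deploying.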


What are AWS Lambda layers?

AWS provides a feature called layers that makes it easy to include the dependencies your Python script needs. A layer is a ZIP archive that contains libraries, a custom runtime, or other dependencies, and you can configure your Lambda function to pull in additional code and content from one or more layers. With layers, you can use libraries in your function without including them in your deployment package, which keeps the deployment package smaller. At run time Lambda extracts the layer's contents into the /opt directory, and for Python runtimes /opt/python is on the module search path, which is why the library goes inside a folder named python.

Prerequisite: You have already written and saved your Lambda function in Python, and now you want to add the psycopg2 library for your code to use.

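Step one: Download psycopg2 from https://github.com/jkehler/awslambda-psycopg2. The repository provides builds compiled for the Lambda environment; use the build that matches your function's Python version.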

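Step two: Create a directory named python and put the psycopg2 folder inside it. If the folder you downloaded has a suffix on its name (such as a Python version number), rename it to psycopg2 so that import psycopg2 works.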

Step three: Zip up the python directory that you created in step two, so that python is the top-level folder inside the ZIP file.
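If you would rather script steps two and three, the sketch below uses Python's standard zipfile module to build the archive with python/ as the top-level folder. The function name build_layer_zip and the paths in the usage note are hypothetical.

```python
import os
import zipfile


def build_layer_zip(package_dir, zip_path):
    """Zip package_dir (e.g. a psycopg2 folder) under a top-level python/ folder.

    Lambda extracts layers to /opt and /opt/python is on the Python path,
    so an entry like python/psycopg2/__init__.py is importable as psycopg2.
    """
    package_name = os.path.basename(os.path.normpath(package_dir))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(package_dir):
            for name in files:
                full_path = os.path.join(root, name)
                rel_path = os.path.relpath(full_path, package_dir)
                # Archive member names always use forward slashes.
                arcname = "/".join(["python", package_name] + rel_path.split(os.sep))
                zf.write(full_path, arcname)
    return zip_path
```

Usage: build_layer_zip("psycopg2", "psycopg2-layer.zip") produces a file you can upload in step six.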

Step four: Save your function, then go to the Lambda home screen and click “Layers”.

Step five: Click the “Create Layer” button in the top right of the AWS console.

Step six: Layer configuration. Give your layer a unique name and enter a description. Click the “Upload” button, then browse to and select your python ZIP file. Choose the compatible runtime that matches your function's Python version. Click the “Create” button.
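Step six can also be scripted. The sketch below assumes the AWS SDK for Python (boto3) and calls the Lambda client's publish_layer_version; the client is passed in as a parameter (for example boto3.client("lambda")) so the function itself does not depend on boto3. The function name publish_layer and the values in the usage note are hypothetical. For larger archives you would upload the ZIP to S3 and pass S3Bucket and S3Key in Content instead of ZipFile.

```python
def publish_layer(client, layer_name, description, zip_path, runtime):
    """Publish a ZIP file as a new layer version and return its version ARN.

    client is expected to be a boto3 Lambda client, e.g. boto3.client("lambda").
    """
    with open(zip_path, "rb") as f:
        zip_bytes = f.read()
    response = client.publish_layer_version(
        LayerName=layer_name,
        Description=description,
        Content={"ZipFile": zip_bytes},   # upload the archive bytes directly
        CompatibleRuntimes=[runtime],     # should match the function's runtime
    )
    return response["LayerVersionArn"]
```

Usage: publish_layer(boto3.client("lambda"), "psycopg2", "psycopg2 for Python", "psycopg2-layer.zip", "python3.12"). Each call creates a new, numbered layer version.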

Step seven: Now that your layer is created, it is time to set it up in your function. Go back to the Lambda home screen and select Functions. Choose the function you created that depends on psycopg2.

Step eight: Select the Layers box directly under the name of your function.

Step nine: Select the “Add Layer” button.

Step ten: Select the layer you created from the “layer” dropdown. Select the version from the “Version” dropdown. Click the “Add” button.
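Steps seven through ten can also be done with boto3. One detail worth knowing: update_function_configuration replaces the function's entire list of layers, so this sketch first reads the current list with get_function_configuration, drops any older version of the same layer, and appends the new version ARN. The function name add_layer_to_function is hypothetical, and the client is assumed to be boto3.client("lambda").

```python
def add_layer_to_function(client, function_name, layer_version_arn):
    """Attach a layer version to a function, keeping its other layers.

    The Layers parameter replaces the whole list, so read it first.
    """
    config = client.get_function_configuration(FunctionName=function_name)
    base_arn = layer_version_arn.rsplit(":", 1)[0]  # layer ARN without ":<version>"
    layer_arns = [
        layer["Arn"]
        for layer in config.get("Layers", [])         # key is absent if no layers
        if layer["Arn"].rsplit(":", 1)[0] != base_arn  # drop older versions of this layer
    ]
    layer_arns.append(layer_version_arn)
    client.update_function_configuration(FunctionName=function_name, Layers=layer_arns)
    return layer_arns
```

The layer version ARN to pass in is shown on the layer's page in the Lambda console.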

You have now added a layer to your Lambda Python function, and you can go ahead and test it. Hope this helps.
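Before running real database code, a quick way to confirm the layer is wired up is a throwaway handler like the hypothetical one below. It uses importlib.util.find_spec to check whether psycopg2 can be found, without connecting to anything. If the layer was packaged correctly, the reported location should be under /opt/python, because Lambda extracts layers into /opt.

```python
import importlib.util


def lambda_handler(event, context):
    """Report whether psycopg2 can be found and where it would be loaded from."""
    spec = importlib.util.find_spec("psycopg2")  # None if the module cannot be found
    return {
        "psycopg2_found": spec is not None,
        "location": spec.origin if spec is not None else None,
    }
```

Run it with the console's Test button; if psycopg2_found is false, check that python is the top-level folder in the ZIP and that the folder inside it is named psycopg2.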

Advantages of a data warehouse in the AWS cloud

There are several reasons why I believe you should go with a data warehouse that lives in the cloud rather than on-premise. I know everyone has heard the news and excitement around AWS's data warehouse offerings. AWS has been one of the leaders in the cloud industry for a while, and the way it continually updates its existing products and releases new cloud products is awesome. Let's go over the advantages that I see a cloud warehouse having over an on-premise one.

Speed of implementation: With a cloud solution, your server can be up and running within fifteen minutes. With an on-premise solution, you have to order a server, rack it, install the software, apply patches, and put it on the network; getting from ordering the hardware to a live, ready warehouse can take several weeks to months.

Flexibility and scalability: Let’s say that in a year or two your company no longer wants a data warehouse. With the cloud solution you just terminate it, and you stop paying for those resources. With an on-premise solution you have to figure out how to recoup the capital expense of the server or servers you purchased. Scalability is what the cloud is made for: if you want another server, you spin it up. With an on-premise solution you have to plan accordingly and go through the entire process again: order a server, rack it, install the software, apply patches, and put it on the network.

Ongoing cost: Hardware usually has a three-year life span, after which you have to replace the server. Along with replacing the server, you have to plan and carry out a migration of the warehouse, and all of that adds to the cost. In the cloud you are not responsible for the hardware, so this is handled without your involvement, and you can focus on the warehouse schema and the data, which are the most important pieces of a business intelligence solution.

Security: The major cloud providers are constantly checking the security of their infrastructure, with teams of people who try to compromise the system. Along with that, physical security around their data centers is very tight, often requiring two or three forms of identification before anyone is allowed into the facility.

I think it is safe to say that I am a little biased when it comes to cloud solutions and what they can offer businesses. That is not to say I rule out on-premise infrastructure. There are requirements and situations where a business needs to look hard at its options and may decide that on-premise is the best way to go.

 

Data warehouse in the Cloud versus On-premise

There are many misconceptions that cloud solutions have speed constraints leading to latency, and that they have huge security vulnerabilities. These ideas and others lead many companies to move the cloud option to the bottom of the pile, and that is simply not a good outcome for the business.

To address the first notion, that cloud solutions have speed constraints leading to latency: I think this claim is misleading. If your warehouse and your visualization tool are both in the same cloud infrastructure, I do not think latency will be an issue at all. It is when your warehouse is in the cloud and your visualization tool is hosted on-premise that you may see a little latency, if any. Both AWS and Azure offer services for high-speed connections from on-premise networks to the cloud: AWS calls its service Direct Connect, and Azure calls its service ExpressRoute. Direct Connect and ExpressRoute are essentially private connections that offer faster speeds, more reliability, lower latency, and higher security than standard Internet connections, because they connect your private network directly to your cloud provider without crossing the public Internet.

The second notion concerns security vulnerabilities. Let's address physical security first. AWS and Microsoft have invested a lot of money in the physical security that protects their infrastructure, and it greatly exceeds that of a typical IT environment. Data security, on the other hand, is something you have complete control of. Yes, it is a little more complex than in a traditional on-premise environment, but if done properly your data is just as secure. Use all the available types of security controls (deterrent, preventive, detective, and corrective), combine them with policies and best practices, and you will put in place outstanding data security measures that let everyone in the business sleep well at night.
There are many books that review this topic in great detail; I suggest investing in one or two to further educate yourself in this area.

In my opinion, cloud-based solutions, especially for small to midsize companies, are an excellent option and can provide enterprise-class performance and availability. I also believe that most companies can reap real benefits from a cloud solution, which typically lowers cost, scales more easily, and can be up and running very quickly. Two cloud warehouse solutions you should definitely look at are Amazon Redshift and Google BigQuery. These are two of the leaders in the industry, and they have proven themselves reliable and scalable. If you do review them, please reach out and let me know your thoughts, which one you decided to go with, and why.

How much does it cost?

This is usually the first question the executive team asks. If your executive team does not ask it within the first five minutes of the conversation, then you have a team that is committed to doing the right thing by looking at what technology is best for the business rather than worrying about cost. The cost will differ depending on which solution you pick, cloud or on-prem. With an on-prem solution you will have bigger capital expenses, as well as the cost of additional software licenses depending on which database you choose for your data warehouse and which operating system that database needs to run on. These expenses include, but are not limited to, hardware (server(s), network switches, UPS units, etc.), space, and power consumption. Do not forget the human cost of maintaining the servers and the database. With the cloud solution, the server, power, and space are already included in the price, and server and database patches are taken care of for you, so you do not need to budget additional staff hours each month for those areas.
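To make the comparison concrete, here is a rough sketch that totals costs over a planning horizon using the categories above, plus the three-year hardware refresh and migration discussed earlier. The function names and every figure in the usage note are made-up placeholders, not real pricing; plug in your own quotes, or use a proper calculator for real numbers.

```python
import math


def on_prem_cost(years, server_capex, annual_licenses, annual_space_power,
                 annual_staff, migration_cost=0, hardware_life_years=3):
    """Total on-premise cost: hardware bought up front and again at each refresh,
    a migration per refresh, plus yearly licenses, space/power, and staff time."""
    purchases = math.ceil(years / hardware_life_years)  # initial buy + refreshes
    refreshes = purchases - 1                           # each refresh needs a migration
    yearly = annual_licenses + annual_space_power + annual_staff
    return purchases * server_capex + refreshes * migration_cost + years * yearly


def cloud_cost(years, monthly_fee, annual_staff=0):
    """Total cloud cost: a monthly fee that already covers server, space, power,
    and patching, plus any staff time that remains."""
    return years * (12 * monthly_fee + annual_staff)
```

With placeholder inputs over five years, on_prem_cost(5, 20000, 5000, 2000, 10000, migration_cost=8000) returns 133000 and cloud_cost(5, 1500, annual_staff=3000) returns 105000. The structure, not the numbers, is the point: a single small server with low license and staff costs can easily come out cheaper on-prem.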

There are plenty of articles and tools that explain how to calculate the cost of on-prem and cloud solutions, and they are very informative. One of the tools I use often is the AWS total cost of ownership (TCO) calculator, https://aws.amazon.com/tco-calculator/. This tool helps you see how AWS can reduce your capital expenses in a project such as implementing a business intelligence solution, and it provides very detailed reports that you can use in your presentation to management. The one thing I will say is that if you are only implementing a one-server solution, the cloud solution might cost a little more, but it will give you several advantages. In my next blog I will talk about some of those advantages in detail.