AWS Lambda query to Redshift once a day

Stack Overflow Asked by xbeta on December 23, 2021

I am fairly new to the AWS ecosystem, especially the data side.

I have a project that requires me to run a query against a table in Redshift automatically every 24 hours, perhaps remove a few columns from the query results, and then use a RESTful API to hit some endpoints at a 3rd-party site for further checking.

I have a few questions on this.

  1. Is it a good usage pattern to use AWS Lambda (Python) and Redshift for such a task?
  2. Should I choose Java vs Python vs NodeJS for AWS Lambda? Which one has better support for querying Redshift?
  3. Both Lambda and Redshift would be in the same VPC, using the same private subnets with a NAT gateway for egress. Is this a secure setup?
  4. Any sample code to share on this setup?
  5. Does AWS Lambda have a regular scheduler to trigger every 24 hours? Or is it simply based on events?
  6. Since the application database is in DynamoDB, is it more efficient and easier to set up AWS Lambda to query DynamoDB for similar data instead?

Thanks,
Sam.

2 Answers

Normally you will find that many AWS tools are able to solve the same problem. The right choice depends on your priorities. What are you looking for: lowest cost, efficiency, or convenience?

I answer your questions below:

Is it a good usage pattern to use AWS Lambda (Python) and Redshift for such a task? Yes, it's OK. Redshift is normally a very expensive service, though; are you sure you need Redshift here?

Should I choose Java vs Python vs NodeJS for AWS Lambda? Which one has better support for querying Redshift?

Java tends to have slower cold starts, so if you want to avoid them you would need an EventBridge call every 5 minutes or so to keep the function warm. Apart from that, it really is up to you.

Both Lambda and Redshift would be in the same VPC, using the same private subnets with a NAT gateway for egress. Is this a secure setup? It's OK, but again, NAT gateways are expensive. Depending on the problem you are trying to solve, there might be a workaround.

Any sample code to share on this setup? https://aws.amazon.com/blogs/big-data/building-an-event-driven-application-with-aws-lambda-and-the-amazon-redshift-data-api/
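As a rough illustration of the Redshift Data API approach from that blog post, here is a minimal Python sketch. It is not taken from the post itself. The environment variable names, the `orders` table, and its columns are hypothetical, and the sketch assumes the result fits in a single page (it does not follow `NextToken`). The client is passed in as a parameter so the query logic can be exercised without AWS access.

```python
import os
import time


def run_query(client, sql, *, cluster_id, database, db_user, poll_seconds=1.0):
    """Run `sql` through the Redshift Data API and return rows as dicts."""
    statement_id = client.execute_statement(
        ClusterIdentifier=cluster_id, Database=database, DbUser=db_user, Sql=sql
    )["Id"]

    # The Data API is asynchronous, so poll until the statement finishes.
    while True:
        status = client.describe_statement(Id=statement_id)
        if status["Status"] == "FINISHED":
            break
        if status["Status"] in ("FAILED", "ABORTED"):
            raise RuntimeError(status.get("Error", status["Status"]))
        time.sleep(poll_seconds)

    result = client.get_statement_result(Id=statement_id)
    names = [col["name"] for col in result["ColumnMetadata"]]
    rows = []
    for record in result["Records"]:
        # Each field is a one-entry dict, e.g. {"stringValue": "x"} or {"isNull": True}.
        values = [None if f.get("isNull") else next(iter(f.values())) for f in record]
        rows.append(dict(zip(names, values)))
    return rows


def drop_columns(rows, unwanted):
    """Return copies of `rows` without the columns named in `unwanted`."""
    return [{k: v for k, v in row.items() if k not in unwanted} for row in rows]


def lambda_handler(event, context):
    import boto3  # bundled with the Lambda Python runtime

    client = boto3.client("redshift-data")
    rows = run_query(
        client,
        "SELECT id, email, internal_note FROM orders",  # hypothetical table
        cluster_id=os.environ["CLUSTER_ID"],  # hypothetical variable names
        database=os.environ["DATABASE"],
        db_user=os.environ["DB_USER"],
    )
    cleaned = drop_columns(rows, {"internal_note"})
    # The call to the 3rd-party REST endpoint would go here.
    return {"row_count": len(cleaned)}
```

Because the Data API goes over HTTPS rather than a database connection, no database driver needs to be packaged with the function. The Lambda role still needs permission to call the `redshift-data` actions.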

Does AWS Lambda have a regular scheduler to trigger every 24 hours? Or is it simply based on events? Yes, you can use a cron or rate expression in an EventBridge rule to schedule the Lambda trigger.
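To make the scheduling part concrete, here is a small sketch of creating such a rule with boto3. The rule name and function ARN are placeholders you would supply, and the clients are passed in so the function can be tried against stand-ins. `rate(1 day)` fires every 24 hours from when the rule is created; an expression such as `cron(0 6 * * ? *)` would instead fire at 06:00 UTC each day.

```python
def schedule_daily(events_client, lambda_client, *, rule_name, function_arn,
                   expression="rate(1 day)"):
    """Create an EventBridge rule that invokes a Lambda function on a schedule."""
    rule_arn = events_client.put_rule(
        Name=rule_name, ScheduleExpression=expression, State="ENABLED"
    )["RuleArn"]

    # Allow EventBridge to invoke the function on behalf of this rule.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )

    # Attach the function as the rule's target.
    events_client.put_targets(
        Rule=rule_name, Targets=[{"Id": "1", "Arn": function_arn}]
    )
    return rule_arn
```

In real use the two clients would be `boto3.client("events")` and `boto3.client("lambda")`. Running this twice with the same rule name would fail at `add_permission`, because the statement ID already exists, so treat it as a one-off setup step or move the rule into your infrastructure templates.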

Since the application database is in DynamoDB, is it more efficient and easier to set up AWS Lambda to query DynamoDB for similar data instead? I'm a bit confused by this last question, but normally it is very easy to query DynamoDB from Lambda.
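For comparison, a DynamoDB query from Lambda could look roughly like the sketch below. It assumes a table whose partition key is a string; the table name, key name, and key value are whatever your schema uses. The loop follows `LastEvaluatedKey` so that results spanning several pages are all collected.

```python
def query_items(client, *, table_name, key_name, key_value):
    """Return every item whose string partition key equals `key_value`."""
    kwargs = {
        "TableName": table_name,
        "KeyConditionExpression": "#k = :v",
        # Placeholders avoid clashes with DynamoDB reserved words.
        "ExpressionAttributeNames": {"#k": key_name},
        "ExpressionAttributeValues": {":v": {"S": key_value}},
    }
    items = []
    while True:
        page = client.query(**kwargs)
        items.extend(page["Items"])
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        # More results remain; continue from where the last page stopped.
        kwargs["ExclusiveStartKey"] = last_key
```

Here `client` would be `boto3.client("dynamodb")`. Note that a `query` needs the partition key value; pulling "everything from the last day" across all keys would need a scan or a suitable index, which is where a warehouse like Redshift may be the better fit.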

Answered by Jose Mendez on December 23, 2021

I'll try to answer your question with the best intentions:

  1. Yes, there is no argument against doing this.
  2. It depends solely on your preference. All of the languages support your use case.
  3. This is perfectly fine. As you're managing further access rights with IAM, you just have to make sure that the egress traffic from your Lambda function is properly monitored.
  4. There is a lot out there. Just have a look.
  5. You can set up a CloudWatch Events rule (now Amazon EventBridge) with a cron expression that will invoke your function when you need it. You can also set up many other triggers for your functions, such as DynamoDB streams or CloudWatch log events; there are endless possibilities.
  6. If you just want to run a regular query to gather some data, it makes no difference where your data is actually stored.

Answered by tpschmidt on December 23, 2021
