Posts

AWS Lambda Integration With EventBridge

In our previous blog, we explored the concept of Lambda versioning. In real-world scenarios, Lambda functions are typically triggered either on a schedule or in response to specific events. In this post, we'll walk through how to invoke a Lambda function using both scheduled triggers and event-driven mechanisms. Our goal: a Lambda function that scans for instances of type t3.small in the RUNNING or PENDING state and, if it finds any, triggers an email. Services involved:
1) Lambda - Python code that scans for RUNNING and PENDING instances of type t3.small.
2) EventBridge - both scheduled and event-based triggers.
3) SNS - the notification service that sends the email.
Here is the simple...
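The scan described above can be sketched in Python as follows. This is a minimal sketch, not the post's actual code: it assumes the function publishes to an SNS topic whose ARN is passed in a hypothetical `TOPIC_ARN` environment variable, and that the scheduled rule and the event-based rule both invoke the same handler. The filtering logic is kept in a separate function so it can be tested without AWS credentials.

```python
from __future__ import annotations

import os

# AWS reports instance types in lowercase ("t3.small") and state
# names such as "pending" and "running".
TARGET_TYPE = "t3.small"
TARGET_STATES = ("pending", "running")


def find_matching_instances(response: dict) -> list[str]:
    """Return IDs of t3.small instances that are pending or running.

    `response` is one page of an EC2 describe_instances result
    (Reservations -> Instances -> InstanceId / InstanceType / State).
    """
    matches = []
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            state = instance.get("State", {}).get("Name")
            if instance.get("InstanceType") == TARGET_TYPE and state in TARGET_STATES:
                matches.append(instance["InstanceId"])
    return matches


def lambda_handler(event, context):
    """Scan EC2 and publish an SNS message if any instance matches."""
    import boto3  # bundled with the AWS Lambda Python runtime

    ec2 = boto3.client("ec2")
    ids = []
    # The filters narrow results server-side; find_matching_instances
    # re-checks each instance so the logic is testable without AWS.
    paginator = ec2.get_paginator("describe_instances")
    for page in paginator.paginate(
        Filters=[
            {"Name": "instance-type", "Values": [TARGET_TYPE]},
            {"Name": "instance-state-name", "Values": list(TARGET_STATES)},
        ]
    ):
        ids.extend(find_matching_instances(page))

    if ids:
        boto3.client("sns").publish(
            TopicArn=os.environ["TOPIC_ARN"],  # hypothetical variable name
            Subject="Running/pending t3.small instances found",
            Message="Matching instances: " + ", ".join(ids),
        )
    return {"matches": ids}
```

Because the handler ignores `event`, the same code works whether EventBridge invokes it from a schedule or from an EC2 state-change rule.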

AWS Lambda - Traffic Shift

AWS Lambda is a serverless compute service. In this blog, we will discuss what a Lambda alias is and how it can be used to shift traffic between two versions of a Lambda function. What is Lambda versioning? AWS Lambda versions are immutable snapshots of your function's code and configuration at a specific point in time. An immutable snapshot is a frozen, unchangeable copy of something at a specific point in time (immutable = cannot be changed; snapshot = point-in-time copy). Let's start by creating a simple Lambda function with Node.js as the runtime environment. I updated the code as below: Invoking the Lambda function. Let's publish this version of the Lambda function as "Version-1". Now our function looks like: demo-function -> Version-1. Let's update demo-function with the content below. Publishing this version as "Version-2". Now our Lambda "demo-function" has two versions. demo-function -> Version-1 ...
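To make the traffic shift concrete, here is a small Python simulation of how an alias with weighted routing splits invocations between two versions. It only models the routing decision; it does not call AWS. In AWS itself, the split is configured on the alias, for example with boto3's `update_alias(FunctionName="demo-function", Name=..., FunctionVersion="1", RoutingConfig={"AdditionalVersionWeights": {"2": 0.1}})`, assuming "Version-1" and "Version-2" correspond to Lambda's numeric versions 1 and 2 (the alias name is left open here). The `pick_version` and `simulate` helpers are hypothetical names used only for illustration.

```python
import random


def pick_version(primary: str, secondary: str, secondary_weight: float, r: float) -> str:
    """Choose which version serves a single invocation.

    `secondary_weight` is the fraction of traffic sent to `secondary`
    (0.1 means 10%); `r` is a uniform random draw in [0, 1).
    """
    if not 0.0 <= secondary_weight <= 1.0:
        raise ValueError("secondary_weight must be between 0.0 and 1.0")
    return secondary if r < secondary_weight else primary


def simulate(primary: str, secondary: str, secondary_weight: float,
             invocations: int, seed: int = 0) -> dict:
    """Count how many of `invocations` calls land on each version."""
    rng = random.Random(seed)
    counts = {primary: 0, secondary: 0}
    for _ in range(invocations):
        counts[pick_version(primary, secondary, secondary_weight, rng.random())] += 1
    return counts
```

Calling `simulate("1", "2", 0.1, 10_000)` should send roughly 1,000 invocations to version 2. Raising the weight in steps (for example 0.1, then 0.5, then 1.0) is what shifts traffic gradually from the old version to the new one.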

Deployment Strategies In AWS ASG - Terminate and Launch

Terminate and Launch: Terminate the existing instances and launch new ones from the updated LT. In this strategy, we set the minimum healthy percentage, i.e. the minimum number of instances that must stay in service during the deployment. Suppose the ASG has 5 EC2 instances running on LT version LT1.
Setting the minimum to 10% -> ensures 1 machine actively takes traffic while the other 4 are updated.
Setting the minimum to 50% -> ensures 3 machines are active while 2 are updated.
Setting the minimum to 90% -> ensures 4 machines are active while 1 is updated.
Raising the minimum increases the deployment time, because fewer instances are replaced in each batch. The table below shows the same calculation for an ASG of 10 instances:

| Instances | Min Healthy % | Minimum Instances in Service | Instances Patched at a Time |
|-----------|---------------|------------------------------|-----------------------------|
| 10        | 20%           | 2                            | 8                           |
| 10        | 30%           | 3                            | 7                           |
| 10        | 40%           | 4                            | 6                           |
| 10        | 50%           | 5                            | 5                           |
| 10        | ...           | ...                          | ...                         |
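The arithmetic behind these numbers can be sketched in Python. The rounding rule is an assumption chosen because it reproduces the post's figures: the healthy minimum is rounded up to a whole instance, but at least one instance is always replaced per batch so the refresh can progress. The function names are hypothetical.

```python
def refresh_plan(instances: int, min_healthy_pct: int) -> tuple:
    """Return (instances kept in service, instances replaced per batch).

    Assumption: the healthy minimum is rounded up to a whole instance,
    but at least one instance is replaced per batch so the refresh can
    make progress. This reproduces the numbers used in this post.
    """
    keep = -(-instances * min_healthy_pct // 100)  # ceiling division
    keep = min(keep, instances - 1)
    return keep, instances - keep


def batches_needed(instances: int, min_healthy_pct: int) -> int:
    """Rough number of replacement rounds (more rounds = longer deployment)."""
    _, per_batch = refresh_plan(instances, min_healthy_pct)
    return -(-instances // per_batch)  # ceiling division


for pct in (10, 50, 90):
    # prints "10 (1, 4) 2", then "50 (3, 2) 3", then "90 (4, 1) 5"
    print(pct, refresh_plan(5, pct), batches_needed(5, pct))
```

The batch count grows as the minimum rises, which is why a higher minimum healthy percentage means a slower deployment.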

Deployment Strategies In AWS ASG - Launch before Terminating

We all know AWS ASG (Auto Scaling Group), which scales EC2 instances based on CloudWatch metrics. The backbone of an ASG is the Launch Template (LT), which acts as a blueprint for creating EC2 instances. Whenever the LT is updated, existing instances must be updated safely, with little or near-zero downtime. ASG offers several instance refresh methods:
1) Launch before Terminating
2) Terminate and Launch
3) Custom Behavior
Launch before Terminating: As the name says, it creates new instances from the latest LT before terminating existing instances. Launch new instances and wait for them to be ready before terminating others. This allows you to go above your desired capacity by a given percentage and may temporarily increase costs. Let's say the ASG has a desired capacity of 5 EC2 instances. The "Launch before Terminating" strategy ensures that at least 5 EC2 instances are always running. With "Launch before Terminating": At any given t...
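The capacity behaviour described above can be illustrated with a small Python simulation. It is a sketch under stated assumptions, not how the ASG service is implemented: the upper bound is assumed to be `floor(desired * max_healthy_pct / 100)` with at least one extra instance, and new instances are assumed to become healthy before the same number of old ones is terminated. The function name `launch_before_terminate` is hypothetical.

```python
def launch_before_terminate(desired: int, max_healthy_pct: int) -> list:
    """Simulate replacing `desired` old instances with new ones.

    Assumption: capacity may rise to floor(desired * max_healthy_pct / 100),
    with at least one extra instance, and new instances are launched and
    become healthy before the same number of old ones is terminated.
    Returns (old, new) counts after every launch and every termination.
    """
    ceiling = max(desired * max_healthy_pct // 100, desired + 1)
    old, new = desired, 0
    history = [(old, new)]
    while old > 0:
        batch = min(ceiling - desired, old)
        new += batch   # launch and wait until healthy
        history.append((old, new))
        old -= batch   # only then terminate the same number of old ones
        history.append((old, new))
    return history
```

With `desired=5` and `max_healthy_pct=120`, the group never drops below 5 running instances and never exceeds 6; with 200%, all 5 replacements launch in a single batch, which is faster but briefly doubles the instance count (and cost).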

Apple Interview QnA - Part II

A pod in Kubernetes cannot reach an external API, but curl works fine from the node. What is your debugging flow? This indicates the issue is in the pod layer, because the endpoint is reachable from the node where the pod is running. Pods do not share the node's network namespace (unless hostNetwork is enabled); each pod gets its own network namespace and IP. So, I would start with the checks below: Check whether the endpoint resolves from inside the pod. This tells us whether it is a network issue or a DNS resolution issue. If DNS fails, check the CoreDNS pods, which usually run as a Deployment (managed through a ReplicaSet) in the kube-system namespace. It is worth checking the health and resource consumption of those pods. Let's say DNS works fine but we get a timeout while connecting to the external API. This could be due to a NetworkPolicy (egress) configured...
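The first two steps of this flow, DNS resolution first and then a TCP connection, can be sketched in Python using only the standard library. This assumes a Python interpreter is available in the container image (tools like nslookup and curl are the usual alternatives), and `diagnose` is a hypothetical helper name. It separates "the name does not resolve" from "the name resolves but the connection times out or is refused", which is exactly the DNS-versus-network split described above.

```python
import socket


def diagnose(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify why `host:port` is unreachable from where this runs.

    Returns "dns_failure", "connect_timeout", "connection_refused",
    "network_error", or "ok".
    """
    # Step 1: DNS. If this fails, look at CoreDNS / resolv.conf first.
    try:
        addresses = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"

    # Step 2: TCP connect to each resolved address until one succeeds.
    last = "network_error"
    for family, socktype, proto, _, addr in addresses:
        try:
            with socket.socket(family, socktype, proto) as s:
                s.settimeout(timeout)
                s.connect(addr)
            return "ok"
        except socket.timeout:
            last = "connect_timeout"      # often an egress policy or firewall drop
        except ConnectionRefusedError:
            last = "connection_refused"   # reached a host, but nothing listening
        except OSError:
            last = "network_error"        # e.g. no route to host
    return last
```

A `connect_timeout` while DNS succeeds points toward the next step in the flow: egress NetworkPolicies, or the node's NAT/firewall path for pod traffic.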

Apple Interview QnA - Part I

A server's CPU is pegged at 100%, but top shows no process consuming that much. How do you debug? When the CPU is at 100%, there is little point in running top, because top itself needs CPU and memory to fetch its statistics. We can start by running # sar or # vmstat to check CPU performance. CPU stats are divided into %user, %system, %idle and %iowait. If %user is high, an application is consuming the CPU. I would then narrow down the culprit using # ps -ef <followed_by_cpu_flags> to find which process is consuming the most CPU. If the process is owned by a non-root application, we can restart the application. If the process is owned by root, we can dig into the logs to see what is causing the issue; the ideal solution would be to reboot the server. If %iowait is high, the CPU is idle while waiting for I/O to complete. Check mpstat -P ALL 1 to see per-core usage and %ir...
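Tools like vmstat and sar derive the %user, %system, %idle and %iowait split from the cumulative CPU counters in Linux's `/proc/stat`. The Python sketch below shows that calculation from two samples of the aggregate `cpu` line. It is an illustration of the idea, not the tools' exact code; `parse_cpu_line`, `cpu_breakdown` and `sample` are hypothetical names, and `sample` only works on Linux.

```python
from __future__ import annotations

# Column order of the aggregate "cpu" line in /proc/stat. The guest
# columns that may follow are already counted inside user/nice, so
# they are ignored here to avoid double counting.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> dict[str, int]:
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    values += [0] * (len(FIELDS) - len(values))  # older kernels have fewer columns
    return dict(zip(FIELDS, values))


def cpu_breakdown(before: str, after: str) -> dict[str, float]:
    """Percentage of CPU time spent in each state between two samples."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    delta = {k: b[k] - a[k] for k in FIELDS}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("samples are identical or out of order")
    return {k: round(100 * v / total, 1) for k, v in delta.items()}


def sample(interval: float = 1.0) -> dict[str, float]:
    """Take two readings `interval` seconds apart (Linux only)."""
    import time

    def read() -> str:
        with open("/proc/stat") as f:
            return f.readline()  # first line is the aggregate "cpu" line

    first = read()
    time.sleep(interval)
    return cpu_breakdown(first, read())
```

A high `iowait` share points at storage, while high `system`, `irq` or `softirq` shares suggest kernel-side work that may not appear as a single busy process in top.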