AWS Step Functions is a serverless orchestration service that lets you easily coordinate multiple Lambda functions into flexible workflows that are easy to debug and easy to change.
Here at Ibmi Media, as part of our Server Management Services, we regularly help our customers perform AWS-related tasks.
In this context, we shall look into how to use AWS Step Functions to handle workflow runtime errors.
AWS Lambda, in turn, is a compute service that runs code without provisioning or managing servers.
Lambda functions can fail in three cases:
i. An unhandled exception is raised
ii. Timeout
iii. Out of memory
To deal with these failures while reducing the amount of error handling code we write, we can use AWS Step Functions. It creates a serverless workflow that supports function error handling.
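To illustrate the kind of error handling code that Step Functions can take over, here is a minimal sketch of a hand-written retry loop in Python. The names call_with_retry, make_flaky_api and TransientError are hypothetical and exist only for this illustration; they are not part of any AWS SDK.

```python
import time


class TransientError(Exception):
    """Hypothetical error type standing in for a retryable failure."""


def call_with_retry(func, max_attempts=3, interval_seconds=0.0):
    """Call func(); retry on TransientError, making at most max_attempts calls."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            time.sleep(interval_seconds)


def make_flaky_api(failures_before_success):
    """Build a fake API call that fails a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def flaky_api():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise TransientError("429 Too Many Requests")
        return "200 OK"

    return flaky_api, state


if __name__ == "__main__":
    api, state = make_flaky_api(failures_before_success=2)
    print(call_with_retry(api), "after", state["calls"], "calls")
```

Every function that calls an unreliable service would need a loop like this. In the workflow built in this article, the same behavior is declared once in the state machine definition instead.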
To handle errors in serverless applications with AWS, follow the steps provided below.
Step 1. Create a Lambda Function to Mock an API
In this step, we will create a Lambda function that will mock a few basic API interactions.
i. Open the 'AWS Management Console' and sign in with your user name and password. Next, search for and select 'Lambda' to open the service console.
ii. Then choose 'Create a function'.
iii. Leave 'Author from scratch' selected. Next, configure the Lambda function as follows:
a. For Name, type MockAPIFunction.
b. For Runtime, choose Python 3.6 (or a newer Python 3 runtime if 3.6 is no longer offered in the console).
c. For Role, select Create custom role.
A new IAM window will open. Leave the 'Role name' as 'lambda_basic_execution' and click 'Allow'. We will automatically return to the Lambda console.
Click the 'Create function' option.
iv. On the 'MockAPIFunction' screen, scroll down to the 'Function code' section. In the code window, replace all of the code with the following, then choose 'Save'.
# Custom exceptions; the state machine matches these class names in ErrorEquals.
class TooManyRequestsException(Exception): pass
class ServerUnavailableException(Exception): pass
class UnknownException(Exception): pass

def lambda_handler(event, context):
    # The status code to simulate arrives as a string in the input event.
    statuscode = event["statuscode"]
    if statuscode == "429":
        raise TooManyRequestsException('429 Too Many Requests')
    elif statuscode == "503":
        raise ServerUnavailableException('503 Server Unavailable')
    elif statuscode == "200":
        return '200 OK'
    else:
        # Any other value is treated as an unknown failure.
        raise UnknownException('Unknown error')
v. Once done, scroll to the top of the window and note its 'Amazon Resource Name' (ARN) in the upper-right corner of the page.
Amazon Resource Names (ARNs) uniquely identify AWS resources and help track and use AWS items and policies across AWS services and API calls. We require an ARN to reference a specific resource from Step Functions.
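Before wiring the function into Step Functions, the handler can be exercised locally. The sketch below repeats the handler from step iv so that it is self-contained, and adds a small helper, try_status, which is hypothetical and only used for this check.

```python
class TooManyRequestsException(Exception): pass
class ServerUnavailableException(Exception): pass
class UnknownException(Exception): pass


def lambda_handler(event, context):
    statuscode = event["statuscode"]
    if statuscode == "429":
        raise TooManyRequestsException("429 Too Many Requests")
    elif statuscode == "503":
        raise ServerUnavailableException("503 Server Unavailable")
    elif statuscode == "200":
        return "200 OK"
    else:
        raise UnknownException("Unknown error")


def try_status(statuscode):
    """Invoke the handler locally; return its result or the raised error's name."""
    try:
        # The handler ignores the context argument, so None is enough here.
        return lambda_handler({"statuscode": statuscode}, None)
    except Exception as exc:
        return type(exc).__name__


if __name__ == "__main__":
    for code in ("200", "429", "503", "999"):
        print(code, "->", try_status(code))
```

The exception class names reported here are the same names the state machine definition in Step 3 refers to in its ErrorEquals fields.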
Step 2. Create an AWS Identity and Access Management (IAM) Role
AWS Step Functions can execute code and access other AWS resources. To maintain security, we must grant Step Functions access to these resources using AWS Identity and Access Management (IAM).
i. In another browser window, navigate to the 'AWS Management Console' and search for 'IAM'. Click IAM to open the service console.
ii. Click 'Roles', then choose 'Create Role'.
iii. On the 'Select type of trusted entity' page, select 'Step Functions' from the list, and then choose 'Next: Permissions'.
iv. On the 'Attach permissions policy' page, choose 'Next: Review'.
v. On the 'Review' page, type 'step_functions_basic_execution' for Role name and click 'Create role'.
vi. The new IAM role is created and appears in the list beneath the IAM role for the Lambda function.
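For readers who script their setup, the trust relationship that the console creates for this role can be sketched in Python as follows. This is an illustration under assumptions rather than what the console does internally: build_trust_policy and create_role are hypothetical helper names, and iam_client is assumed to be a boto3 IAM client supplied by the caller.

```python
import json


def build_trust_policy():
    """Trust policy that lets the Step Functions service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "states.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_role(iam_client, role_name="step_functions_basic_execution"):
    """Create the role using a boto3-style IAM client passed in by the caller."""
    return iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(build_trust_policy()),
    )


if __name__ == "__main__":
    # Printing the policy needs no AWS credentials.
    print(json.dumps(build_trust_policy(), indent=2))
```

With boto3 installed and credentials configured, the call would look like create_role(boto3.client("iam")). The sketch covers only the trust relationship; permission to invoke the Lambda function would still have to be attached to the role separately.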
Step 3. Create a Step Functions State Machine
In this step, we will use the Step Functions console to create a state machine that uses a Task state with a Retry and Catch field to handle the various API response codes.
We will use a Task state to invoke the mock API Lambda function, which will return the API status code we provide as input into the state machine.
i. Open the AWS Step Functions console. On the 'Create a state machine' page, select 'Author from scratch'. In the 'Details' section, name the state machine 'MyAPIStateMachine', select 'I will use an existing role', and choose the 'step_functions_basic_execution' role created in Step 2.
ii. Next, we will design a state machine that will take different actions depending on the response from the mock API. If the API cannot be reached, the workflow will try again.
Replace the contents of the 'State machine definition' section with the following code:
{
  "Comment": "An example of using retry and catch to handle API responses",
  "StartAt": "Call API",
  "States": {
    "Call API": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:FUNCTION_NAME",
      "Next": "OK",
      "Comment": "Catch a 429 (Too many requests) API exception, and resubmit the failed request in a rate-limiting fashion.",
      "Retry": [ {
        "ErrorEquals": [ "TooManyRequestsException" ],
        "IntervalSeconds": 1,
        "MaxAttempts": 2
      } ],
      "Catch": [
        {
          "ErrorEquals": ["TooManyRequestsException"],
          "Next": "Wait and Try Later"
        }, {
          "ErrorEquals": ["ServerUnavailableException"],
          "Next": "Server Unavailable"
        }, {
          "ErrorEquals": ["States.ALL"],
          "Next": "Catch All"
        }
      ]
    },
    "Wait and Try Later": {
      "Type": "Wait",
      "Seconds": 1,
      "Next": "Change to 200"
    },
    "Server Unavailable": {
      "Type": "Fail",
      "Error": "ServerUnavailable",
      "Cause": "The server is currently unable to handle the request."
    },
    "Catch All": {
      "Type": "Fail",
      "Cause": "Unknown error!",
      "Error": "An error of unknown type occurred"
    },
    "Change to 200": {
      "Type": "Pass",
      "Result": {"statuscode": "200"},
      "Next": "Call API"
    },
    "OK": {
      "Type": "Pass",
      "Result": "The request has succeeded.",
      "End": true
    }
  }
}
iii. Find the “Resource” line in the “Call API” Task state (line 7). To update this ARN to the ARN of the mock API Lambda function, click on the 'ARN text' and then select the ARN from the list.
iv. Click 'refresh' to have Step Functions create a state machine diagram that corresponds to the workflow. After reviewing the visual workflow, click 'Create state machine'.
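A common mistake when editing a definition by hand is a 'Next' field that names a state which does not exist. The sketch below is a small local check for that, run against a trimmed copy of the Step 3 definition. It is not the validation that Step Functions itself performs, and find_dangling_targets is a hypothetical helper name.

```python
import json

# Trimmed copy of the definition: only the fields that drive transitions.
DEFINITION = """
{
  "StartAt": "Call API",
  "States": {
    "Call API": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:FUNCTION_NAME",
      "Next": "OK",
      "Catch": [
        {"ErrorEquals": ["TooManyRequestsException"], "Next": "Wait and Try Later"},
        {"ErrorEquals": ["ServerUnavailableException"], "Next": "Server Unavailable"},
        {"ErrorEquals": ["States.ALL"], "Next": "Catch All"}
      ]
    },
    "Wait and Try Later": {"Type": "Wait", "Seconds": 1, "Next": "Change to 200"},
    "Server Unavailable": {"Type": "Fail"},
    "Catch All": {"Type": "Fail"},
    "Change to 200": {"Type": "Pass", "Next": "Call API"},
    "OK": {"Type": "Pass", "End": true}
  }
}
"""


def find_dangling_targets(definition):
    """Return (source, target) pairs whose target is not a defined state."""
    states = definition["States"]
    problems = []
    if definition["StartAt"] not in states:
        problems.append(("StartAt", definition["StartAt"]))
    for name, state in states.items():
        targets = []
        if "Next" in state:
            targets.append(state["Next"])
        # Each catcher carries its own transition target.
        targets += [catcher["Next"] for catcher in state.get("Catch", [])]
        for target in targets:
            if target not in states:
                problems.append((name, target))
    return problems


if __name__ == "__main__":
    print(find_dangling_targets(json.loads(DEFINITION)))
```

An empty list means every transition points at a defined state.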
Step 4. Test Error Handling Workflow
To test the error handling workflow, we will invoke the state machine to call the mock API by providing the error code as input.
a. First, click Start execution.
b. A new execution dialog box appears, where we can enter input for the state machine. We will play the part of the API, and supply the error code that we want the mock API to return.
Replace the existing text with the code below, then choose Start execution:
{
  "statuscode": "200"
}
c. On the 'Execution details' screen, click 'Input' to see the input of the state machine. Next, click 'Output' to view the result of the state machine execution.
Here, we can see that the workflow interpreted statuscode 200 as a successful API call.
d. Under 'Visual workflow', we can see the execution path of each execution. Click on the “Call API” Task state and then expand the 'Input' and 'Output' fields in the 'Step details' screen.
e. Then, click on the “OK” state in the visual workflow. Under 'Step details', we can see that the output of the previous step has been passed as the input to this step.
The 'OK' state is a Pass state, which performs no work: it passes its input to its output or, as here, emits the fixed value in its 'Result' field. Pass states are useful when constructing and debugging state machines.
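The same test can also be started from code instead of the console. The following is a minimal sketch that assumes sfn_client is a boto3 Step Functions client supplied by the caller; start_with_status and execution_result are hypothetical helper names.

```python
import json


def start_with_status(sfn_client, state_machine_arn, statuscode):
    """Start one execution, passing the status code the mock API should return."""
    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps({"statuscode": statuscode}),
    )
    return response["executionArn"]


def execution_result(sfn_client, execution_arn):
    """Return (status, output); output is None when the response has none."""
    description = sfn_client.describe_execution(executionArn=execution_arn)
    return description["status"], description.get("output")
```

With boto3 installed and credentials configured, the client would come from boto3.client("stepfunctions"), and the state machine ARN would be the one shown in the Step Functions console. An execution may still be running when it is first described, so the status may need to be checked more than once.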
Step 5. Inspect the Execution of the State Machine
a. Scroll to the top of the Execution details screen and click on MyAPIStateMachine.
b. Click Start execution again, provide the following input, and then click Start execution.
{
  "statuscode": "503"
}
c. In the Execution event history section, we expand each execution step to confirm that the workflow behaved as expected.
We will notice that:
i. Step Functions captured the Input
ii. That input was passed to the Call API Task state
iii. Call API Task state called the MockAPIFunction using that input
iv. The MockAPIFunction executed
v. MockAPIFunction failed with a ServerUnavailableException
vi. The Catch field in the Call API Task state caught that exception
vii. The Catch field routed the workflow to the Server Unavailable state, which failed the workflow
viii. State machine completed its execution
d. Then, we will simulate a 429 exception. Scroll to the top of the Execution details screen and click on MyAPIStateMachine.
Provide the following input, and click Start execution:
{
  "statuscode": "429"
}
e. Now we will inspect the retry behavior of the workflow.
In the Execution event history section, expand each execution step once more to confirm that Step Functions tried calling the MockAPIFunction two more times, both of which failed. The workflow then transitioned to the Wait and Try Later state.
After the Wait state, the Change to 200 Pass state forces the status code to 200, and the workflow completes execution successfully.
f. Run one more instance of the workflow, and this time, provide a status code that the state machine does not handle explicitly:
{
  "statuscode": "999"
}
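Inspect the execution again using the Execution event history. This time, the States.ALL catcher routes the workflow to the Catch All state, which fails the execution. When complete, click on MyAPIStateMachine once more. In the Executions pane, we can see the history of all executions of the workflow.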
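To summarize the four test runs, the sketch below traces the path each input takes through the workflow. It is a simplified local model written for this article, not the Step Functions engine: mock_api and run_workflow are hypothetical names, and the Retry and Catch rules from the Step 3 definition are hard-coded rather than parsed.

```python
def mock_api(statuscode):
    """Local stand-in for MockAPIFunction: returns (result, error_name)."""
    errors = {"429": "TooManyRequestsException", "503": "ServerUnavailableException"}
    if statuscode == "200":
        return "200 OK", None
    return None, errors.get(statuscode, "UnknownException")


def run_workflow(statuscode, max_retries=2):
    """Return (states visited, number of API calls, final status) for one input."""
    path = []
    calls = 0
    while True:
        path.append("Call API")
        calls += 1
        result, error = mock_api(statuscode)
        # Retry: only TooManyRequestsException is retried, up to max_retries times.
        retries = 0
        while error == "TooManyRequestsException" and retries < max_retries:
            retries += 1
            calls += 1
            result, error = mock_api(statuscode)
        if error is None:
            path.append("OK")
            return path, calls, "SUCCEEDED"
        # Catch: the first matching catcher decides the next state.
        if error == "TooManyRequestsException":
            path += ["Wait and Try Later", "Change to 200"]
            statuscode = "200"  # the Pass state's Result replaces the input
            continue
        if error == "ServerUnavailableException":
            path.append("Server Unavailable")
        else:
            path.append("Catch All")  # matched by States.ALL
        return path, calls, "FAILED"


if __name__ == "__main__":
    for code in ("200", "503", "429", "999"):
        print(code, run_workflow(code))
```

In this model the 429 input makes three failing calls (the first attempt plus two retries) before it is caught, and a fourth call succeeds after the input has been changed to 200.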
Step 6. Terminate Resources
In this step, we will terminate the AWS Step Functions and AWS Lambda related resources.
Terminating resources that are not in active use reduces costs and is a best practice.
i. At the top of the AWS Step Functions console window, click on State machines.
ii. Then, click on MyAPIStateMachine and select Delete. Confirm the action by selecting 'Delete state machine' in the dialog box.
iii. Next, we will delete the Lambda function. Click Services in the AWS Management Console menu, then select Lambda.
iv. In the Functions screen, click on your MockAPIFunction, select Actions, and then Delete. Confirm the deletion by clicking Delete again.
v. Lastly, we will delete the IAM roles. Click Services in the AWS Management Console menu, then select IAM.
vi. Select both of the IAM roles that we created, then click Delete role. Confirm the deletion by clicking Yes, Delete in the dialog box.
We can now sign out of the AWS Management console.
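The cleanup can be scripted as well. The following is a minimal sketch that assumes the three clients are boto3 clients for Step Functions, Lambda and IAM supplied by the caller; delete_tutorial_resources is a hypothetical helper name, and the default resource names are the ones used in this article.

```python
def delete_tutorial_resources(
    sfn_client,
    lambda_client,
    iam_client,
    state_machine_arn,
    function_name="MockAPIFunction",
    role_names=("lambda_basic_execution", "step_functions_basic_execution"),
):
    """Delete the state machine, the function and both roles, in that order."""
    sfn_client.delete_state_machine(stateMachineArn=state_machine_arn)
    lambda_client.delete_function(FunctionName=function_name)
    for role_name in role_names:
        # Assumption: any policies on the role have already been removed;
        # IAM refuses to delete a role that still has policies attached.
        iam_client.delete_role(RoleName=role_name)
```

Unlike the console, this sketch does not ask for confirmation, so it should only be pointed at resources created for this walkthrough.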
This article covers how to handle errors in serverless applications with AWS Step Functions. Combining AWS Step Functions with AWS Lambda makes it simple to orchestrate AWS Lambda functions for serverless applications.
To help you deal with errors in Lambda applications, Lambda integrates with services like Amazon CloudWatch and AWS X-Ray. You can use a combination of logs, metrics, alarms, and tracing to quickly detect and identify issues in your function code, API, or other resources that support your application.
Lambda functions can fail in three cases:
i. An unhandled exception is raised: the function received invalid input, an external API failed, or the code contains a programming bug.
ii. Timeout: a Lambda function that runs longer than its configured timeout is forcibly terminated with a 'Task timed out after … seconds' message. The default timeout is 3 seconds, and the maximum is 900 seconds (15 minutes).
iii. Out of memory: in this case, the function usually terminates with 'Process exited before completing request'. In the function's log report, 'Max Memory Used' is equal to 'Memory Size'.
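Of these three cases, a timeout is the one a function can often see coming, because the Lambda context object reports how much time is left. The sketch below stops work early and raises a named exception instead of waiting to be terminated. TimeBudgetExceeded, process_items and the 500 ms safety margin are assumptions made for this illustration; get_remaining_time_in_millis is the method provided by the Lambda context object in Python.

```python
class TimeBudgetExceeded(Exception):
    """Hypothetical error raised before Lambda would forcibly stop the function."""


def process_items(items, context, safety_margin_ms=500):
    """Process items in order; stop early when the remaining time runs low."""
    done = []
    for item in items:
        if context.get_remaining_time_in_millis() < safety_margin_ms:
            raise TimeBudgetExceeded(
                f"stopped after {len(done)} of {len(items)} items"
            )
        done.append(item * 2)  # placeholder for the real work
    return done


def lambda_handler(event, context):
    # "items" is an assumed input field for this sketch.
    return process_items(event["items"], context)
```

Raising an exception with its own class name means a Catch field in the state machine could match it by name, in the same way as the exceptions defined in Step 1.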