Step Functions re:Invented

Testing out the new Step Functions features announced at AWS re:Invent 2023

Matthew Venne
ITNEXT


re:Invent 2023 is officially in the books! I was in attendance and I can assure you I was wearing the Gold Jacket proudly all weekend long. Luckily for you all, I’ll spare you a full re:Cap this time around.

(Left) Here I am with AWS Ambassador and Gold Jacket owner, Sammy Cheung. (Right) We tried to get all of the Golden Jacket owners who were in attendance for a group photograph. Let’s make it twice as big next year!

The focus of this year’s re:Invent was undoubtedly how AWS is expanding its AI services, primarily Generative AI. That is not to say that there were no significant non-AI announcements. I’ve blogged previously about Step Functions, and two of the newly announced Step Functions features caught my eye:
  1. Native Step Functions action to make HTTP requests to third-party APIs. For most use cases, there is no need to write Lambda functions for this anymore.
  2. Test State: You can now test an individual state without running the entire state machine!

This came at a perfect time, as I was in the process of writing a State Machine for one of my customers. We are in the middle of a CloudEndure to AWS Elastic Disaster Recovery upgrade since CloudEndure reaches EOL very soon. Stay tuned for another blog regarding this once we are done.

One of their requirements was that their Recovery VPC have semi-isolated (bubble) subnets for testing, the idea being that these subnets should not have full routability to their on-prem network. Some of their workloads were Windows, and restoring a Windows VM could cause AD to recognize the newly restored server instead of the source server. However, those servers would still need to route to Entra ID (previously Azure AD) since the applications on them leveraged Entra ID authentication.

Luckily, Microsoft publishes their IP ranges at a publicly accessible URL. However, the IP list is large (over 100 different CIDRs) and it changes periodically.

I immediately thought the new Third Party API HTTP integration with Step Functions would be a great way to programmatically grab those CIDRs.

A few comments on the new Third Party API HTTP integration:

  1. It leverages an EventBridge connection, the resource that EventBridge API destinations already use to store authentication information for third-party API access.
  2. The connection stores the credentials in a Secrets Manager secret. However, this secret is free.
  3. The connection is required even for unauthenticated APIs like this particular use case with Microsoft. I just put dummy values for the user name and password.
Note: It would be nice if they made the EventBridge connection optional for unauthenticated APIs, since it requires unnecessary resources and more IAM permissions on the state machine role.
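
To make the points above concrete, here is a minimal sketch of an HTTP Task state, built as a Python dictionary and serialized to Amazon States Language JSON. The endpoint URL, the ARN, and the helper name `build_http_task` are placeholders for illustration, not values from the state machine in this post. The ARN passed under `Authentication` refers to the EventBridge resource that holds the credentials.

```python
import json

# Placeholder values for illustration only (assumptions, not from this post).
API_ENDPOINT = "https://example.com/service-tags.json"
CONNECTION_ARN = (
    "arn:aws:events:us-east-1:123456789012:"
    "connection/example/00000000-0000-0000-0000-000000000000"
)


def build_http_task(api_endpoint: str, connection_arn: str, next_state: str) -> dict:
    """Return an HTTP Task state that issues a GET request."""
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::http:invoke",
        "Parameters": {
            "ApiEndpoint": api_endpoint,
            "Method": "GET",
            # Required even when the API itself needs no credentials.
            "Authentication": {"ConnectionArn": connection_arn},
        },
        "Next": next_state,
    }


state = build_http_task(API_ENDPOINT, CONNECTION_ARN, "Parallel")
print(json.dumps(state, indent=2))
```

The state machine role would also need permission to use that connection and its secret, which is the extra IAM overhead mentioned in the note.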

Ultimately, I was not able to leverage the Third Party API integration with Step Functions, since my API request ran into the upper limit on Task result size (256 KB). The response was 3.3 MB. Unfortunately, there was no way to filter the response on the HTTP request task, so I had to write a Lambda function. It would be nice if Step Functions enabled you to filter the JSON that is returned.

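  Lambda:
    Type: AWS::Lambda::Function
    Properties:
      Role: !GetAtt LambdaRole.Arn
      Timeout: 300
      Handler: index.lambda_handler
      Runtime: python3.10
      Environment:
        Variables:
          URL: !Ref URLParameterName
      Code:
        ZipFile: |
          import json
          import os
          import urllib.request

          import boto3

          # The download URL lives in an SSM parameter; read it once per cold start.
          ssm = boto3.client("ssm")
          mainurl = ssm.get_parameter(Name=os.environ['URL'])['Parameter']['Value']

          def lambda_handler(event, context):
              response = urllib.request.urlopen(mainurl).read()
              service_tags = json.loads(response.decode('utf-8'))['values']
              # Return only the Entra ID (AzureActiveDirectory) prefixes.
              for tag in service_tags:
                  if tag['name'] == 'AzureActiveDirectory':
                      return tag['properties']['addressPrefixes']
              # Fail loudly rather than hand an empty list to the state machine.
              raise LookupError('AzureActiveDirectory not found in response')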
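
The filtering logic can be sanity-checked locally before deploying. The sketch below is a hedged example: the sample document is hypothetical, shaped only like the fields the Lambda reads (`values[].name` and `values[].properties.addressPrefixes`), and it uses documentation-range prefixes rather than real Microsoft ranges. The helper names are assumptions. It also checks the filtered result against the 256 KB limit mentioned earlier, taken here as 256 * 1024 bytes.

```python
import json

# Hypothetical sample shaped like the fields the Lambda reads.
SAMPLE = {
    "values": [
        {
            "name": "AzureActiveDirectory",
            "properties": {"addressPrefixes": ["192.0.2.0/24", "2001:db8::/32"]},
        },
        {
            "name": "SomeOtherService",
            "properties": {"addressPrefixes": ["198.51.100.0/24"]},
        },
    ]
}

MAX_TASK_RESULT_BYTES = 256 * 1024  # size limit discussed in the post


def extract_prefixes(document: dict, service_name: str) -> list:
    """Return the address prefixes for one service tag, or [] if absent."""
    for entry in document["values"]:
        if entry["name"] == service_name:
            return entry["properties"]["addressPrefixes"]
    return []


def fits_in_task_result(payload) -> bool:
    """Check whether the JSON-encoded payload stays within the size limit."""
    return len(json.dumps(payload).encode("utf-8")) <= MAX_TASK_RESULT_BYTES


prefixes = extract_prefixes(SAMPLE, "AzureActiveDirectory")
print(prefixes, fits_in_task_result(prefixes))
```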

However, the Test State was much more useful this time around. You can test out all of your Input and Output processing in the state as well as your actual parameters. You can even specify a State input to test specific scenarios.

Here is the screen showing the Task output from the test. You can expand the JSON nodes as needed.
Here is the screen showing the Input/Output processing logic. You can expand the JSON nodes as needed.
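
The same test can be driven from code through the TestState API, exposed in boto3 as `test_state` on the `stepfunctions` client. This is a minimal sketch under stated assumptions: the role ARN and route table ID are placeholders, the helper names are mine, and a Pass state is used so the example has no side effects. The call to AWS is wrapped in a function that is not invoked here, since it needs credentials.

```python
import json


def build_test_state_request(state_definition: dict, role_arn: str, state_input: dict) -> dict:
    """Assemble keyword arguments for the Step Functions TestState API."""
    return {
        "definition": json.dumps(state_definition),
        "roleArn": role_arn,
        "input": json.dumps(state_input),
        # DEBUG also returns the intermediate input/output processing results.
        "inspectionLevel": "DEBUG",
    }


def run_test_state(request: dict) -> dict:
    """Send the request to AWS (requires credentials and the boto3 package)."""
    import boto3  # imported lazily so the builder works without AWS access

    return boto3.client("stepfunctions").test_state(**request)


# Placeholder role ARN and route table ID (assumptions for illustration).
request = build_test_state_request(
    {
        "Type": "Pass",
        "Parameters": {"NewRouteTable.$": "$.RouteTable.RouteTableId"},
        "End": True,
    },
    "arn:aws:iam::123456789012:role/example-test-state-role",
    {"RouteTable": {"RouteTableId": "rtb-0123456789abcdef0"}},
)
print(json.dumps(request, indent=2))
```

Calling `run_test_state(request)` with valid credentials would return a response whose `status` and `output` fields correspond to what the console screens show.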

Here is the final diagram of the flow chart.

  1. It starts by creating the new route table for the bubble subnets. This value must be passed down to the Parallel state since this new Route Table needs the routes added and needs to be associated with the proper subnets.
  2. It then queries the Azure AD IPs. The Lambda function filters the returned response.
  3. It passes this list to the Parallel state, which has two branches:

1. Branch 1

  • Loop through the CIDRs and add them to the route table created in the first step. This value was passed down from the previous step using ItemSelector to define a custom object for looping. Each iteration required the static value of the new Route Table but also the dynamic value of the CIDR block.
  • I used the new-ish error handling features of Step Functions to account for the different parameter names for IPv4 and IPv6 CIDRs — rather than trying to parse them myself.
  • We had to update the Service Quota for Routes per Route Table since the default is only 50. This could have performance implications for traffic into and out of the subnets associated with this larger route table — since the VPC router will have more routes to parse.

2. Branch 2

  • Wait 5 seconds so it does not associate the new route table prior to the Routes being added.
  • I have the existing Bubble route table ID stored as an SSM parameter, so I query that.
  • I then describe this route table to get all of the associations for that route table so I can iterate over them with a Map state.
  • This was interesting: I had never directly run the ReplaceRouteTableAssociation API action. I originally thought I would use CreateRouteTableAssociation and that it would overwrite the old association, but this is not the case. ReplaceRouteTableAssociation is what was needed; it requires the existing AssociationId (one per subnet) and the new RouteTableId. The new Route Table ID was passed down from a previous step using ItemSelector to define a custom object for looping.

  4. Next it grabs the SSM Parameter storing the existing route table ID. I didn’t pass the previous GetParameter result downstream since it would be nested in a Map state and a Parallel state. Would love global variables for this…

  5. It then runs another Parallel state:

1. Branch 1

  • Update the SSM parameter for the existing route table to store the value of the route table created in this iteration. This is so subsequent executions of the State Machine have no issues.

2. Branch 2

  • Delete the old route table.
  • I elected for the route table swap rather than overwriting the existing route table so I didn’t have to delete invalid routes and have error logic for routes that are still valid.
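
The association swap described in the steps above can be sketched outside Step Functions as well. This is an illustrative approximation, not the state machine itself: `swap_route_table` is a hypothetical helper that expects an object behaving like a boto3 EC2 client, and `FakeEc2` is a stand-in with made-up IDs so the sketch runs without AWS access.

```python
def swap_route_table(ec2, old_route_table_id: str, new_route_table_id: str) -> list:
    """Point every association of the old route table at the new one.

    Returns the new association IDs reported by EC2.
    """
    described = ec2.describe_route_tables(RouteTableIds=[old_route_table_id])
    associations = described["RouteTables"][0]["Associations"]
    new_ids = []
    for association in associations:
        # CreateRouteTableAssociation would not overwrite; Replace is needed.
        replaced = ec2.replace_route_table_association(
            AssociationId=association["RouteTableAssociationId"],
            RouteTableId=new_route_table_id,
        )
        new_ids.append(replaced["NewAssociationId"])
    return new_ids


class FakeEc2:
    """Stand-in for a boto3 EC2 client; all IDs are made up."""

    def __init__(self):
        self.calls = []

    def describe_route_tables(self, RouteTableIds):
        return {
            "RouteTables": [
                {
                    "RouteTableId": RouteTableIds[0],
                    "Associations": [
                        {"RouteTableAssociationId": "rtbassoc-aaa", "SubnetId": "subnet-1"},
                        {"RouteTableAssociationId": "rtbassoc-bbb", "SubnetId": "subnet-2"},
                    ],
                }
            ]
        }

    def replace_route_table_association(self, AssociationId, RouteTableId):
        self.calls.append((AssociationId, RouteTableId))
        return {"NewAssociationId": AssociationId + "-new"}


fake = FakeEc2()
new_association_ids = swap_route_table(fake, "rtb-old", "rtb-new")
print(new_association_ids)
```

With real credentials, passing `boto3.client("ec2")` in place of `FakeEc2()` would issue the same two API calls the Map state makes.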

Here is the full definition of my State Machine in CloudFormation. Side note: love the new intrinsic CFN function Fn::ToJsonString!

  StateMachine:
    Type: AWS::StepFunctions::StateMachine
    Properties:
      StateMachineName: AzureADRouteUpdate
      RoleArn: !GetAtt StateMachineRole.Arn
      DefinitionString:
        Fn::ToJsonString:
          Comment: A description of my state machine
          StartAt: CreateNewBubbleRouteTable
          States:
            CreateNewBubbleRouteTable:
              Type: Task
              Next: Query Azure AD IPs
              Parameters:
                VpcId: !Ref VPC
                TagSpecifications:
                  - ResourceType: route-table
                    Tags:
                      - Key: Name
                        Value: !Sub ${VPCName}-bubble-rt
              Resource: arn:aws:states:::aws-sdk:ec2:createRouteTable
              ResultSelector:
                NewRouteTable.$: $.RouteTable.RouteTableId
            Query Azure AD IPs:
              Type: Task
              Resource: arn:aws:states:::lambda:invoke
              Parameters:
                FunctionName: arn:aws:lambda:us-east-1:******:function:routing-statemachine-Lambda:$LATEST
                Payload.$: $
              Retry:
                - ErrorEquals:
                    - Lambda.ServiceException
                    - Lambda.AWSLambdaException
                    - Lambda.SdkClientException
                    - Lambda.TooManyRequestsException
                  IntervalSeconds: 1
                  MaxAttempts: 3
                  BackoffRate: 2
              ResultPath: $.Result
              ResultSelector:
                CIDRs.$: $.Payload
              Next: Parallel
            Parallel:
              Type: Parallel
              Next: GetParameter
              Branches:
                - StartAt: Loop Through IPs
                  States:
                    Loop Through IPs:
                      Type: Map
                      ItemProcessor:
                        ProcessorConfig:
                          Mode: INLINE
                        StartAt: CreateIPv4Route
                        States:
                          CreateIPv4Route:
                            Type: Task
                            End: true
                            Parameters:
                              RouteTableId.$: $.NewRouteTable
                              DestinationCidrBlock.$: $.CIDRs
                              GatewayId: !Ref TGW
                            Resource: arn:aws:states:::aws-sdk:ec2:createRoute
                            Catch:
                              - ErrorEquals:
                                  - States.TaskFailed
                                Next: CreateIPv6Route
                                ResultPath: $.Input
                          CreateIPv6Route:
                            Type: Task
                            Parameters:
                              RouteTableId.$: $.NewRouteTable
                              DestinationIpv6CidrBlock.$: $.CIDRs
                              GatewayId: !Ref TGW
                            Resource: arn:aws:states:::aws-sdk:ec2:createRoute
                            End: true
                      ItemSelector:
                        NewRouteTable.$: $.NewRouteTable
                        CIDRs.$: $$.Map.Item.Value
                      ItemsPath: $.Result.CIDRs
                      End: true
                      ResultPath: $.MapResult
                      OutputPath: $.NewRouteTable
                - StartAt: Wait
                  States:
                    Wait:
                      Type: Wait
                      Seconds: 5
                      Next: GetRouteTable
                    GetRouteTable:
                      Type: Task
                      Next: DescribeRouteTables
                      Parameters:
                        Name: bubble-rt
                      Resource: arn:aws:states:::aws-sdk:ssm:getParameter
                      ResultSelector:
                        ExistingRouteTable.$: $.Parameter.Value
                      ResultPath: $.Result
                    DescribeRouteTables:
                      Type: Task
                      Next: Map
                      Parameters:
                        RouteTableIds.$: States.Array($.Result.ExistingRouteTable)
                      Resource: arn:aws:states:::aws-sdk:ec2:describeRouteTables
                      ResultPath: $.Output
                    Map:
                      Type: Map
                      ItemProcessor:
                        ProcessorConfig:
                          Mode: INLINE
                        StartAt: ReplaceRouteTableAssociation
                        States:
                          ReplaceRouteTableAssociation:
                            Type: Task
                            Parameters:
                              AssociationId.$: $.RouteTableAssociationId
                              RouteTableId.$: $.NewRouteTable
                            Resource: arn:aws:states:::aws-sdk:ec2:replaceRouteTableAssociation
                            End: true
                      End: true
                      ItemsPath: $.Output.RouteTables[0].Associations
                      ItemSelector:
                        NewRouteTable.$: $.NewRouteTable
                        RouteTableAssociationId.$: $$.Map.Item.Value.RouteTableAssociationId
                      ResultPath: $.MapResult
                      ResultSelector:
                        ValueEnteredInForm: ''
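
To make the ItemsPath and ItemSelector in the "Loop Through IPs" Map state concrete, the sketch below mimics in plain Python what each iteration receives. It then builds the CreateRoute parameters by parsing the CIDR with the standard `ipaddress` module. Note that this parse-first approach is an alternative shown for comparison; the definition above instead tries the IPv4 parameter and falls back to IPv6 through Catch. The function names, IDs, and documentation-range prefixes are placeholders.

```python
import ipaddress


def map_items(state_input: dict) -> list:
    """Mimic ItemsPath ($.Result.CIDRs) plus the ItemSelector from the definition."""
    return [
        {"NewRouteTable": state_input["NewRouteTable"], "CIDRs": cidr}
        for cidr in state_input["Result"]["CIDRs"]
    ]


def create_route_params(new_route_table: str, cidr: str, gateway_id: str) -> dict:
    """Build CreateRoute parameters, picking the CIDR key by IP version."""
    network = ipaddress.ip_network(cidr, strict=False)
    cidr_key = (
        "DestinationCidrBlock" if network.version == 4 else "DestinationIpv6CidrBlock"
    )
    return {"RouteTableId": new_route_table, cidr_key: cidr, "GatewayId": gateway_id}


# Hypothetical input; every ID and prefix here is a placeholder.
state_input = {
    "NewRouteTable": "rtb-0123456789abcdef0",
    "Result": {"CIDRs": ["192.0.2.0/24", "2001:db8::/32"]},
}

items = map_items(state_input)
requests = [
    create_route_params(item["NewRouteTable"], item["CIDRs"], "gateway-placeholder")
    for item in items
]
for request in requests:
    print(request)
```

Each item carries both the static route table ID and the per-iteration CIDR, which is why ItemSelector is needed rather than ItemsPath alone.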

Final Thoughts:

Just summarizing some things I’d like to see next from the Step Functions Team.

  1. Create Global Variables!!! It would be nice if we could store values from previous tasks as a global variable rather than having to pass them down through several tasks.
  2. Make the EventBridge connection optional when the new Third Party API HTTP integration calls unauthenticated APIs.
  3. Add the capability to filter the response of the Third Party API HTTP integration. I’m not sure which libraries are used under the hood to invoke the HTTP action, but I can’t imagine it would be that difficult to let us pass a JSONPath to filter the response.
  4. Great Job on the Test State feature!!!

Matthew is Managing Director at stackArmor, a leading cloud security and compliance partner. All opinions expressed are his own. stackarmor.com