-
Notifications
You must be signed in to change notification settings - Fork 947
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New serverless pattern - Serverless Messaging Redrive #2543
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to include a README.md with a detailed description of what this sample is, as well as deployment, testing, and deletion instructions.
You also need to include the *pattern.json
file as per the sample adapted to your needs.
If any of the feedback / comments aren't clear, feel free to DM me offline to discuss and I can help point you in the right direction.
import logging | ||
|
||
# Set up logging | ||
logger = logging.getLogger() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an opportunity to use the PowerTools logging module here instead of Python native?
|
||
def can_fix_message(message): | ||
""" | ||
Determine if a message can be fixed automatically. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you include a sample scenario for customers to reference as a model? That could be "bad character replacement", "downstream system temporarily unavailable", or anything else you choose.
MAIN_QUEUE_URL = os.environ['MAIN_QUEUE_URL'] | ||
FATAL_DLQ_URL = os.environ['FATAL_DLQ_URL'] | ||
|
||
def can_fix_message(message): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're using a function to determine if a message can be fixed or not, then as part of its return you need to know which "fix strategy" to use, and this needs to be a parameter for your fix_message
function too doesn't it so that it knows which fix to apply.
|
||
def fix_message(message): | ||
""" | ||
Apply fixes to the message. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as on the other function, the wording needs to be expanded to help people with zero context understand what's meant to be happening here.
logger.error(f"Validation error: {str(e)}") | ||
return False | ||
|
||
def fix_message(message): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Function definition needs to be expanded to include a fix strategy that you fork to within the function depending on what type of "fix" is needed, whether that's character replacement, a delay, etc.
@@ -0,0 +1,2 @@ | |||
boto3==1.26.137 | |||
jsonschema==4.17.3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this being used directly anywhere?
@@ -0,0 +1,121 @@ | |||
import json | |||
import logging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use PowerTools logging implementation?
@@ -0,0 +1,2 @@ | |||
boto3==1.26.137 | |||
jsonschema==4.17.3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this being used directly?
Runtime: python3.9 | ||
CodeUri: ./functions/processor/ | ||
MemorySize: 128 | ||
Timeout: 29 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this always going to be adequate as this scales out? Consider parameterising this value (for both functions) and including more comments about the when it's declared at the top of this YAML doc describing when people might want to increase it.
Runtime: python3.9 | ||
CodeUri: ./functions/decision_maker/ | ||
MemorySize: 128 | ||
Timeout: 29 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment about parameterisation as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zalilias you need to rename the file to be in line with the name of your pattern and replace the contents of the template / example with information about your pattern.
- Extensible processing logic | ||
|
||
## API Reference | ||
# Send Message |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea to include a copy of the API contract, but It would be nice to see commands that users can use in their terminal as an example curl https://${endpoint}/message....
with the relevant params to send a sample message.
4. Failed messages route to DLQs | ||
5. Decision maker attempts an automated recovery | ||
|
||
## Deployment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likely to come up in @ellisms review, but we typically include a statement on users being responsible for spinning up / tearing down / paying for resources in their own AWS accounts. We'd also typically talk about having the AWS CLI installed and configured as a pre-requisite to using SAM. Take a look at some other SAM examples to get a feel for the norm.
## Basic Flow | ||
1. Messages enter through API Gateway | ||
2. Main queue receives messages | ||
3. Processor Lambda handles messages |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would include that this Lambda function polls the main queue using Event Source Mappings (ESMs) and handles the messages. Include links to docs / blogs / posts where they can build understanding.
1. Messages enter through API Gateway | ||
2. Main queue receives messages | ||
3. Processor Lambda handles messages | ||
4. Failed messages route to DLQs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're going to use terse statements in a numbered or bulleted list, it would be good to include a more full explanation alongside them. In this context what's a "failed message"? See if you can go a level deeper to explain what's going on as it may be confusing to somebody unfamiliar with the pattern or certain service features, like DLQ.
@@ -2,6 +2,68 @@ AWSTemplateFormatVersion: '2010-09-09' | |||
Transform: AWS::Serverless-2016-10-31 | |||
Description: Serverless message processing architecture |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This description doesn't match what you're actually doing - this isn's just about message processing, it's about attempted automated failure re-drive.
- Performing CPU-intensive operations | ||
- Seeing timeout errors or slow processing | ||
- Handling concurrent message processing | ||
Recommended ranges: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "heavy" mean to the user? Isn't this ultimately dependent on how computationally expensive the operation would be? Perhaps in the context of a demo pattern think instead of suggesting 2 values with the lower as the default given it's just a sample and not meant to be production grade.
There would be an expectation that if a customer wanted to fork this and use it themselves that they'd do their own due diligence on memory sizing for their use case. Likely using Lambda Power Tuning.
MinValue: 128 | ||
MaxValue: 10240 | ||
|
||
DecisionMakerMemorySize: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comments as for ProcessorMemorySize
MinValue: 1 | ||
MaxValue: 900 | ||
|
||
DecisionMakerTimeout: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This parameter has all of the same constraints as the one above for the same reasons (they're Lambda functions). I'd collapse in to one parameter called LambdaTimeout
and make use of it for both functions. Make sure to update the first line of the description.
@@ -115,8 +178,9 @@ Resources: | |||
Handler: app.lambda_handler | |||
Runtime: python3.9 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Python 3.9 is 4 years old now, should we not be defaulting to the current LTS as this is a newly published asset? 3.12 seems like a reasonable default.
Description of changes:
A serverless AWS solution for processing and fixing malformed messages using SQS queues and Lambda functions. This system automatically handles message validation, applies fixes where possible, and routes messages to appropriate queues based on their fixability. With built-in error handling and detailed logging, it provides a robust framework for message processing that can be easily extended for specific business needs.
The architecture uses AWS Lambda for processing, multiple SQS queues for message routing, and includes a dead-letter queue (DLQ) for messages requiring human intervention. Built with AWS SAM.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.