New serverless pattern - s3-lambda-transcribe-sam #1784
Merged
benjasl-stripe
merged 5 commits into
aws-samples:main
from
anushreeumesh:anushreeumesh-feature-s3-lambda-transcribe-sam
Oct 24, 2023
+231
−0
98f8f6f
Initial commit for the s3-lambda-transcribe-sam pattern
8486830
Update s3-lambda-transcribe-sam/example-pattern.json
87c4b4a
Update s3-lambda-transcribe-sam/example-pattern.json
b000db4
Update s3-lambda-transcribe-sam/example-pattern.json
c59bda7
Update s3-lambda-transcribe-sam/example-pattern.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,82 @@ | ||
# S3 - Lambda - Transcribe | ||
|
||
This pattern contains a sample AWS Serverless Application Model (SAM) template that deploys a Lambda Function with an S3 object created trigger to start an Amazon Transcribe job and place the results in another S3 bucket. | ||
|
||
This pattern deploys one Lambda Function and two S3 buckets. | ||
|
||
Learn more about this pattern at Serverless Land Patterns: << Add the live URL here >> | ||
|
||
Important: this application uses various AWS services and there are costs associated with these services after the Free Tier usage - please see the [AWS Pricing page](https://aws.amazon.com/pricing/) for details. You are responsible for any AWS costs incurred. No warranty is implied in this example. | ||
|
||
## Services | ||
|
||
The AWS services used in this pattern are: | ||
|
||
- Amazon S3 | ||
- AWS Lambda | ||
- Amazon Transcribe | ||
|
||
![Architecture](s3-lambda-transcribe.png) | ||
|
||
## Requirements | ||
|
||
- [Create an AWS account](https://portal.aws.amazon.com/gp/aws/developer/registration/index.html) if you do not already have one and log in. The IAM user that you use must have sufficient permissions to make necessary AWS service calls and manage AWS resources. | ||
- [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html) installed and configured | ||
- [Git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) installed | ||
- [AWS Serverless Application Model](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-cli-install.html) (AWS SAM) installed | ||
|
||
## Deployment Instructions | ||
|
||
1. Create a new directory, navigate to that directory in a terminal and clone the GitHub repository: | ||
``` | ||
git clone https://github.com/aws-samples/serverless-patterns | ||
``` | ||
1. Change directory to the pattern directory: | ||
``` | ||
cd s3-lambda-transcribe-sam | ||
``` | ||
1. From the command line, use AWS SAM to build the application: | ||
``` | ||
sam build | ||
``` | ||
1. Use AWS SAM to deploy the AWS resources for the pattern as specified in the template.yaml file: | ||
``` | ||
sam deploy --guided | ||
``` | ||
1. During the prompts: | ||
|
||
- Enter a stack name | ||
- Enter the desired AWS Region | ||
- Allow SAM CLI to create IAM roles with the required permissions. | ||
After you have run `sam deploy --guided` once and saved the arguments to a configuration file (samconfig.toml), you can run `sam deploy` in future to reuse these defaults. | ||
|
||
1. Note the outputs from the SAM deployment process. These contain the resource names and/or ARNs which are used for testing. | ||
|
||
## How it works | ||
|
||
The input S3 bucket is configured with an event notification that invokes the Lambda function when an audio file is uploaded. The Lambda function passes the S3 location of the uploaded file to Amazon Transcribe by starting a transcription job. Transcribe writes a JSON file containing the speech transcript to the output S3 bucket. | ||
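
As a rough illustration of the first step, the sketch below pulls the bucket name and object key out of an S3 event notification and builds the media URI that a transcription job would point at. The sample event is a trimmed, hypothetical payload, and `media_uris` is an illustrative helper name that is not part of the pattern. Object keys arrive URL-encoded in S3 event notifications, so the sketch decodes them with `unquote_plus`.

```python
from urllib.parse import unquote_plus


def media_uris(event):
    """Return one s3:// URI per record in an S3 event notification."""
    uris = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys are URL-encoded in the event (a space arrives as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        uris.append(f"s3://{bucket}/{key}")
    return uris


# Hypothetical, trimmed event used for illustration only.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "speech2text-input-bucket"},
                "object": {"key": "team+call.mp3"},
            }
        }
    ]
}

if __name__ == "__main__":
    print(media_uris(sample_event))
```

Decoding matters mainly for keys that contain spaces or special characters; plain names such as `audio.mp3` pass through unchanged.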
|
||
## Testing | ||
|
||
1. Upload an audio file (for example, audio.mp3) to the input S3 bucket. If you changed the bucket name in the template, use that name instead: | ||
```bash | ||
aws s3 cp audio.mp3 s3://speech2text-input-bucket | ||
``` | ||
1. The JSON file containing the audio transcript appears in the output S3 bucket after about 2-3 minutes. | ||
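
Once the JSON file has been downloaded from the output bucket, the transcript text can be read out of it. The sketch below assumes the file follows the usual Transcribe result layout, with the text under `results.transcripts`. The sample document is a hypothetical, heavily trimmed stand-in, and `transcript_text` is an illustrative helper name.

```python
import json


def transcript_text(transcribe_output):
    """Join the transcript strings from a parsed Transcribe result document."""
    transcripts = transcribe_output.get("results", {}).get("transcripts", [])
    return " ".join(t["transcript"] for t in transcripts)


# Hypothetical, heavily trimmed result document used for illustration only.
sample_output = json.loads("""
{
  "jobName": "audio1698150000",
  "results": {
    "transcripts": [{"transcript": "Hello from Transcribe."}],
    "items": []
  },
  "status": "COMPLETED"
}
""")

if __name__ == "__main__":
    print(transcript_text(sample_output))
```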
|
||
## Cleanup | ||
|
||
1. Delete the stack | ||
```bash | ||
aws cloudformation delete-stack --stack-name STACK_NAME | ||
``` | ||
1. Confirm the stack has been deleted | ||
```bash | ||
aws cloudformation list-stacks --query "StackSummaries[?contains(StackName,'STACK_NAME')].StackStatus" | ||
``` | ||
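
CloudFormation does not delete S3 buckets that still contain objects, so stack deletion may fail if the input or output bucket is not empty. One possible way to empty the buckets first is sketched below with boto3. It assumes the default bucket names from the template and unversioned buckets; `empty_bucket` and `empty_buckets` are illustrative helper names, not part of the pattern.

```python
def empty_bucket(bucket):
    """Delete every object in `bucket`, a boto3 S3 Bucket resource."""
    bucket.objects.all().delete()


def empty_buckets(names):
    """Empty each named bucket using the default boto3 credentials."""
    import boto3  # imported lazily so empty_bucket has no AWS dependency

    s3 = boto3.resource("s3")
    for name in names:
        empty_bucket(s3.Bucket(name))


# Example call (assumed bucket names; this deletes data, so it is left commented out):
# empty_buckets(["speech2text-input-bucket", "speech2text-output-bucket"])
```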
|
||
--- | ||
|
||
Copyright 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved. | ||
|
||
SPDX-License-Identifier: MIT-0 |
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,51 @@ | ||
{ | ||
  "title": "S3 - Lambda - Transcribe", | ||
  "description": "Trigger a Transcribe job from an S3 upload.", | ||
  "language": "Python", | ||
  "level": "200", | ||
  "framework": "SAM", | ||
  "introBox": { | ||
    "headline": "How it works", | ||
    "text": [ | ||
      "This pattern contains a Lambda Function with an S3 object trigger to start an Amazon Transcribe job and place the transcription results into a separate S3 bucket.", | ||
      "This pattern deploys one Lambda Function and two S3 Buckets." | ||
    ] | ||
  }, | ||
  "gitHub": { | ||
    "template": { | ||
      "repoURL": "https://github.com/aws-samples/serverless-patterns/tree/main/s3-lambda-transcribe-sam", | ||
      "templateURL": "serverless-patterns/s3-lambda-transcribe-sam", | ||
      "projectFolder": "s3-lambda-transcribe-sam", | ||
      "templateFile": "template.yaml" | ||
    } | ||
  }, | ||
  "resources": { | ||
    "bullets": [ | ||
      { | ||
        "text": "Invoke a Lambda Function using an Amazon S3 trigger", | ||
        "link": "https://docs.aws.amazon.com/lambda/latest/dg/with-s3-example.html" | ||
      }, | ||
      { | ||
        "text": "Amazon Transcribe - Speech to Text Conversion", | ||
        "link": "https://aws.amazon.com/transcribe/" | ||
      } | ||
    ] | ||
  }, | ||
  "deploy": { | ||
    "text": ["sam deploy --guided"] | ||
  }, | ||
  "testing": { | ||
    "text": ["See the GitHub repo for detailed testing instructions."] | ||
  }, | ||
  "cleanup": { | ||
    "text": ["Delete the stack: <code>sam delete</code>."] | ||
  }, | ||
  "authors": [ | ||
    { | ||
      "name": "Anushree Umesh", | ||
      "image": "/assets/images/resources/contributors/umeshanu.jpeg", | ||
      "bio": "Anushree Umesh is an Associate Solutions Architect with Amazon Web Services", | ||
      "linkedin": "nushreeumesh" | ||
    } | ||
  ] | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
import json | ||
import boto3 | ||
import os | ||
import time | ||
|
||
transcribe = boto3.client('transcribe') | ||
|
||
output_bucket = os.environ['OUTPUT_BUCKET'] | ||
|
||
|
||
def lambda_handler(event, context): | ||
    try: | ||
        for record in event['Records']: | ||
|
||
            # Get S3 Object Info | ||
            bucket_name = record['s3']['bucket']['name'] | ||
            key = record['s3']['object']['key'] | ||
|
||
            # Generate Transcription Job Name | ||
            job_name = key.split('.')[0] | ||
            job_name = job_name + str(int(time.time())) | ||
            job_name = job_name[0:199] if len(job_name) >= 200 else job_name | ||
|
||
            # Start Transcription Job | ||
            response = transcribe.start_transcription_job( | ||
                TranscriptionJobName=job_name, | ||
                IdentifyLanguage=True, | ||
                Media={ | ||
                    'MediaFileUri': 's3://' + bucket_name + '/' + key, | ||
                    'RedactedMediaFileUri': 's3://' + bucket_name + '/' + key | ||
                }, | ||
                OutputBucketName=output_bucket, | ||
                OutputKey=job_name + '.json' | ||
            ) | ||
|
||
        return { | ||
            'statusCode': 200, | ||
            'body': json.dumps(response['TranscriptionJob']['TranscriptionJobName']) | ||
        } | ||
|
||
    except Exception as e: | ||
        print('Error') | ||
        print(str(e)) | ||
        return { | ||
            'statusCode': 500, | ||
            'body': str(e) | ||
        } |
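
The job-name logic above takes the text before the first dot in the key and truncates after appending the timestamp. A hedged alternative is sketched below: it decodes the URL-encoded key, strips only the final extension, replaces characters outside the set Transcribe job names accept (letters, digits, `.`, `_`, `-`), and reserves room for the timestamp so it survives truncation. `make_job_name`, `MAX_JOB_NAME_LENGTH`, and the `now` parameter are hypothetical names introduced for illustration, not part of this PR.

```python
import re
import time
from urllib.parse import unquote_plus

# Transcribe job names are limited to 200 characters.
MAX_JOB_NAME_LENGTH = 200


def make_job_name(key, now=None):
    """Build a Transcribe-safe job name from an S3 object key."""
    timestamp = str(int(time.time() if now is None else now))
    # S3 event keys are URL-encoded; decode before deriving the name.
    decoded = unquote_plus(key)
    # Drop only the final extension, e.g. "a.b.mp3" -> "a.b".
    stem = decoded.rsplit(".", 1)[0]
    # Replace anything outside [0-9a-zA-Z._-], such as '/' or spaces.
    safe = re.sub(r"[^0-9a-zA-Z._-]", "_", stem)
    # Truncate the stem, not the timestamp, to stay within the limit.
    room = MAX_JOB_NAME_LENGTH - len(timestamp) - 1
    return safe[:room] + "-" + timestamp


if __name__ == "__main__":
    print(make_job_name("calls/team+call.mp3", now=1698150000))
```

Keeping the timestamp intact preserves uniqueness for long keys, which matters because Transcribe rejects a job whose name is already in use.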
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,51 @@ | ||
AWSTemplateFormatVersion: '2010-09-09' | ||
Transform: AWS::Serverless-2016-10-31 | ||
Description: Serverless patterns - S3 -> Lambda -> Transcribe | ||
|
||
# More info about Globals: https://github.com/awslabs/serverless-application-model/blob/master/docs/globals.rst | ||
Globals: | ||
  Function: | ||
    Timeout: 60 | ||
    MemorySize: 256 | ||
|
||
Resources: | ||
  SpeechToTextFunction: | ||
    Type: AWS::Serverless::Function # More info about Function Resource: https://github.com/awslabs/serverless-application-model/blob/master/versions/2016-10-31.md#awsserverlessfunction | ||
    Properties: | ||
      CodeUri: src/ | ||
      Handler: app.lambda_handler | ||
      Runtime: python3.11 | ||
      Environment: | ||
        Variables: | ||
          OUTPUT_BUCKET: !Ref OutputBucket | ||
      Events: | ||
        S3Event: | ||
          Type: S3 | ||
          Properties: | ||
            Bucket: !Ref InputBucket | ||
            Events: s3:ObjectCreated:* | ||
      Policies: | ||
        - AWSLambdaBasicExecutionRole | ||
        - Version: '2012-10-17' | ||
          Statement: | ||
            - Effect: Allow | ||
              Action: | ||
                - "transcribe:StartTranscriptionJob" | ||
                - "s3:ListBucket" | ||
                - "s3:GetObject" | ||
                - "s3:PutObject" | ||
              Resource: "*" | ||
  InputBucket: | ||
    Type: AWS::S3::Bucket | ||
    Properties: | ||
      BucketName: speech2text-input-bucket # Replace this with a unique name. For more info about S3 bucket names: https://docs.aws.amazon.com/AmazonS3/latest/userguide/bucketnamingrules.html | ||
  OutputBucket: | ||
    Type: AWS::S3::Bucket | ||
    Properties: | ||
      BucketName: speech2text-output-bucket # Replace this with a unique name | ||
|
||
Outputs: | ||
  STOutputBucket: | ||
    Description: "The output bucket with the audio transcript file" | ||
    Value: | ||
      Ref: OutputBucket |
"linkedin": "anushreeumesh"