In today’s fast-paced digital world, extracting text from audio files can save valuable time and enhance productivity. AWS Transcribe is a powerful tool that efficiently converts audio to text. This blog will guide you through setting up an automated pipeline using AWS Transcribe, Lambda, and S3 to extract text from audio files and save the output to an S3 bucket. Additionally, we’ll store metadata in DynamoDB.
Prerequisites
- AWS Account: If you don’t have one, create an AWS account.
- AWS CLI: Install and configure the AWS CLI.
- AWS IAM Permissions: Ensure you have the necessary permissions to create and manage S3 buckets, Lambda functions, and DynamoDB tables.
Step 1: Create S3 Buckets
First, create two S3 buckets: one for the input audio files and one for the transcribed text output. Bucket names must be globally unique, so adjust the example names below if they are already taken.
aws s3 mb s3://input-audio-bucket
aws s3 mb s3://output-transcribe-bucket
Step 2: Set Up AWS Transcribe
AWS Transcribe will convert audio files to text. We will trigger the transcription process using a Lambda function.
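Transcribe itself needs no provisioning; jobs are started on demand through its API. If you want to confirm the service handles your audio before wiring up the pipeline, you can start a one-off job from the CLI (the file name below is just a placeholder):

aws transcribe start-transcription-job \
  --transcription-job-name sample-test-job \
  --media MediaFileUri=s3://input-audio-bucket/sample.mp3 \
  --media-format mp3 \
  --language-code en-US \
  --output-bucket-name output-transcribe-bucket

# Check the job status
aws transcribe get-transcription-job --transcription-job-name sample-test-job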
Step 3: Create a DynamoDB Table
Create a DynamoDB table to store metadata.
aws dynamodb create-table \
  --table-name TranscribeMetadata \
  --attribute-definitions AttributeName=JobId,AttributeType=S \
  --key-schema AttributeName=JobId,KeyType=HASH \
  --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5
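You can confirm the table is active before moving on:

aws dynamodb describe-table --table-name TranscribeMetadata --query 'Table.TableStatus'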
Step 4: Create an IAM Role for Lambda
Create an IAM role with the necessary permissions for the Lambda function.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "transcribe:StartTranscriptionJob",
        "dynamodb:PutItem"
      ],
      "Resource": "*"
    }
  ]
}
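The JSON above is only the permissions policy. To turn it into a role that Lambda can assume, save it as permissions-policy.json and create the role with a trust policy, as sketched below (the role and policy names are illustrative). Resource "*" is fine for a demo, but in production scope it to your buckets and table; if you reuse this role for the status-update function in Step 6, also add dynamodb:UpdateItem to the actions. Attaching the AWSLambdaBasicExecutionRole managed policy gives the function CloudWatch Logs access.

# Trust policy: allow the Lambda service to assume the role
aws iam create-role \
  --role-name transcribe-lambda-role \
  --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"lambda.amazonaws.com"},"Action":"sts:AssumeRole"}]}'

# Attach the permissions policy shown above
aws iam put-role-policy \
  --role-name transcribe-lambda-role \
  --policy-name transcribe-pipeline-policy \
  --policy-document file://permissions-policy.json

# Basic CloudWatch Logs permissions for the function
aws iam attach-role-policy \
  --role-name transcribe-lambda-role \
  --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole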
Step 5: Create the Lambda Function
Create a Lambda function to handle the transcription process.
- Function Code:
import json
import os
import re
import urllib.parse

import boto3

def lambda_handler(event, context):
    transcribe = boto3.client('transcribe')
    dynamodb = boto3.client('dynamodb')

    # The object key in the S3 event is URL-encoded (e.g. spaces become '+').
    record = event['Records'][0]['s3']
    bucket = record['bucket']['name']
    key = urllib.parse.unquote_plus(record['object']['key'])

    # Transcription job names only allow letters, digits, '.', '_' and '-'.
    job_name = re.sub(r'[^0-9a-zA-Z._-]', '-', key.rsplit('.', 1)[0])
    job_uri = f's3://{bucket}/{key}'

    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={'MediaFileUri': job_uri},
        MediaFormat=key.rsplit('.', 1)[-1].lower(),
        LanguageCode='en-US',
        OutputBucketName=os.environ['OUTPUT_BUCKET']
    )

    # Record the job so its status can be tracked later.
    dynamodb.put_item(
        TableName=os.environ['DYNAMODB_TABLE'],
        Item={
            'JobId': {'S': job_name},
            'MediaFileUri': {'S': job_uri},
            'Status': {'S': 'IN_PROGRESS'}
        }
    )

    return {
        'statusCode': 200,
        'body': json.dumps('Transcription job started')
    }
- Environment Variables:
- OUTPUT_BUCKET: output-transcribe-bucket
- DYNAMODB_TABLE: TranscribeMetadata
- Trigger:
Add an S3 trigger for the input bucket to invoke the Lambda function on ObjectCreated events.
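One way to deploy the function and wire up the trigger from the CLI is sketched below; the function name, role ARN, account ID, and region are placeholders, and the console works just as well.

# Package and create the function with its environment variables
zip function.zip lambda_function.py
aws lambda create-function \
  --function-name start-transcription \
  --runtime python3.12 \
  --handler lambda_function.lambda_handler \
  --role arn:aws:iam::123456789012:role/transcribe-lambda-role \
  --zip-file fileb://function.zip \
  --environment "Variables={OUTPUT_BUCKET=output-transcribe-bucket,DYNAMODB_TABLE=TranscribeMetadata}"

# Allow S3 to invoke the function, then add the bucket notification
aws lambda add-permission \
  --function-name start-transcription \
  --statement-id s3-invoke \
  --action lambda:InvokeFunction \
  --principal s3.amazonaws.com \
  --source-arn arn:aws:s3:::input-audio-bucket

aws s3api put-bucket-notification-configuration \
  --bucket input-audio-bucket \
  --notification-configuration '{
    "LambdaFunctionConfigurations": [{
      "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:start-transcription",
      "Events": ["s3:ObjectCreated:*"]
    }]
  }'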
Step 6: Monitor Transcription and Update Metadata
Create another Lambda function to update the metadata once the transcription is complete.
import json
import os

import boto3

def lambda_handler(event, context):
    dynamodb = boto3.client('dynamodb')

    # EventBridge delivers the job name and final status in the event detail.
    job_name = event['detail']['TranscriptionJobName']
    status = event['detail']['TranscriptionJobStatus']

    # 'Status' is aliased because it is a DynamoDB reserved word.
    dynamodb.update_item(
        TableName=os.environ['DYNAMODB_TABLE'],
        Key={'JobId': {'S': job_name}},
        UpdateExpression='set #s = :s',
        ExpressionAttributeNames={'#s': 'Status'},
        ExpressionAttributeValues={':s': {'S': status}}
    )

    return {
        'statusCode': 200,
        'body': json.dumps('Transcription job status updated')
    }
Set the DYNAMODB_TABLE environment variable on this function as well, and have it triggered by an Amazon EventBridge (formerly CloudWatch Events) rule that matches Transcribe job state-change events.
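A sketch of that rule with the CLI follows; the rule and function names are placeholders. Transcribe publishes these events with source aws.transcribe and detail-type "Transcribe Job State Change".

aws events put-rule \
  --name transcribe-job-state-change \
  --event-pattern '{
    "source": ["aws.transcribe"],
    "detail-type": ["Transcribe Job State Change"],
    "detail": { "TranscriptionJobStatus": ["COMPLETED", "FAILED"] }
  }'

# Point the rule at the status-update function and allow EventBridge to invoke it
aws events put-targets \
  --rule transcribe-job-state-change \
  --targets "Id=1,Arn=arn:aws:lambda:us-east-1:123456789012:function:update-transcribe-status"

aws lambda add-permission \
  --function-name update-transcribe-status \
  --statement-id eventbridge-invoke \
  --action lambda:InvokeFunction \
  --principal events.amazonaws.com \
  --source-arn arn:aws:events:us-east-1:123456789012:rule/transcribe-job-state-change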
Conclusion
By following these steps, you can automate extracting text from audio files using AWS Transcribe and save the output to an S3 bucket. Additionally, storing metadata in DynamoDB allows you to keep track of the transcription jobs efficiently.
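To verify the pipeline end to end, upload an audio file and watch the job move through the system (the file name below is assumed; the JobId matches the key without its extension):

aws s3 cp sample.mp3 s3://input-audio-bucket/

# Metadata item created by the first Lambda and updated by the second
aws dynamodb get-item \
  --table-name TranscribeMetadata \
  --key '{"JobId": {"S": "sample"}}'

# Transcript JSON written by Transcribe
aws s3 ls s3://output-transcribe-bucket/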