Transcribing audio by hand is slow, and Amazon Transcribe (often called AWS Transcribe) can do it for you by converting speech in audio files to text. This blog will guide you through setting up an automated pipeline using Transcribe, Lambda, and S3: uploading an audio file to an input bucket starts a transcription job, and the transcript is saved to an output bucket. Additionally, we will store job metadata in DynamoDB.

Prerequisites

  1. AWS Account: If you don’t have one, create an AWS account.
  2. AWS CLI: Install and configure the AWS CLI.
  3. AWS IAM Permissions: Ensure you have the necessary permissions to create and manage S3 buckets, Lambda functions, and DynamoDB tables.

Step 1: Create S3 Buckets

First, create two S3 buckets: one for input audio files and one for the transcribed text output. Bucket names are globally unique, so replace the names below with your own.

aws s3 mb s3://input-audio-bucket

aws s3 mb s3://output-transcribe-bucket

Step 2: Set Up AWS Transcribe

Transcribe converts audio files to text and needs no provisioning of its own. Each transcription job will be started by a Lambda function, which we create in Step 5.
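Transcribe only accepts certain media formats, so it can help to check the file extension before starting a job. The following is a minimal sketch, assuming the format list below matches the MediaFormat values the service accepts at the time of writing; the helper name media_format_for is hypothetical and not part of any AWS SDK.

```python
import os

# Assumed set of MediaFormat values accepted by StartTranscriptionJob.
# Check the current API reference before relying on this list.
SUPPORTED_FORMATS = {"mp3", "mp4", "wav", "flac", "ogg", "amr", "webm", "m4a"}


def media_format_for(key):
    """Return the MediaFormat for an S3 object key, or None if unsupported."""
    # splitext keeps the leading dot, so strip it and normalize the case.
    extension = os.path.splitext(key)[1].lstrip(".").lower()
    return extension if extension in SUPPORTED_FORMATS else None


print(media_format_for("calls/Meeting.MP3"))  # prints: mp3
print(media_format_for("notes.txt"))          # prints: None
```

A Lambda function could call a helper like this first and skip files that return None instead of letting the job request fail.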

Step 3: Create a DynamoDB Table

Create a DynamoDB table to store metadata.

aws dynamodb create-table \
    --table-name TranscribeMetadata \
    --attribute-definitions AttributeName=JobId,AttributeType=S \
    --key-schema AttributeName=JobId,KeyType=HASH \
    --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5

Step 4: Create an IAM Role for Lambda

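Create an IAM role that Lambda can assume, and attach a policy with the permissions the Lambda functions in this post need. The status-update function in Step 6 calls UpdateItem, so that action is included. Resource is left as * for brevity; restrict it to your own buckets and table outside of a demo.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "transcribe:StartTranscriptionJob",
        "dynamodb:PutItem",
        "dynamodb:UpdateItem"
      ],
      "Resource": "*"
    }
  ]
}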
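The wildcard resource in the policy above can be narrowed. The following is a minimal sketch that builds a scoped version of the same policy; the function name build_policy, the region, and the account ID are placeholders chosen for illustration, and the Transcribe statement is simply left unscoped here.

```python
import json


def build_policy(input_bucket, output_bucket, table_arn):
    """Build a policy document scoped to specific buckets and one table."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Read audio only from the input bucket.
            {"Effect": "Allow", "Action": "s3:GetObject",
             "Resource": f"arn:aws:s3:::{input_bucket}/*"},
            # Write transcripts only to the output bucket.
            {"Effect": "Allow", "Action": "s3:PutObject",
             "Resource": f"arn:aws:s3:::{output_bucket}/*"},
            # Left unscoped in this sketch.
            {"Effect": "Allow", "Action": "transcribe:StartTranscriptionJob",
             "Resource": "*"},
            # Metadata writes are limited to the one table.
            {"Effect": "Allow",
             "Action": ["dynamodb:PutItem", "dynamodb:UpdateItem"],
             "Resource": table_arn},
        ],
    }


# Placeholder region and account ID; substitute your own values.
policy = build_policy(
    "input-audio-bucket",
    "output-transcribe-bucket",
    "arn:aws:dynamodb:us-east-1:123456789012:table/TranscribeMetadata",
)
print(json.dumps(policy, indent=2))
```

The printed JSON can be saved to a file and attached to the role in place of the wildcard policy.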

Step 5: Create the Lambda Function

Create a Lambda function to handle the transcription process.

  1. Function Code:

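import json
import os
import re
from urllib.parse import unquote_plus

import boto3

# Create the clients once so warm invocations can reuse them.
transcribe = boto3.client('transcribe')
dynamodb = boto3.client('dynamodb')

def lambda_handler(event, context):
    record = event['Records'][0]
    bucket = record['s3']['bucket']['name']
    # S3 URL-encodes object keys in event notifications (a space arrives as '+').
    key = unquote_plus(record['s3']['object']['key'])
    stem, extension = os.path.splitext(key)

    # Job names may contain only letters, digits, '.', '_' and '-',
    # so replace anything else, such as the '/' in a prefixed key.
    job_name = re.sub(r'[^0-9a-zA-Z._-]', '_', stem)
    job_uri = f's3://{bucket}/{key}'

    # Job names must be unique, so uploading the same key twice
    # makes this call fail with a conflict error.
    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={'MediaFileUri': job_uri},
        MediaFormat=extension.lstrip('.').lower(),
        LanguageCode='en-US',
        OutputBucketName=os.environ['OUTPUT_BUCKET']
    )

    # Record the job so its status can be updated later.
    dynamodb.put_item(
        TableName=os.environ['DYNAMODB_TABLE'],
        Item={
            'JobId': {'S': job_name},
            'MediaFileUri': {'S': job_uri},
            'Status': {'S': 'IN_PROGRESS'}
        }
    )

    return {
        'statusCode': 200,
        'body': json.dumps('Transcription job started')
    }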

  2. Environment Variables:
  • OUTPUT_BUCKET: output-transcribe-bucket
  • DYNAMODB_TABLE: TranscribeMetadata
  3. Trigger:

Add an S3 trigger for the input bucket to invoke the Lambda function on ObjectCreated events.
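Before wiring up the trigger, it can be useful to check locally how the function will read an ObjectCreated event. The following is a minimal sketch, assuming a trimmed-down sample event with only the fields the function uses; the helper names parse_s3_record and job_name_for are hypothetical.

```python
import os
import re
from urllib.parse import unquote_plus


def parse_s3_record(event):
    """Return (bucket, key) from the first record of an S3 event."""
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    # Keys arrive URL-encoded in S3 notifications, with spaces sent as '+'.
    key = unquote_plus(record["s3"]["object"]["key"])
    return bucket, key


def job_name_for(key):
    """Derive a job name from a key using only letters, digits, '.', '_', '-'."""
    stem = os.path.splitext(key)[0]
    # Truncate to 200 characters, the assumed length limit for job names.
    return re.sub(r"[^0-9a-zA-Z._-]", "_", stem)[:200]


# Trimmed sample event; a real notification carries many more fields.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "input-audio-bucket"},
                "object": {"key": "calls/team+sync+01.mp3"}}}
    ]
}

bucket, key = parse_s3_record(sample_event)
print(bucket, key, job_name_for(key))
# prints: input-audio-bucket calls/team sync 01.mp3 calls_team_sync_01
```

The sample shows why decoding matters: without it, the media URI would point at a key containing plus signs that does not exist in the bucket.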

Step 6: Monitor Transcription and Update Metadata

Create another Lambda function to update the metadata once the transcription is complete. Set the DYNAMODB_TABLE environment variable on this function as well, since the code reads it.

import json
import os

import boto3

dynamodb = boto3.client('dynamodb')

def lambda_handler(event, context):
    # The job state change event carries the job name and its final status.
    job_name = event['detail']['TranscriptionJobName']
    status = event['detail']['TranscriptionJobStatus']

    dynamodb.update_item(
        TableName=os.environ['DYNAMODB_TABLE'],
        Key={'JobId': {'S': job_name}},
        # Status is a DynamoDB reserved word, hence the #s placeholder.
        UpdateExpression='SET #s = :s',
        ExpressionAttributeNames={'#s': 'Status'},
        ExpressionAttributeValues={':s': {'S': status}}
    )

    return {
        'statusCode': 200,
        'body': json.dumps('Transcription job status updated')
    }

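Trigger this function with an Amazon EventBridge (formerly CloudWatch Events) rule that matches the Transcribe Job State Change event from the aws.transcribe source. Transcribe emits this event when a job finishes as COMPLETED or FAILED.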
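The rule and the event it delivers can be sketched as plain data. The following is a minimal example, assuming the event pattern below is what the rule uses and a trimmed sample event; the helper name update_args is hypothetical and only builds the arguments that would be passed to update_item.

```python
import json

# Event pattern for the rule: match finished Transcribe jobs only.
EVENT_PATTERN = {
    "source": ["aws.transcribe"],
    "detail-type": ["Transcribe Job State Change"],
    "detail": {"TranscriptionJobStatus": ["COMPLETED", "FAILED"]},
}


def update_args(event, table_name):
    """Build the update_item keyword arguments for a job state change event."""
    detail = event["detail"]
    return {
        "TableName": table_name,
        "Key": {"JobId": {"S": detail["TranscriptionJobName"]}},
        # Status is a DynamoDB reserved word, so it is aliased as #s.
        "UpdateExpression": "SET #s = :s",
        "ExpressionAttributeNames": {"#s": "Status"},
        "ExpressionAttributeValues": {":s": {"S": detail["TranscriptionJobStatus"]}},
    }


# Trimmed sample event; a real one also carries id, time, region, and so on.
sample_event = {
    "source": "aws.transcribe",
    "detail-type": "Transcribe Job State Change",
    "detail": {
        "TranscriptionJobName": "calls_team_sync_01",
        "TranscriptionJobStatus": "COMPLETED",
    },
}

print(json.dumps(EVENT_PATTERN))
print(update_args(sample_event, "TranscribeMetadata")["Key"])
# prints: {'JobId': {'S': 'calls_team_sync_01'}}
```

Keeping the argument construction in a separate function like this makes it possible to test the update logic without calling DynamoDB.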

Conclusion

By following these steps, you can automate extracting text from audio files using AWS Transcribe and save the output to an S3 bucket. Additionally, storing metadata in DynamoDB allows you to keep track of the transcription jobs efficiently.