Managing email effectively is crucial for both personal and business communication. With cloud storage services like AWS S3, archiving and backing up email for long-term storage has become increasingly practical. In this blog post, we will walk through the steps to automatically ingest incoming Gmail messages into AWS S3, so that your emails are safely stored and easily retrievable.

Prerequisites

Before we start, make sure you have the following:

  1. AWS Account: Sign up for an AWS account if you don’t have one.
  2. AWS CLI: Install and configure the AWS CLI on your machine.
  3. Gmail Account: Ensure you have access to the Gmail account you want to back up.
  4. Google Cloud Platform (GCP) Account: Sign up and enable the Gmail API.
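
The script in Step 4 also imports three third-party Python modules: boto3, googleapiclient, and google_auth_oauthlib. As a minimal sketch, the check below reports which of them are not yet installed. The helper name missing_modules is hypothetical, and the pip package names in the mapping are the ones that usually provide these modules.

```python
import importlib.util

# Top-level modules imported by the Step 4 script, mapped to the pip package
# that provides each one.
REQUIRED = {
    "boto3": "boto3",
    "googleapiclient": "google-api-python-client",
    "google_auth_oauthlib": "google-auth-oauthlib",
}


def missing_modules(required=REQUIRED):
    """Return the pip package names whose module cannot be found."""
    return [
        package
        for module, package in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All required packages are installed.")
```

Running this before Step 4 avoids discovering a missing package only when the scheduled job fails.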

Step-by-Step Guide

Step 1: Enable Gmail API on Google Cloud Platform

  1. Go to the Google Cloud Console.
  2. Create a new project or select an existing project.
  3. Navigate to the APIs & Services dashboard.
  4. Click Enable APIs and Services and search for Gmail API.
  5. Enable the Gmail API for your project.

Step 2: Create OAuth 2.0 Credentials

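  1. In the APIs & Services dashboard, go to Credentials.
  2. Click Create Credentials and select OAuth client ID.
  3. Configure the consent screen if prompted.
  4. Choose Desktop app as the application type. The script in Step 4 uses InstalledAppFlow with a randomly chosen local port, which a Web application client with a fixed redirect URI such as http://localhost would reject with a redirect URI mismatch.
  5. Create the client ID and download the credentials file (JSON format).
  6. Save the file as credentials.json in the same directory as the script.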
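
Before moving on, it can help to confirm that the downloaded file is a usable OAuth client secrets file. The sketch below assumes the file is saved as credentials.json and relies on the standard layout of Google client secrets files, where a single top-level key ("installed" for desktop clients, "web" for web clients) holds the client_id and client_secret. The function name client_type is hypothetical.

```python
import json


def client_type(path="credentials.json"):
    """Return 'installed' or 'web' for a Google OAuth client secrets file.

    Raises ValueError if the file lacks the fields the OAuth flow needs.
    """
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    for kind in ("installed", "web"):
        if kind in config:
            section = config[kind]
            missing = [k for k in ("client_id", "client_secret") if k not in section]
            if missing:
                raise ValueError(f"{path} is missing: {', '.join(missing)}")
            return kind
    raise ValueError(f"{path} has no 'installed' or 'web' section")
```

A return value of "installed" indicates a desktop client, which is the type that works with a local-server OAuth flow on an arbitrary port.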

Step 3: Set Up AWS S3 Bucket

  1. Log in to your AWS Management Console.
  2. Navigate to S3 and create a new bucket.
  3. Configure bucket settings as needed (e.g., region, access permissions).
  4. Note the bucket name and region.
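
The console steps above can also be scripted. The following is a sketch rather than a required part of the setup: it assumes you pass in a client created with boto3.client("s3", region_name=region), and the function name create_archive_bucket is hypothetical. It handles one known quirk of the S3 API: us-east-1 must not be sent as a LocationConstraint, while every other region must be.

```python
def create_archive_bucket(s3_client, bucket_name, region):
    """Create a private bucket for the email archive.

    s3_client is expected to be boto3.client("s3", region_name=region).
    Returns the parameters that were sent to create_bucket.
    """
    params = {"Bucket": bucket_name}
    # us-east-1 is the default location and must NOT be passed as a
    # LocationConstraint; every other region must be.
    if region != "us-east-1":
        params["CreateBucketConfiguration"] = {"LocationConstraint": region}
    s3_client.create_bucket(**params)

    # Explicitly block all public access; an email archive should stay private.
    s3_client.put_public_access_block(
        Bucket=bucket_name,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )
    return params
```

Taking the client as an argument keeps the function easy to try against a stub before pointing it at a real account.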

Step 4: Write a Script to Ingest Emails to S3

Create a Python script to read incoming emails and upload them to S3.

import base64
import os

import boto3
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

# AWS S3 configuration
s3_bucket_name = 'your-s3-bucket-name'
s3_region_name = 'your-s3-region'
s3_client = boto3.client('s3', region_name=s3_region_name)

# Gmail API configuration (read-only access is enough for archiving)
SCOPES = ['https://www.googleapis.com/auth/gmail.readonly']


def authenticate_gmail():
    """Return an authorized Gmail API service, caching the token in token.json."""
    creds = None
    if os.path.exists('token.json'):
        creds = Credentials.from_authorized_user_file('token.json', SCOPES)
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            # Refresh silently when a refresh token is available.
            creds.refresh(Request())
        else:
            # First run: open a browser for the OAuth consent flow.
            flow = InstalledAppFlow.from_client_secrets_file('credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        with open('token.json', 'w') as token:
            token.write(creds.to_json())
    return build('gmail', 'v1', credentials=creds)


def save_email_to_s3(email_content, email_id):
    """Upload one raw message (bytes) to S3 under the emails/ prefix."""
    s3_client.put_object(
        Bucket=s3_bucket_name,
        Key=f'emails/{email_id}.txt',
        Body=email_content,
    )


def fetch_and_upload_emails():
    service = authenticate_gmail()
    # Note: list() returns a single page of results (100 messages by default).
    results = service.users().messages().list(userId='me', labelIds=['INBOX']).execute()
    messages = results.get('messages', [])
    if not messages:
        print('No messages found.')
        return
    for message in messages:
        # format='raw' is required: the default format has no 'raw' field.
        msg = service.users().messages().get(
            userId='me', id=message['id'], format='raw'
        ).execute()
        email_data = base64.urlsafe_b64decode(msg['raw'].encode('ASCII'))
        save_email_to_s3(email_data, message['id'])


if __name__ == '__main__':
    fetch_and_upload_emails()
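
The messages.list call in the script returns only one page of results, so a large inbox needs pagination. The sketch below follows the nextPageToken field that the Gmail API includes in a response whenever more pages are available. The generator name iter_message_ids and its parameters are hypothetical; service is the object returned by authenticate_gmail().

```python
def iter_message_ids(service, label_ids=("INBOX",), query=None):
    """Yield every matching message ID, following nextPageToken across pages."""
    page_token = None
    while True:
        params = {"userId": "me", "labelIds": list(label_ids)}
        if query:
            # Optional Gmail search expression, same syntax as the search box.
            params["q"] = query
        if page_token:
            params["pageToken"] = page_token
        response = service.users().messages().list(**params).execute()
        for message in response.get("messages", []):
            yield message["id"]
        # The API omits nextPageToken on the last page.
        page_token = response.get("nextPageToken")
        if not page_token:
            break
```

In fetch_and_upload_emails, the loop over messages could then iterate over iter_message_ids(service) instead. Passing a search expression such as query="newer_than:1d" is one way to keep hourly runs from walking the whole inbox each time; treat that choice of window as an assumption to tune for your schedule.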

Step 5: Schedule the Script to Run Automatically

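To run the script automatically, set up a cron job (Linux/macOS) or a Task Scheduler task (Windows). Run the script once by hand first: the initial OAuth consent opens a browser, which a scheduled job cannot do, and it creates the token.json file that later runs reuse.

Linux/macOS
  1. Open the terminal and type crontab -e.
  2. Add a cron job entry to run the script at your desired interval. The entry below runs it at the start of every hour, changing into the script's directory first so that token.json and credentials.json are found:

0 * * * * cd /path/to/your && /usr/bin/python3 script.py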
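
Cron does not start jobs in the script's directory, so relative file names such as token.json and credentials.json depend on the working directory the job happens to get. One way to make the script robust is to resolve those names against the script's own location. This is a minimal sketch; the helper name beside_script is hypothetical.

```python
from pathlib import Path


def beside_script(script_file, file_name):
    """Return the absolute path of file_name in the same directory as script_file."""
    return Path(script_file).resolve().parent / file_name
```

Inside the script you would call beside_script(__file__, 'token.json') and beside_script(__file__, 'credentials.json'), then pass the results wherever the bare file names are used now.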

Windows
  1. Open Task Scheduler and create a new task.
  2. Set the trigger to your desired schedule.
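  3. Set the action to run your Python script, and set the action's Start in field to the script's folder so that token.json and credentials.json are found.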

Conclusion

By following these steps, you can automate ingesting incoming Gmail messages into AWS S3, so that your emails are backed up outside your mailbox. The approach combines the Gmail API, S3, and a scheduler into a simple email archiving solution.