Introduction to File Information Extraction with Python

Extracting file information such as names, sizes, and timestamps is a common programming task, whether you are managing files on a local machine or on a server. Python's standard library makes this straightforward. This guide walks you through setting up a development environment, writing a Python script to extract file information, and running the script to view the results.

Setting Up Your Development Environment

Before we dive into writing the script, setting up a proper development environment is essential. Here are the steps:

  1. Install Python: Ensure you have Python installed on your machine. You can download it from the official Python website.
  2. Install an IDE: Choose an Integrated Development Environment (IDE) like VS Code, PyCharm, or any text editor you prefer.

Creating a Code Repository

Creating a code repository helps you manage and version control your project. Follow these steps:

  1. Sign up for GitHub: If you don’t have an account, create one on GitHub.
  2. Create a new repository: Go to GitHub and create a new repository. Give it a name like file-info-extractor.

Connecting Your Repository to an IDE

To streamline your development workflow, connect your repository to your IDE. Here’s how you can do it in VS Code:

  1. Clone your repository: Open VS Code and clone your GitHub repository using the command palette (Ctrl+Shift+P) and selecting Git: Clone.
  2. Open the cloned repository: Navigate to the cloned folder and open it in VS Code.

Writing the Python Script

Now that your environment is set up, let’s write the script. Create a new Python file, name it file_info_extractor.py, and start coding.

Importing the ‘os’ Module

The os module in Python provides a way to interact with the operating system. It will help us access file information. Add the following line to your script:

import os

Gathering File Information

To gather file information, we will use the os module functions. Here’s a function that retrieves file details:

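def get_file_info(file_path):
    # os.stat returns size and timestamp metadata for the given path
    file_info = os.stat(file_path)
    return {
        'file_name': os.path.basename(file_path),
        'file_size': file_info.st_size,  # size in bytes
        'last_modified': file_info.st_mtime,  # seconds since the epoch
        'last_accessed': file_info.st_atime,
        # st_ctime is the creation time on Windows, but the time of the
        # last metadata change on Unix-like systems
        'creation_time': file_info.st_ctime
    }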
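The timestamps returned by os.stat are plain numbers (seconds since the epoch), which are hard to read at a glance. The following is a minimal sketch of one way to make them readable. The names format_timestamp and get_readable_file_info are hypothetical helpers introduced here for illustration and are not part of the original script.

```python
import os
from datetime import datetime


def format_timestamp(epoch_seconds):
    """Convert seconds since the epoch to a local-time ISO 8601 string."""
    return datetime.fromtimestamp(epoch_seconds).isoformat(timespec="seconds")


def get_readable_file_info(file_path):
    """Hypothetical variant of get_file_info with formatted timestamps."""
    file_info = os.stat(file_path)
    return {
        "file_name": os.path.basename(file_path),
        "file_size": file_info.st_size,  # size in bytes
        "last_modified": format_timestamp(file_info.st_mtime),
        "last_accessed": format_timestamp(file_info.st_atime),
    }
```

Calling get_readable_file_info on any existing path returns the same kind of dictionary as before, with the timestamps as strings in local time.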

Creating a Dictionary of File Details

Let’s create a function that iterates through a directory and gathers information for each file:

def gather_files_info(directory):
    files_info = []
    # os.walk visits the directory and every subdirectory beneath it
    for root, _dirs, files in os.walk(directory):
        for file in files:
            file_path = os.path.join(root, file)
            files_info.append(get_file_info(file_path))
    return files_info
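One thing to keep in mind is that os.stat raises an OSError for entries it cannot read, such as broken symbolic links or files without sufficient permissions, and a single failure would stop the whole scan. The sketch below shows one possible way to skip such entries. The name gather_files_info_safe and the returned list of skipped paths are assumptions made for this example, not part of the original script.

```python
import os


def get_file_info(file_path):
    """Return basic metadata for one file (same idea as in the guide)."""
    file_info = os.stat(file_path)
    return {
        "file_name": os.path.basename(file_path),
        "file_size": file_info.st_size,
        "last_modified": file_info.st_mtime,
    }


def gather_files_info_safe(directory):
    """Hypothetical variant that records unreadable files instead of failing."""
    files_info = []
    skipped = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            file_path = os.path.join(root, name)
            try:
                files_info.append(get_file_info(file_path))
            except OSError as exc:
                # Covers missing targets, permission errors, and similar failures
                skipped.append((file_path, str(exc)))
    return files_info, skipped
```

Returning the skipped paths alongside the results lets the caller decide whether to report them or ignore them.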

Testing Your Script

A complete project would include automated test cases. For simplicity, we will run the script directly and check its output.

Generating Sample Files

To test the script, create a few sample files in a directory. You can make these files manually or using a script.
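If you prefer the scripted route, the following sketch creates a few small text files of known sizes. The function name create_sample_files, the file names, and the sizes are all arbitrary choices made for this example.

```python
import os


def create_sample_files(directory, count=3):
    """Create `count` text files of 10, 20, 30, ... bytes in `directory`."""
    os.makedirs(directory, exist_ok=True)
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"sample_{i}.txt")
        with open(path, "w", encoding="utf-8") as handle:
            # Each file is 10 bytes larger than the previous one
            handle.write("x" * (10 * (i + 1)))
        paths.append(path)
    return paths
```

For example, create_sample_files("sample_data") would create a sample_data folder in the current directory containing three files. Knowing the sizes in advance makes it easy to check the file_size values the script reports.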

Executing the Script from the Terminal

Here’s how you can execute the script from the terminal:

  1. Navigate to your project directory: Use the terminal to navigate to the directory containing file_info_extractor.py.
  2. Run the script: Execute the script by running:

    python file_info_extractor.py

Success: Viewing Your Extracted File Information

The functions alone do not print anything, so add the following block to the end of file_info_extractor.py before running it. If everything is set up correctly, you should see the extracted file information printed in the terminal:

if __name__ == "__main__":
    # Ask the user which directory to scan
    directory = input("Enter the directory to scan: ")
    files_info = gather_files_info(directory)
    for info in files_info:
        print(info)

This script will prompt you for a directory, scan it and all of its subdirectories, and print a dictionary of details for each file. The timestamps appear as seconds since the epoch.

Conclusion

Congratulations! You’ve successfully written a Python script to extract file information. This script can be extended for more complex tasks, such as filtering files by type or size, or saving the information to a file or database. Python’s standard library makes it a good fit for file management tasks like these.
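As a starting point for those extensions, here is a minimal sketch of filtering by extension or size and saving the results as JSON. It assumes the list of dictionaries produced by gather_files_info. The helper names filter_by_extension, filter_by_min_size, and save_as_json are hypothetical and introduced only for illustration.

```python
import json


def filter_by_extension(files_info, extension):
    """Keep entries whose file name ends with the given extension, e.g. '.txt'."""
    suffix = extension.lower()
    return [info for info in files_info
            if info["file_name"].lower().endswith(suffix)]


def filter_by_min_size(files_info, min_bytes):
    """Keep entries that are at least `min_bytes` bytes in size."""
    return [info for info in files_info if info["file_size"] >= min_bytes]


def save_as_json(files_info, output_path):
    """Write the list of file details to a JSON file."""
    with open(output_path, "w", encoding="utf-8") as handle:
        json.dump(files_info, handle, indent=2)
```

JSON is used here only because it needs nothing beyond the standard library. Writing to a CSV file or a database would follow the same pattern.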
