Introduction: Project Overview and Goals
Heart disease is a leading cause of death worldwide, making its early detection critical. In this project, we will build and deploy a machine-learning model to predict heart disease. Our model will be hosted on an AWS EC2 instance, with a CI/CD pipeline automating the deployment process. This project aims to provide a scalable, user-friendly prediction interface accessible via a web application.
Environment Setup: Creating a Git Repository and Virtual Environment
- Create a Git Repository:
- Initialize a new Git repository on GitHub to manage your project code.
- Clone the repository to your local machine.
- Setup Virtual Environment:
- Install virtualenv if you haven’t already: pip install virtualenv
- Create a virtual environment: virtualenv venv
- Activate the virtual environment:
- On Windows: .\venv\Scripts\activate
- On MacOS/Linux: source venv/bin/activate
- Install necessary dependencies: pip install -r requirements.txt
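A minimal requirements.txt for this project might look like the following. Flask, MLflow, and Dagshub are installed later in this guide, and pytest is used in the CI workflow; scikit-learn and pandas are assumptions based on the training steps, so adjust to your actual stack:
requirements.txt
flask
mlflow
dagshub
scikit-learn
pandas
pytest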
MLflow Integration with DagsHub: Tracking Experiments and Logging Results
Integrate MLflow with DagsHub to track your machine-learning experiments and log results.
- Install MLflow and DagsHub:
- pip install mlflow dagshub
- Configure MLflow:
Point MLflow at your DagsHub repository so that experiment runs are logged there:
import dagshub
import mlflow

# Point MLflow at your DagsHub repository (replace owner and repo with your own).
dagshub.init(repo_owner="your-username", repo_name="your-repo", mlflow=True)
with mlflow.start_run():
    mlflow.log_param("param1", "value1")
    mlflow.log_metric("metric1", 0.5)
- Track Experiments:
- Use MLflow to track your experiments and log parameters, metrics, and models, as in the sketch below.
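As a hedged end-to-end sketch (using a synthetic dataset as a stand-in for the real heart disease data), a run that logs a parameter, a metric, and the trained model itself might look like this:
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the heart disease dataset; the real data is loaded in the pipeline later.
X, y = make_classification(n_samples=200, n_features=13, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

with mlflow.start_run():
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    mlflow.log_param("max_iter", 1000)
    mlflow.log_metric("test_accuracy", model.score(X_test, y_test))
    mlflow.sklearn.log_model(model, "model")  # store the fitted model as a run artifact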
Prediction Pipeline Design: Stages of Data Processing and Model Building
Design a prediction pipeline consisting of the following stages; a minimal code sketch follows the list:
- Data Collection and Cleaning:
- Collect data from relevant sources.
- Clean the data by handling missing values, outliers, and normalizing features.
- Feature Engineering:
- Create new features from existing data to improve model performance.
- Model Training:
- Split the data into training and testing sets.
- Train various machine learning models and evaluate their performance.
- Model Selection and Validation:
- Select the best-performing model based on evaluation metrics.
- Validate the model using cross-validation techniques.
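A minimal sketch of these stages with pandas and scikit-learn, assuming a heart.csv file whose label column is named target (both are assumptions; adapt them to your dataset):
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Data collection and cleaning: drop duplicates, fill missing values with column medians.
df = pd.read_csv("heart.csv")
df = df.drop_duplicates()
df = df.fillna(df.median(numeric_only=True))

X = df.drop(columns=["target"])
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Normalization and model training chained in one pipeline.
pipe = Pipeline([("scaler", StandardScaler()), ("model", RandomForestClassifier(random_state=42))])
pipe.fit(X_train, y_train)

# Model selection and validation: 5-fold cross-validation plus a held-out test score.
cv_scores = cross_val_score(pipe, X_train, y_train, cv=5)
print(f"CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
print(f"Test accuracy: {pipe.score(X_test, y_test):.3f}")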
Implementation Details: Configuration Files and Pipeline Code Structure
- Configuration Files:
- Create configuration files for different environments (development, testing, production).
- Store configuration settings such as database connections, API keys, and model parameters (see the loader sketch after this list).
- Pipeline Code Structure:
- Organize your code into data processing, feature engineering, model training, and evaluation modules.
- Use scripts for training and predicting to ensure reproducibility.
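One possible layout (an assumption, not a requirement) keeps per-environment JSON files in a config/ directory (development.json, testing.json, production.json), selected by an APP_ENV environment variable. A minimal loader:
import json
import os

def load_config(env=None):
    """Load settings (e.g. database URI, API keys, model parameters) for the given environment."""
    env = env or os.getenv("APP_ENV", "development")
    with open(os.path.join("config", f"{env}.json")) as f:
        return json.load(f)

config = load_config()  # e.g. config["db_uri"], config["model_params"]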
GitHub Actions for CI/CD: Automating Build and Deployment Processes
Set up GitHub Actions to automate the build and deployment process.
- Create a GitHub Actions Workflow:
- Add a workflow YAML file to your repository’s .github/workflows directory.
- Define jobs for testing, building, and deploying the application.
name: CI/CD Pipeline
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.8'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
      - name: Run tests
        run: |
          pytest
      - name: Deploy to AWS EC2
        run: |
          ssh -i "your-key.pem" ec2-user@your-ec2-instance "docker pull your-docker-image && docker run -d your-docker-image"
UI Development with Flask: Designing a User-Friendly Prediction Interface
- Set Up Flask:
- Install Flask: pip install flask
Create a basic Flask app to serve the prediction model:
import joblib
from flask import Flask, request, jsonify

app = Flask(__name__)

# Load the trained model; this assumes it was saved earlier, e.g. with joblib.dump(model, "model.pkl").
model = joblib.load("model.pkl")

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json(force=True)
    prediction = model.predict([data['input']])
    # Cast to int so the NumPy scalar serializes cleanly to JSON.
    return jsonify({'prediction': int(prediction[0])})

if __name__ == '__main__':
    app.run(debug=True)
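To exercise the endpoint, post a feature vector to the running server. The 13 values below are purely illustrative and must match the number and order of features your model was trained on:
import requests

resp = requests.post(
    "http://127.0.0.1:5000/predict",
    json={"input": [63, 1, 3, 145, 233, 1, 0, 150, 0, 2.3, 0, 0, 1]},
)
print(resp.json())  # e.g. {"prediction": 1}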
- Develop the Frontend:
- Create HTML templates and static files for the user interface.
- Use Bootstrap or another frontend framework to enhance the UI.
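As a rough illustration (assuming the Flask app defined above), an inline Bootstrap page might look like the sketch below; a real project would keep the markup in templates/index.html and serve it with render_template instead:
from flask import render_template_string

PAGE = """
<!doctype html>
<html>
  <head>
    <link rel="stylesheet"
          href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/css/bootstrap.min.css">
  </head>
  <body class="container py-4">
    <h1>Heart Disease Prediction</h1>
    <!-- Submitting this form to the JSON /predict endpoint needs a little
         JavaScript (or a separate form-handling route); it is shown here
         only to sketch the page structure. -->
    <form id="predict-form">
      <input class="form-control" name="input" placeholder="Comma-separated features">
      <button class="btn btn-primary mt-2" type="submit">Predict</button>
    </form>
  </body>
</html>
"""

@app.route("/")
def index():
    return render_template_string(PAGE)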
AWS Configuration: Creating IAM User, ECR Repository, and EC2 Instance
- Create IAM User:
- Create an IAM user with permission to access ECR and EC2.
- Set Up ECR Repository:
- Create an ECR repository to store Docker images (a boto3 sketch follows this list).
- Launch EC2 Instance:
- Launch an EC2 instance to host your application.
- Configure security groups to allow necessary inbound and outbound traffic.
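If you prefer scripting over the AWS console, a hedged boto3 sketch looks like this. It assumes the IAM user's credentials are already configured locally (e.g. via aws configure), and the region and repository name are placeholders:
import boto3

# Create an ECR repository to hold the application's Docker images.
ecr = boto3.client("ecr", region_name="us-east-1")
response = ecr.create_repository(repositoryName="heart-disease-app")
print(response["repository"]["repositoryUri"])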
Deployment on EC2: Installing Docker, Configuring Security Groups, and Setting up Runner
- Install Docker:
Install Docker on your EC2 instance (these commands assume Amazon Linux 2):
sudo yum update -y
sudo amazon-linux-extras install docker
sudo service docker start
sudo usermod -a -G docker ec2-user
- Configure Security Groups:
- Update security groups to allow HTTP/HTTPS traffic.
- Set Up Runner:
- Use GitHub Actions Runner to connect your EC2 instance to GitHub for automated deployments.
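For the runner to have something to deploy, the application needs a Docker image. A minimal Dockerfile sketch, assuming the Flask app lives in app.py and binds to 0.0.0.0:5000 when containerized:
# Minimal image for the Flask prediction service (assumed layout: app.py + requirements.txt).
FROM python:3.8-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 5000
CMD ["python", "app.py"]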
GitHub Secrets and Final Steps: Connecting GitHub to EC2 and Running the Application
- Set Up GitHub Secrets:
- Store sensitive information such as AWS credentials in GitHub Secrets.
- Connect GitHub to EC2:
- Use GitHub Actions to connect to your EC2 instance and deploy the application; a secrets-based deploy step is sketched after this list.
- Run the Application:
- Start the Flask application using Docker on your EC2 instance.
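As a sketch of how the deploy step from the earlier workflow can read those secrets, the step below assumes you have stored the SSH private key and host address under the names EC2_SSH_KEY and EC2_HOST (assumed names, not a GitHub convention):
- name: Deploy to AWS EC2
  env:
    EC2_SSH_KEY: ${{ secrets.EC2_SSH_KEY }}
    EC2_HOST: ${{ secrets.EC2_HOST }}
  run: |
    echo "$EC2_SSH_KEY" > key.pem
    chmod 600 key.pem
    ssh -o StrictHostKeyChecking=no -i key.pem ec2-user@"$EC2_HOST" "docker pull your-docker-image && docker run -d your-docker-image"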
Conclusion
By following this guide, you will have built and deployed a heart disease prediction model on AWS EC2 with a CI/CD pipeline. The project combines MLflow and DagsHub for experiment tracking, Flask for the prediction interface, Docker for packaging, and GitHub Actions for automated deployment, providing a robust and scalable solution for heart disease prediction.