- Installation
- Project Motivation
- Project Components
- Instructions
- Licensing, Authors, and Acknowledgements
The main libraries needed to run the code are the following:
- Python 3.6.7 (Anaconda distribution)
- Scikit-Learn for machine learning algorithms
- NumPy for vectorized numerical calculations
- Pandas for data manipulation
- NLTK for Natural Language Processing
- SQLAlchemy for interacting with databases
For this project, I designed an ETL pipeline and a machine learning pipeline that uses Natural Language Processing (NLP) to analyze disaster data from FigureEight, producing a model for an API that classifies disaster messages.
The project includes a web app where an emergency worker can enter a new message and get classification results across several categories. The web app also displays visualizations of the data.
There are three main components for this project:
1. ETL Pipeline
The Python script, process_data.py, contains a data cleaning pipeline that:
- Loads the messages and categories datasets
- Merges the two datasets
- Cleans the data
- Stores it in a SQLite database
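
The four ETL steps above can be sketched with pandas as follows. This is a minimal illustration, not the code in process_data.py: it assumes both CSV files share an `id` column, that the categories file has a `categories` column whose cells look like `related-1;request-0;offer-0`, and it uses the standard-library `sqlite3` module instead of SQLAlchemy so it stays self-contained. The table name `messages` is hypothetical.

```python
import sqlite3

import pandas as pd


def load_data(messages_filepath, categories_filepath):
    """Load both CSV files and merge them on their shared 'id' column (assumed)."""
    messages = pd.read_csv(messages_filepath)
    categories = pd.read_csv(categories_filepath)
    return messages.merge(categories, on="id")


def clean_data(df):
    """Expand the 'categories' string into one integer column per category, then drop duplicates."""
    # Assumed cell format: "related-1;request-0;offer-0"
    split = df["categories"].str.split(";", expand=True)
    # Category names are the text before the last '-' in the first row
    split.columns = [value.rsplit("-", 1)[0] for value in split.iloc[0]]
    for column in split.columns:
        # Keep only the value after the last '-' and convert it to an integer
        split[column] = split[column].str.rsplit("-", n=1).str[-1].astype(int)
    df = pd.concat([df.drop(columns="categories"), split], axis=1)
    return df.drop_duplicates()


def save_data(df, database_filepath, table_name="messages"):
    """Write the cleaned frame to a SQLite table, replacing any previous version."""
    conn = sqlite3.connect(database_filepath)
    try:
        df.to_sql(table_name, conn, index=False, if_exists="replace")
    finally:
        conn.close()
```

Splitting on `;` and then on the last `-` keeps category names that themselves contain hyphens intact.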
2. ML Pipeline
The Python script, train_classifier.py, contains a machine learning pipeline that:
- Loads data from the SQLite database
- Splits the dataset into training and test sets
- Builds a text processing and machine learning pipeline
- Trains and tunes a model using GridSearchCV
- Outputs results on the test set
- Exports the final model as a pickle file
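
A minimal scikit-learn sketch of these steps is shown below. It is an assumption about the structure, not the actual train_classifier.py: it uses scikit-learn's built-in TF-IDF tokenizer in place of an NLTK-based one, a `MultiOutputClassifier` wrapping a random forest to predict every category at once, and a deliberately tiny parameter grid. Reading `X` and `Y` from the SQLite database is left out; the function names are hypothetical.

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline


def build_model():
    """Text features -> one random forest per category, tuned with GridSearchCV."""
    pipeline = Pipeline([
        # scikit-learn's built-in tokenizer stands in for an NLTK-based one here
        ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english")),
        # One binary classifier per output category
        ("clf", MultiOutputClassifier(RandomForestClassifier(random_state=0))),
    ])
    # Illustrative grid; parameters of the wrapped forest use the "clf__estimator__" prefix
    param_grid = {"clf__estimator__n_estimators": [10, 50]}
    return GridSearchCV(pipeline, param_grid, cv=3)


def train_and_export(X, Y, category_names, model_filepath):
    """Split, train, report per-category results on the test set, and pickle the model."""
    Y = np.asarray(Y)
    X_train, X_test, Y_train, Y_test = train_test_split(
        X, Y, test_size=0.25, random_state=42
    )
    model = build_model()
    model.fit(X_train, Y_train)

    Y_pred = model.predict(X_test)
    for i, name in enumerate(category_names):
        print(f"--- {name} ---")
        print(classification_report(Y_test[:, i], Y_pred[:, i], zero_division=0))

    # Export the best pipeline found by the grid search (refit on the full training set)
    with open(model_filepath, "wb") as f:
        pickle.dump(model.best_estimator_, f)
    return model.best_estimator_
```

Pickling `best_estimator_` rather than the whole `GridSearchCV` object keeps the saved file limited to the fitted pipeline that the web app needs for prediction.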
3. Flask Web App
A web app that displays data visualizations and lets a user classify disaster messages with the trained model.
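
The classification part of the web app can be sketched as a small Flask endpoint. This is a hypothetical outline rather than the project's run.py: the `/go` route, the `query` parameter, and the JSON response are assumptions (the real app presumably renders HTML pages with its visualizations), and `model` is anything with a scikit-learn-style `predict` that returns one row of 0/1 labels per input string.

```python
from flask import Flask, jsonify, request


def create_app(model, category_names):
    """Build a Flask app whose /go endpoint classifies the 'query' parameter."""
    app = Flask(__name__)

    @app.route("/go")
    def go():
        query = request.args.get("query", "")
        # predict() takes a list of messages; take the single row of labels back out
        labels = model.predict([query])[0]
        results = {name: int(label) for name, label in zip(category_names, labels)}
        return jsonify(query=query, results=results)

    return app
```

In practice the model would be loaded with `pickle.load` from `models/classifier.pkl`, and the server started with `create_app(model, category_names).run(host="0.0.0.0", port=3001)` to match the address in the instructions. Passing the model into `create_app` keeps the endpoint testable with a stub model.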
- Run the following commands in the project's root directory to set up the database and model.
    - To run the ETL pipeline that cleans the data and stores it in the database:
      `python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db`
    - To run the ML pipeline that trains the classifier and saves it:
      `python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl`
- Run the following command in the app's directory to run the web app:
  `python run.py`
- Go to http://0.0.0.0:3001/ (or your web app's link).
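
Both setup commands take their file paths as positional arguments, and the order matters. The sketch below spells out that order with `argparse`; it is only an illustration, since the actual scripts may read `sys.argv` directly, and the argument names are hypothetical.

```python
import argparse


def process_data_parser():
    """Positional arguments in the order used by the ETL command."""
    parser = argparse.ArgumentParser(prog="process_data.py")
    parser.add_argument("messages_filepath", help="CSV file of disaster messages")
    parser.add_argument("categories_filepath", help="CSV file of message categories")
    parser.add_argument("database_filepath", help="SQLite database to write")
    return parser


def train_classifier_parser():
    """Positional arguments in the order used by the training command."""
    parser = argparse.ArgumentParser(prog="train_classifier.py")
    parser.add_argument("database_filepath", help="SQLite database produced by the ETL step")
    parser.add_argument("model_filepath", help="Where to write the pickled model")
    return parser
```

For example, `process_data_parser().parse_args(["data/disaster_messages.csv", "data/disaster_categories.csv", "data/DisasterResponse.db"])` maps the third path to `database_filepath`, which is the file the training command then reads.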
Credit goes to FigureEight for the data and the project idea. Author: Gustavo Cedeno, following the recommendations and requirements of Udacity's Data Science Nanodegree program.