NLP_DisasterResponse

Table of Contents

  1. Installation
  2. Project Motivation
  3. Project Components
  4. Instructions
  5. Licensing, Authors, and Acknowledgements

Installation

The main libraries needed to run the code are the following:

  • Python 3.6.7 (Anaconda distribution)
  • scikit-learn for machine learning algorithms
  • NumPy for numerical and vectorized calculations
  • Pandas for data manipulation
  • NLTK for natural language processing
  • SQLAlchemy for interacting with databases

Project Motivation

For this project, I designed an ETL pipeline and a machine learning pipeline that use Natural Language Processing (NLP) to analyze disaster data from Figure Eight and build a model for an API that classifies disaster messages.

The project includes a web app where an emergency worker can input a new message and get classification results in several categories. The web app will also display visualizations of the data.

Project Components

This project has three main components:

1. ETL Pipeline

The Python script process_data.py contains a data cleaning pipeline that:

  • Loads the messages and categories datasets
  • Merges the two datasets
  • Cleans the data
  • Stores it in a SQLite database
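
The steps above can be sketched as follows. This is a minimal illustration, not the actual process_data.py: the toy DataFrames, the table name DisasterMessages, and the in-memory database stand in for the real CSV files and DisasterResponse.db, and the category strings are assumed to follow the `name-0`/`name-1` format used in the Figure Eight data.

```python
import pandas as pd
from sqlalchemy import create_engine

# Toy stand-ins for disaster_messages.csv / disaster_categories.csv;
# the real script loads them with pd.read_csv.
messages = pd.DataFrame({
    "id": [1, 2],
    "message": ["We need water", "Roads are flooded"],
})
categories = pd.DataFrame({
    "id": [1, 2],
    "categories": ["related-1;water-1", "related-1;water-0"],
})

# Merge the two datasets on the shared id column
df = messages.merge(categories, on="id")

# Expand the single 'categories' string into one 0/1 column per category
expanded = df["categories"].str.split(";", expand=True)
expanded.columns = [c.split("-")[0] for c in expanded.iloc[0]]
for col in expanded.columns:
    expanded[col] = expanded[col].str[-1].astype(int)

# Clean: replace the raw string column with the numeric ones, drop duplicates
df = pd.concat([df.drop(columns="categories"), expanded], axis=1)
df = df.drop_duplicates()

# Store the cleaned table in a SQLite database
engine = create_engine("sqlite:///:memory:")
df.to_sql("DisasterMessages", engine, index=False, if_exists="replace")
```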

2. ML Pipeline

The Python script train_classifier.py contains a machine learning pipeline that:

  • Loads data from the SQLite database
  • Splits the dataset into training and test sets
  • Builds a text processing and machine learning pipeline
  • Trains and tunes a model using GridSearchCV
  • Outputs results on the test set
  • Exports the final model as a pickle file

3. Flask Web App

A Flask web app that displays data visualizations and lets an emergency worker use the trained model to classify disaster messages.
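
The shape of that app can be sketched as below. This is not the repository's run.py: the route name /go, the category list, and the classify stub (which stands in for loading models/classifier.pkl and calling model.predict) are all assumptions made to keep the sketch self-contained.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical category subset; the real model predicts many more
CATEGORIES = ["related", "water", "food"]

def classify(message):
    # Stub standing in for model.predict([message])[0]:
    # flags a category if its name appears in the message
    return {c: int(c in message.lower()) for c in CATEGORIES}

@app.route("/go")
def go():
    # Read the emergency worker's message from the query string
    query = request.args.get("query", "")
    return jsonify(message=query, classification=classify(query))
```

In the real app the same route renders an HTML page with the classification results and Plotly visualizations of the training data, rather than returning JSON.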

Instructions

  1. Run the following commands in the project's root directory to set up the database and model.
  • To run the ETL pipeline that cleans the data and stores it in a database:

    python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db

  • To run the ML pipeline that trains the classifier and saves it:

    python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl

  2. Run the following command in the app's directory to run the web app.

    python run.py

  3. Go to http://0.0.0.0:3001/ (or your web app's link)

Licensing, Authors, and Acknowledgements

Credit goes to Figure Eight for the data and the project idea. Author: Gustavo Cedeno, following recommendations and requirements from Udacity's Data Scientist Nanodegree program.
