
Text-Classification-Python

This repository covers a wide range of text classification problems using different machine learning algorithms.

1. Installation & Requirements:

A general installation guide for running the different projects is provided here. If an error occurs because of a missing library, read the error message and install the library it names.

If you have git installed, clone the repository by running these commands in the Command Prompt or Terminal:

git clone https://github.com/Yunus0or1/Text-Classification-Python.git
cd Text-Classification-Python

OR

Download the ZIP archive from https://github.com/Yunus0or1/Text-Classification-Python/archive/master.zip, extract it, and change into the extracted folder (GitHub names it Text-Classification-Python-master):

cd Text-Classification-Python-master

Then, in either case, install the dependencies:

pip install -r requirements.txt

Installing TensorFlow can cause issues. For version compatibility and other details, please refer to the TensorFlow installation guide.

Run

All projects are plain Python scripts. To run one:

Install Python 3 or Python 2.7.
Open a Command Prompt (CMD) or Terminal.
Go to the script's directory and run:
python3 <filename.py>

2. Usage:

i. Run a Python file from its project directory, or type the following command in a terminal/command prompt:

python <filename.py>

Source code explanations


Using an Embedding layer is essential when doing text classification with a neural network. To understand why, read this Medium article.

  • A convolutional neural network is used to perform text classification.

  • Layers: Embedding → Conv1D → MaxPooling1D → Conv1D → MaxPooling1D → LSTM → Dense

  • A softmax activation is used in the final layer to produce a normalized probability distribution over multiple classes.
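To make the Embedding layer and the softmax output concrete, here is a minimal pure-Python sketch (not the repository's code) of what each one computes. The toy vocabulary, the embedding size of 3, and the random initialization are illustrative assumptions; in a trained Keras model the embedding weights would be learned.

```python
import math
import random

def build_embedding_table(vocab, dim, seed=0):
    """One dense vector per vocabulary entry (random here; learned in a real model)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in vocab]

def embed(token_ids, table):
    """An Embedding layer is a table lookup: each integer id selects one row."""
    return [table[i] for i in token_ids]

def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy vocabulary; index 0 is reserved for padding.
vocab = ["<pad>", "traffic", "jam", "road", "fire"]
word_to_id = {w: i for i, w in enumerate(vocab)}
table = build_embedding_table(vocab, dim=3)

ids = [word_to_id[w] for w in "traffic jam".split()]
vectors = embed(ids, table)        # two 3-dimensional vectors
probs = softmax([2.0, 1.0, 0.1])   # scores for three classes
print(len(vectors), len(vectors[0]))  # 2 3
print(round(sum(probs), 6))           # 1.0
```

The lookup view is why an Embedding layer suits text: words arrive as integer IDs, and the layer replaces each ID with a dense, trainable vector instead of a huge sparse input.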

NER-Python

  • An NER (named entity recognition) system that uses prepositions to extract locations from social media posts.
  • Uses the NLTK library to obtain part-of-speech (POS) tags and identifies place names in three steps.
  • All the POS tags, along with a video tutorial, can be found at this link.
  • The program analyses the given string and looks up different prepositions in order to find a valid location name.
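The preposition heuristic can be sketched as follows. This is a simplified illustration, not the repository's implementation: it assumes the post has already been POS-tagged (for example with nltk.pos_tag, which returns (word, tag) pairs using Penn Treebank tags), and both the preposition list and the rule "proper nouns directly after a preposition form a location" are assumptions. The function name extract_locations is hypothetical.

```python
# Hypothetical preposition list; the repository's actual list may differ.
LOCATION_PREPOSITIONS = {"at", "in", "near", "from", "to"}

def extract_locations(tagged_tokens):
    """Return runs of proper nouns (NNP/NNPS) that directly follow a location preposition.

    tagged_tokens: list of (word, Penn Treebank tag) pairs, e.g. from nltk.pos_tag.
    """
    locations = []
    i = 0
    while i < len(tagged_tokens):
        word, tag = tagged_tokens[i]
        # Penn Treebank tags most prepositions as IN and "to" as TO.
        if tag in ("IN", "TO") and word.lower() in LOCATION_PREPOSITIONS:
            j = i + 1
            name = []
            # Collect consecutive proper nouns after the preposition.
            while j < len(tagged_tokens) and tagged_tokens[j][1] in ("NNP", "NNPS"):
                name.append(tagged_tokens[j][0])
                j += 1
            if name:
                locations.append(" ".join(name))
                i = j
                continue
        i += 1
    return locations

# Pre-tagged example so the sketch runs without downloading NLTK tagger data.
post = [("Heavy", "JJ"), ("traffic", "NN"), ("jam", "NN"), ("near", "IN"),
        ("Farmgate", "NNP"), ("and", "CC"), ("at", "IN"), ("Shahbag", "NNP"),
        ("Mor", "NNP"), ("today", "NN")]
print(extract_locations(post))  # ['Farmgate', 'Shahbag Mor']
```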

Neural_Network_Classification

  • neural_network_conv.py implements a convolutional neural network for text classification. Layers: Embedding → Conv1D → MaxPooling1D → Conv1D → MaxPooling1D → LSTM → Dense.

  • neural_network_dense.py implements a very simple neural network for text classification. Layers: Dense (256 neurons) → Dense (10 neurons). It classifies text very quickly and does not use batching.

  • neural_network_lstm.py implements an LSTM neural network for text classification. Layers: Embedding → Dense → LSTM → Dense.

  • A softmax activation is used in the last layer to produce a normalized probability distribution over multiple classes.

  • Data Labels

    1- Traffic Jam
    2- No Traffic Jam
    3- Road Condition
    6- Accident
    7- Fire
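For the dense model, each post has to become a fixed-length input vector and each label a target vector for the 10-unit softmax layer. The sketch below assumes a binary bag-of-words encoding and that the numeric label ID is used directly as the index into the 10 outputs (IDs up to 7 fit within 10 units); both choices, the helper names, and the toy posts are assumptions, not taken from the repository.

```python
# Label IDs as listed above; 4 and 5 are not used.
LABELS = {1: "Traffic Jam", 2: "No Traffic Jam", 3: "Road Condition", 6: "Accident", 7: "Fire"}
NUM_OUTPUTS = 10  # size of the final Dense layer

def build_vocab(texts):
    """Assign an index to every distinct lower-cased word, in order of first appearance."""
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab

def bag_of_words(text, vocab):
    """Binary vector: 1 if the vocabulary word appears in the text, else 0."""
    vec = [0] * len(vocab)
    for word in text.lower().split():
        if word in vocab:
            vec[vocab[word]] = 1
    return vec

def one_hot(label_id, size=NUM_OUTPUTS):
    """Target vector for a softmax layer with `size` units (assumes label ID = output index)."""
    if label_id not in LABELS:
        raise ValueError(f"unknown label {label_id}")
    vec = [0] * size
    vec[label_id] = 1
    return vec

train = [("huge jam at mirpur", 1), ("fire near market", 7)]  # hypothetical posts
vocab = build_vocab(text for text, _ in train)
x = bag_of_words("jam near mirpur", vocab)
y = one_hot(7)
print(x)  # [0, 1, 0, 1, 0, 1, 0]
print(y)  # [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
```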
    

Neural_Translator

  • A neural network that translates phonetic Bangla (Bangla written in Latin script) into Bangla script.
  • For a simple neural translator, the layers are: GRU → TimeDistributed → Dropout → TimeDistributed.
  • For a complex neural translator, the layers are: Bidirectional → TimeDistributed → Dropout → TimeDistributed.
  • One-hot encoding is not used.
  • Performance is very poor because of a lack of translation data: only 400 samples are available.
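Without one-hot encoding, a common alternative is to feed characters to the network as integer IDs padded to a common length. A minimal sketch of that preprocessing, assuming character-level tokenization, a reserved padding ID of 0, and post-padding (all assumptions, not taken from the repository; the two word pairs are illustrative):

```python
def build_char_vocab(words):
    """Map each distinct character to an integer ID; 0 is reserved for padding."""
    chars = sorted({ch for word in words for ch in word})
    return {ch: i + 1 for i, ch in enumerate(chars)}

def encode(word, vocab, max_len):
    """Convert a word to character IDs, truncated or post-padded with 0 to max_len."""
    ids = [vocab[ch] for ch in word][:max_len]
    return ids + [0] * (max_len - len(ids))

pairs = [("ami", "আমি"), ("tumi", "তুমি")]  # (phonetic Bangla, Bangla) examples
src_vocab = build_char_vocab(p for p, _ in pairs)
tgt_vocab = build_char_vocab(b for _, b in pairs)
max_len = max(len(w) for pair in pairs for w in pair)

X = [encode(p, src_vocab, max_len) for p, _ in pairs]  # network inputs
Y = [encode(b, tgt_vocab, max_len) for _, b in pairs]  # per-timestep targets
print(X)  # [[1, 3, 2, 0], [4, 5, 3, 2]]
```

Each row of Y is one target ID per timestep, which matches the TimeDistributed output layers listed above.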

Road-Condition-Analysis

  • This is a research-based project. The research paper has been submitted to IEEE ICCIT 2020.
  • The research analyses road conditions in Dhaka city using social media posts.
  • machine_classification.py performs road condition analysis using different machine learning algorithms such as MultinomialNB, LogisticRegression, and KNeighborsClassifier.
  • nueraul_classification.py performs road condition analysis using a neural network. The procedure is similar to the one in Neural_Network_Classification.
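To illustrate what the MultinomialNB baseline computes, here is a from-scratch multinomial Naive Bayes with Laplace (add-one) smoothing. It is a teaching sketch, not the repository's scikit-learn code; the toy posts and the labels bad_road/good_road are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (text, label). Returns log priors, per-class word counts, totals and vocabulary."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        words = text.lower().split()
        word_counts[label].update(words)
        vocab.update(words)
    priors = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    totals = {c: sum(word_counts[c].values()) for c in class_counts}
    return priors, word_counts, totals, vocab

def predict_nb(text, model):
    """Pick the class with the highest log P(class) + sum of log P(word | class)."""
    priors, word_counts, totals, vocab = model
    v = len(vocab)
    best_label, best_score = None, float("-inf")
    for c, log_prior in priors.items():
        score = log_prior
        for word in text.lower().split():
            if word in vocab:  # ignore words never seen in training
                # Laplace smoothing: add 1 to every word count
                score += math.log((word_counts[c][word] + 1) / (totals[c] + v))
        if score > best_score:
            best_label, best_score = c, score
    return best_label

docs = [("road is broken and full of holes", "bad_road"),
        ("potholes everywhere on this road", "bad_road"),
        ("road is smooth after repair", "good_road"),
        ("new smooth road opened today", "good_road")]
model = train_nb(docs)
print(predict_nb("broken road with holes", model))  # bad_road
print(predict_nb("smooth road", model))             # good_road
```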

Wrong_Word_Correction

  • This is a research-based project that has been published. See this journal article for details on the project.

  • wg.py generates about 80 wrong (misspelled) words from a single defined correct word.

  • ml.py classifies wrong words using different machine learning algorithms such as MultinomialNB, LogisticRegression, KNeighborsClassifier, RandomForestClassifier, etc.

  • Note that ml.py prompts for choices when it runs. They mean the following:

    WBT = Word Based Tokenization
    CBT = Character Based Tokenization
    ACB = Advanced Character Based Tokenization

    NON Saved Model processing = Training from scratch, evaluating, and predicting new wrong words
    Saved Model processing = Loading pre-trained model weights and predicting new wrong words
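A generator in the spirit of wg.py can be sketched with single-character edits: deletions, adjacent transpositions, substitutions, and insertions. Which edits wg.py actually applies, and how it arrives at about 80 variants, is not described here, so the edit set, the lowercase alphabet, and the function name generate_wrong_words are all assumptions.

```python
import string

def generate_wrong_words(word, alphabet=string.ascii_lowercase, limit=None):
    """Return unique misspellings of `word` that are one character edit away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:] for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in alphabet]
    inserts = [left + c + right for left, right in splits for c in alphabet]
    wrong = []
    seen = {word}  # never return the correct word itself
    for candidate in deletes + transposes + replaces + inserts:
        if candidate not in seen:
            seen.add(candidate)
            wrong.append(candidate)
    return wrong[:limit] if limit is not None else wrong

variants = generate_wrong_words("road")
print(variants[:5])                                  # ['oad', 'rad', 'rod', 'roa', 'orad']
print(len(generate_wrong_words("road", limit=80)))   # 80
```

Each generated variant can then be labeled with its correct word, giving training pairs for the classifiers in ml.py.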
    
