
mprashant24/codekata14


To open the project in Eclipse, create it using New --> Project... --> Java --> Java Project from Existing Ant Buildfile. You can compile the code using --> ant clean build. The program takes the book file name as a command line argument; you can run it using --> ant run. It will generate a new file in the resource folder.

Implementation details :-

This project is an implementation of the CodeKata 14 article (http://codekata.com/kata/kata14-tom-swift-under-the-milkwood/). The objective of CodeKata 14 is to generate trigrams from a book and then regenerate a different version of the book using those trigrams. I decided to implement it by dividing the objective into two sub tasks.

  1. Reading the book and generating the trigrams.
  2. Reading the book again and rewriting it using the trigrams, matching the sentence length and paragraph length of the original book.
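
To make sub task 1 concrete, here is a minimal sketch of what a trigram table looks like. It is not the project's code: the class name TrigramTableSketch and the method build are hypothetical, and the sample sentence comes from the kata description. Each pair of adjacent words becomes a key, and every word that follows that pair is added to its list of values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not a class from this project.
final class TrigramTableSketch {

    // Maps each pair of adjacent words to the words seen right after that pair.
    static Map<String, List<String>> build(String text) {
        Map<String, List<String>> table = new LinkedHashMap<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return table;
        }
        String[] words = trimmed.split("\\s+");
        // Stop two words before the end: the last pair has no follower.
        for (int i = 0; i + 2 < words.length; i++) {
            String key = words[i] + " " + words[i + 1];
            table.computeIfAbsent(key, k -> new ArrayList<>()).add(words[i + 2]);
        }
        return table;
    }

    public static void main(String[] args) {
        // Sample sentence from the kata description.
        System.out.println(build("I wish I may I wish I might"));
        // prints {I wish=[I, I], wish I=[may, might], I may=[I], may I=[wish]}
    }
}
```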

CodeKata14.java is the entry point class for this solution and has 3 methods.

  1. main --> entry point method.
  2. buildTrigramsFromBooks --> performs sub task 1 above. Using the BookReader and TrigramsGenerator classes, it generates a TrigramsMap as the result. I used the producer-consumer pattern to read lines from the book and generate trigrams from them: BookReader is the producer of lines, and TrigramsGenerator is the consumer that turns them into trigrams.
  3. rewriteBookFromTrigram --> performs sub task 2 above, using BookReader, BookWriter and the TrigramsMap generated by the method above.
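
The producer-consumer arrangement mentioned above can be sketched roughly as follows. This is an illustration under assumptions, not the project's BookReader or TrigramsGenerator: the class ProducerConsumerSketch, the method pipe, the queue capacity and the end-of-input sentinel are all hypothetical. The project itself signals completion differently (see the CountDownLatch description further down).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: one thread produces lines, another consumes them.
final class ProducerConsumerSketch {

    // Sentinel marking end of input (an assumption of this sketch).
    // A fresh String instance, so it can be told apart by identity.
    private static final String END = new String("<END>");

    static List<String> pipe(List<String> lines) {
        // Small bounded queue (capacity is arbitrary): the producer blocks when it is full.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(16);

        // Producer: plays the role of a book reader feeding lines to the queue.
        Callable<Void> producer = () -> {
            for (String line : lines) {
                queue.put(line);
            }
            queue.put(END);
            return null;
        };

        // Consumer: plays the role of a generator taking lines off the queue.
        Callable<List<String>> consumer = () -> {
            List<String> seen = new ArrayList<>();
            for (String line = queue.take(); line != END; line = queue.take()) {
                seen.add(line);
            }
            return seen;
        };

        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<Void> produced = pool.submit(producer);
            Future<List<String>> consumed = pool.submit(consumer);
            produced.get();
            return consumed.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the queue is bounded, a fast reader cannot run far ahead of a slow consumer, which keeps memory use flat even for a large book.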

The BookReader class is responsible for reading the book from the file line by line and feeding each line to a BlockingQueue, which is consumed by TrigramsGenerator. All the generated trigrams are stored in the TrigramsMap class.

TrigramsMap is a wrapper around HashMap. It stores each String key against an ArrayList of values. It also overrides the behavior of the get method: it keeps a counter of the elements already returned and returns the elements in cyclic order, from index 0 up to the size of the value list and then starting over. It also has methods to get a random key and the size of the trigram store.
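
A store with this kind of cyclic get could look roughly like the sketch below. It is not the project's TrigramsMap: the class CyclicTrigramStoreSketch, its method names and the per-key counter are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch of a map wrapper whose get cycles through the values of a key.
final class CyclicTrigramStoreSketch {
    private final Map<String, List<String>> values = new HashMap<>();
    // Index of the next value to return for each key (assumed to be tracked per key).
    private final Map<String, Integer> nextIndex = new HashMap<>();
    // Keys kept in a list so that a random one can be picked by index.
    private final List<String> keys = new ArrayList<>();
    private final Random random;

    CyclicTrigramStoreSketch(Random random) {
        this.random = random;
    }

    void put(String key, String value) {
        List<String> list = values.get(key);
        if (list == null) {
            list = new ArrayList<>();
            values.put(key, list);
            keys.add(key);
        }
        list.add(value);
    }

    // Returns the values of a key one after another, wrapping around at the end.
    // Returns null when the key is unknown.
    String get(String key) {
        List<String> list = values.get(key);
        if (list == null) {
            return null;
        }
        int index = nextIndex.getOrDefault(key, 0);
        nextIndex.put(key, (index + 1) % list.size());
        return list.get(index);
    }

    String randomKey() {
        if (keys.isEmpty()) {
            throw new IllegalStateException("store is empty");
        }
        return keys.get(random.nextInt(keys.size()));
    }

    int size() {
        return values.size();
    }
}
```

Cycling through the values instead of always returning the first one means that repeated visits to the same word pair produce different continuations, which keeps the rewritten text from looping too early.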

The TrigramsGenerator class implements the Callable interface; its call method returns the generated TrigramsMap, which is retrieved through a Future object. I have used the regular expression "([^$]|[^\\s])\s+(([^\\s])\s+([^\\s])(\s.)?". This regular expression selects the first 3 words and also the string from after the first word to the end of the string, which makes it easy to generate trigrams. Once I reach a string that does not match the regex pattern, I append the next line to the remaining string and try again. I follow this process until the last line of the book.
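
The idea of carrying leftover words from one line into the next, inside a Callable whose result is read through a Future, can be sketched as below. Note the swap: this sketch splits lines on whitespace and remembers the last two words instead of using the regular expression quoted above. The class LineTrigramSketch and the method runOnExecutor are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: builds trigrams line by line, so that trigrams
// spanning a line break are not lost.
final class LineTrigramSketch implements Callable<Map<String, List<String>>> {
    private final List<String> lines;

    LineTrigramSketch(List<String> lines) {
        this.lines = lines;
    }

    @Override
    public Map<String, List<String>> call() {
        Map<String, List<String>> trigrams = new HashMap<>();
        // The last two words seen so far; they survive across line boundaries.
        String first = null;
        String second = null;
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // blank lines contribute no words
            }
            for (String word : trimmed.split("\\s+")) {
                if (first != null && second != null) {
                    trigrams.computeIfAbsent(first + " " + second, k -> new ArrayList<>())
                            .add(word);
                }
                first = second;
                second = word;
            }
        }
        return trigrams;
    }

    // Runs the generator on another thread and reads the result from the Future.
    static Map<String, List<String>> runOnExecutor(List<String> lines) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<Map<String, List<String>>> future = pool.submit(new LineTrigramSketch(lines));
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```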

While BookReader and TrigramsGenerator are working, CodeKata14 waits for them to complete using a CountDownLatch. TrigramsGenerator runs until BookReader has finished and all lines from the BlockingQueue are processed.
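
Waiting on a CountDownLatch works roughly as in the sketch below. The class LatchSketch and the method runAndWait are hypothetical, and the worker body is a placeholder for the reading and generating work.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: the calling thread blocks until every worker has counted down.
final class LatchSketch {

    // Starts the given number of workers (must be at least 1) and returns
    // how many of them had finished by the time the latch opened.
    static int runAndWait(int workers) {
        CountDownLatch done = new CountDownLatch(workers);
        AtomicInteger finished = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            for (int i = 0; i < workers; i++) {
                pool.execute(() -> {
                    try {
                        finished.incrementAndGet(); // placeholder for real work
                    } finally {
                        done.countDown(); // always count down, even if the work fails
                    }
                });
            }
            done.await(); // blocks until the count reaches zero
            return finished.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Counting down in a finally block matters: if a worker fails without counting down, the waiting thread would block forever.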

Once book reading and trigram generation have finished, CodeKata14 starts the book rewriting process. The BookWriter class also implements the Callable interface and returns the count of lines processed. I have reused BookReader to read the book so that the lengths of sentences and paragraphs can be matched. BookWriter starts by getting a random key from the TrigramsMap generated in step 1 and uses it as the start of the book; then it gets the value for that key and appends it to the line. If there is no matching key for the last 2 words, it fetches a random key from TrigramsMap again and uses the value of that key as the next word. BookWriter follows this process until BookReader has finished reading the book and all lines from the BlockingQueue are processed.
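
The rewriting loop described above can be sketched for a single line as follows. This is a simplified illustration, not the project's BookWriter: the class RewriteSketch and the method rewriteLine are hypothetical, keys are assumed to be two words joined by one space, every key is assumed to have at least one value, and a value is picked at random here instead of in cyclic order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch: generates one line with the same word count as an original line.
final class RewriteSketch {

    static String rewriteLine(Map<String, List<String>> trigrams, int wordCount, Random random) {
        if (trigrams.isEmpty() || wordCount <= 0) {
            return "";
        }
        List<String> keys = new ArrayList<>(trigrams.keySet());
        List<String> out = new ArrayList<>();

        // Start from a random key; its two words open the line.
        String[] seed = keys.get(random.nextInt(keys.size())).split(" ");
        out.add(seed[0]);
        out.add(seed[1]);

        while (out.size() < wordCount) {
            // Look up the last two words written so far.
            String key = out.get(out.size() - 2) + " " + out.get(out.size() - 1);
            List<String> followers = trigrams.get(key);
            if (followers == null) {
                // Dead end: fall back to the values of a random key.
                followers = trigrams.get(keys.get(random.nextInt(keys.size())));
            }
            out.add(followers.get(random.nextInt(followers.size())));
        }
        // Trim to the requested length (relevant when wordCount is 1).
        return String.join(" ", out.subList(0, wordCount));
    }
}
```

Calling rewriteLine once per original line, with that line's word count, gives output whose line lengths follow the source book.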

A few additional things should be done before this reaches production-level code: logging and exception handling. As of now there is no logging, and exceptions are simply thrown without being handled. I was not able to complete this due to a time crunch.

