The program is very easy to run. First go to PROJECT_PATH\src\Author Recognition and run the following command:
java -jar Natural-Language-Processing-Project-1.jar
The program will ask for the directory in which the authors and their files are located.
The expected directory structure is the following:
input_path/authorName/fileX
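As a minimal sketch of how such a layout can be traversed, the following groups files by author folder. The class name AuthorCorpusLayout and the method filesByAuthor are hypothetical and not part of the project; the sketch only assumes the input_path/authorName/fileX structure shown above.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the project: maps each author to their files.
class AuthorCorpusLayout {

    // Returns authorName -> regular files found directly under input_path/authorName.
    static Map<String, List<Path>> filesByAuthor(Path inputPath) throws IOException {
        Map<String, List<Path>> result = new TreeMap<>();
        try (DirectoryStream<Path> authors = Files.newDirectoryStream(inputPath)) {
            for (Path authorDir : authors) {
                if (!Files.isDirectory(authorDir)) {
                    continue; // ignore stray files next to the author folders
                }
                List<Path> files = new ArrayList<>();
                try (DirectoryStream<Path> docs = Files.newDirectoryStream(authorDir)) {
                    for (Path doc : docs) {
                        if (Files.isRegularFile(doc)) {
                            files.add(doc);
                        }
                    }
                }
                result.put(authorDir.getFileName().toString(), files);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the input_path that contains one folder per author.
        Map<String, List<Path>> corpus = filesByAuthor(Paths.get(args[0]));
        corpus.forEach((author, files) ->
                System.out.println(author + ": " + files.size() + " file(s)"));
    }
}
```

Running it with the input path as the only argument prints one line per author with the number of files found, which is a quick way to check the layout before starting the jar.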
This is the second project of the Natural Language Processing course at Bogazici University, Spring 2016 semester: an implementation of the Viterbi algorithm on the Metubank dataset in Java.
I made a runner class which receives all the parameters and runs the 3 tasks one by one. Nothing has changed internally; this is just to make life simpler.
You can find the jar file in Natural-Language-Processing-Project-1\src\POS Tagger.
It receives several parameters. You can compile it yourself as well, but I exported the jar file into the project, and it can be run easily with the following command:
java -jar RunThemAll.jar trainingFilePath [postag/cpostag] testFilePath outputPath goldStandardPath
Assuming all files are compiled using:
javac <filename>.java
This program expects you to pass two parameters, as specified in the project description. The first one is the path of the training data. The second one is the POS tag option, either postag or cpostag. The program trains itself on the training data and creates the following 3 output files in the same directory:
posNamesToPosNamesProbablities.ser
posNamesToWordPossibilities.ser
posType.ser
If you don't provide any parameters, the program will crash.
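The .ser extension suggests that the model is stored with standard Java object serialization. Under that assumption, the sketch below shows how a probability table could be written to and read back from such a file. The class name SerModelSketch, the file name example.ser and the nested map type are hypothetical; the actual types stored by the project may differ.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;

// Hypothetical sketch, not the project's code: round trip of a model through a .ser file.
class SerModelSketch {

    // Writes any serializable model object to the given file.
    static void save(Serializable model, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(model);
        }
    }

    // Reads the object back; the caller casts it to the expected type.
    static Object load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumed shape: previous tag -> (next tag -> probability).
        HashMap<String, HashMap<String, Double>> transitions = new HashMap<>();
        transitions.put("Adj", new HashMap<>());
        transitions.get("Adj").put("Noun", 0.9);

        File file = new File("example.ser"); // hypothetical file name
        save(transitions, file);
        System.out.println(load(file));
    }
}
```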
This program expects you to pass two parameters, as specified in the project description. The first one is the path of the test data. The second one is the path of the output file. The program tags the test data using the previously produced model, which it reads from the .ser files explained above. Afterwards it produces an output file in which the guesses are written in a readable manner.
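The project summary names Viterbi as the tagging algorithm. As a rough illustration of that decoding step, the sketch below runs Viterbi over a toy bigram model in log space. The class name ViterbiSketch, the start symbol, the probability floor for unseen events and the map-based tables are all assumptions made for this example; they are not taken from the project's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of bigram Viterbi decoding, not the project's implementation.
class ViterbiSketch {
    static final String START = "<s>";   // assumed start-of-sentence symbol
    static final double FLOOR = 1e-6;    // assumed probability for unseen events

    // Looks up P(event | given) and returns its logarithm, using FLOOR if missing.
    static double logProb(Map<String, Map<String, Double>> table, String given, String event) {
        Map<String, Double> row = table.get(given);
        double p = (row == null) ? FLOOR : row.getOrDefault(event, FLOOR);
        return Math.log(p);
    }

    // transition: previous tag -> (tag -> probability); emission: tag -> (word -> probability).
    static List<String> decode(List<String> words, List<String> tags,
                               Map<String, Map<String, Double>> transition,
                               Map<String, Map<String, Double>> emission) {
        int n = words.size();
        int k = tags.size();
        if (n == 0) {
            return new ArrayList<>();
        }
        double[][] score = new double[n][k]; // best log score of a path ending in tag j at word i
        int[][] back = new int[n][k];        // index of the previous tag on that best path

        for (int j = 0; j < k; j++) {
            score[0][j] = logProb(transition, START, tags.get(j))
                    + logProb(emission, tags.get(j), words.get(0));
        }
        for (int i = 1; i < n; i++) {
            for (int j = 0; j < k; j++) {
                double best = Double.NEGATIVE_INFINITY;
                int arg = 0;
                for (int p = 0; p < k; p++) {
                    double s = score[i - 1][p] + logProb(transition, tags.get(p), tags.get(j));
                    if (s > best) {
                        best = s;
                        arg = p;
                    }
                }
                score[i][j] = best + logProb(emission, tags.get(j), words.get(i));
                back[i][j] = arg;
            }
        }

        // Pick the best final tag, then follow the back pointers to the start.
        int last = 0;
        for (int j = 1; j < k; j++) {
            if (score[n - 1][j] > score[n - 1][last]) {
                last = j;
            }
        }
        String[] path = new String[n];
        for (int i = n - 1; i >= 0; i--) {
            path[i] = tags.get(last);
            last = back[i][last];
        }
        return Arrays.asList(path);
    }
}
```

Working in log space avoids numerical underflow on long sentences, since sums of logarithms replace products of many small probabilities.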
This program expects you to pass two parameters, as specified in the project description. The first one is the path of the output file described for Test.java. The second one is the path of the gold_standard file. The program outputs the confusion matrix in a JSON-like form; an example follows:
Noun : {
Noun : 7
Adj : 2
},
Punc : {
Punc: 15
}
The example states that in the test data there were 9 words labelled as Noun, of which the program guessed 7 as Noun and 2 as Adj. In addition, there were 15 Puncs, and all of them were labelled correctly by the program.
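A confusion matrix of this shape can be built by counting, for every gold tag, how often each tag was guessed. The sketch below is a minimal illustration using the numbers from the example above; the class name ConfusionMatrixSketch and the list-based input are assumptions, and the project's own evaluation code may read and store its data differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not the project's code: gold tag -> (guessed tag -> count).
class ConfusionMatrixSketch {

    static Map<String, Map<String, Integer>> build(List<String> gold, List<String> predicted) {
        if (gold.size() != predicted.size()) {
            throw new IllegalArgumentException("gold and predicted must have the same length");
        }
        Map<String, Map<String, Integer>> matrix = new TreeMap<>();
        for (int i = 0; i < gold.size(); i++) {
            // Create the row for this gold tag on first use, then increment the guessed tag.
            matrix.computeIfAbsent(gold.get(i), g -> new TreeMap<>())
                  .merge(predicted.get(i), 1, Integer::sum);
        }
        return matrix;
    }

    public static void main(String[] args) {
        // Same counts as the example: 9 Nouns (7 guessed Noun, 2 guessed Adj), 15 Puncs.
        List<String> gold = new ArrayList<>(Collections.nCopies(9, "Noun"));
        gold.addAll(Collections.nCopies(15, "Punc"));

        List<String> predicted = new ArrayList<>(Collections.nCopies(7, "Noun"));
        predicted.addAll(Collections.nCopies(2, "Adj"));
        predicted.addAll(Collections.nCopies(15, "Punc"));

        build(gold, predicted).forEach((goldTag, row) ->
                System.out.println(goldTag + " : " + row));
    }
}
```

Each outer key is a gold tag and each inner key is a guessed tag, so correct guesses are the entries where both keys match.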
For Turkish-speaking developers: independently of the project itself, the distances between the provinces of Turkey are included in the project as a JSON file.