Author:
Israel Avihail
Required libraries:
######################################################################################################
import of file data_analysis
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////
- argparse
the ArgumentParser object holds all the information necessary to parse the command line into Python data types.
- os
os.path.exists is used to check whether a path exists; if it does not exist, os.mkdir is used to create it.
- pickle
serializes objects so they can be saved to a file and loaded into a program again later on.
- Sequence
used to check whether an object is of type Sequence.
- operator.itemgetter
returns the requested item each time it is called; we use it together with generators.
- numpy
a Python library used for working with arrays.
- pandas
used to read csv files and operate on them. Interval is used to check whether an object is of that type, and pd.IntervalIndex is used to convert values to intervals.
- stats
scipy.stats is imported for its entropy function, which represents the effective size of a probability space.
- sklearn
KMeans, GaussianNB, CategoricalNB, KNeighborsClassifier, StandardScaler: used for calculation and generation of Confusion Matrix PDFs
////////////////////////////////////////////////////////////////////////////////////////////////////////////////// end data_analysis
import of file Classifier_Algorithm
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////
- abc
This module provides the infrastructure for defining abstract base classes (ABCs) in Python; we use it to define an interface.
import of file dictionary_tree
- copy
used to make deep copies.
////////////////////////////////////////////////////////////////////////////////////////////////////////////////// end Classifier_Algorithm
import of file entropy
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////
- Fraction
used to build a rational number from two numbers, and to check whether an object is of type Fraction.
- log2
used to calculate entropy.
////////////////////////////////////////////////////////////////////////////////////////////////////////////////// end entropy
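As an illustration of how Fraction and log2 can combine in an entropy calculation, here is a minimal sketch. The function name entropy and its signature are assumptions made for this example, and it is not necessarily how the project's entropy file is written.

```python
from collections import Counter
from fractions import Fraction
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels (illustrative helper)."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    # Keep each probability as an exact rational; only the logarithm uses floats.
    probabilities = [Fraction(count, total) for count in counts.values()]
    # sum of p * log2(1 / p) is the same quantity as -sum of p * log2(p).
    return sum(float(p) * log2(float(1 / p)) for p in probabilities)


balanced = entropy(["yes", "yes", "no", "no"])   # two equally likely classes
pure = entropy(["yes", "yes", "yes"])            # a single class
```

A perfectly balanced two-class column gives one bit of entropy, and a column with a single class gives zero.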
import of file entropy_discretization
- combinations
returns r-length tuples in sorted order with no repeated elements; we use it as a generator in the entropy calculation, so it yields one tuple from the list at a time.
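The generator behaviour of combinations can be sketched as follows. The helper candidate_splits and the idea of choosing (bins - 1) cut points are assumptions made for illustration, not the project's actual discretization code.

```python
from itertools import combinations


def candidate_splits(cut_points, bins):
    """Lazily yield every way to choose (bins - 1) cut points (hypothetical helper)."""
    # Sorting the input first means every yielded tuple is in ascending order.
    return combinations(sorted(cut_points), bins - 1)


splits = candidate_splits([2.5, 7.0, 4.0], bins=3)
first = next(splits)        # tuples are produced one at a time, on demand
remaining = list(splits)    # consumes whatever the generator has left
```

Because the tuples are produced on demand, a caller can score each candidate and stop early without building the whole list in memory.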
##################################################################################################################### End Required libraries
How to add custom classification algorithm:
#####################################################################################################################
step 1: add the algorithm to the package/folder "classification_algorithms" within the project
step 2: in the package/folder "classification_algorithms", within the "__init__.py" file:
2.1: import the new algorithm (class). example: "from classification_algorithms.algorithm_file import AlgorithmName"
2.2: add the algorithm (class) to the "__algorithm__" list. example: "[AlgorithmName, <existing algorithms...>]"
*It is recommended that the classification algorithm implement the classification algorithm interface "ClassifierAlgorithm", which is located in the "project_util" package/folder.
example:
from project_util.classifier_algorithm import ClassifierAlgorithm
class AlgorithmName(ClassifierAlgorithm):
##################################################################################################################### end add custom classification models
Preparing and running within a Virtual Environment:
#####################################################################################################################
-- option 1 --
step 1: create a virtual environment (Python must be installed beforehand)
1.1. open a shell within the project folder ("ClassifyingModels")
1.1.1 on Windows run the command: py -m venv env
1.1.2 on Unix/macOS run the command: python3 -m venv env
step 2: activate the environment
2.1 on Windows run the command: .\env\Scripts\activate
2.2 on Unix/macOS run the command: source env/bin/activate
step 3: install requirements
3.1 on Windows run the command: py -m pip install -r requirements.txt
3.2 on Unix/macOS run the command: python3 -m pip install -r requirements.txt
-- option 2 --
1. on Windows run the bat file within the project folder: "run_windows.bat"
2. on Unix/macOS (or Windows with a supporting shell such as Git Bash) run the sh file within the project folder: "run_unix_mac.sh"
Once the environment is set up, the program can be run.
Important!!
After either option, when work with the program is finished, the environment needs to be deactivated:
In the active environment's open shell run the command: deactivate
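To confirm from Python whether the environment is currently active, a small check like the one below can help. It is not part of the project; the function name is made up for this example, and it relies on the standard behaviour that sys.prefix differs from sys.base_prefix inside a venv.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment folder while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


status = "active" if in_virtual_environment() else "not active"
print("virtual environment is " + status)
```

Running it before step 3 is a quick way to avoid installing the requirements into the system-wide Python by mistake.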
##################################################################################################################### end environment preparation
How to Run :
#####################################################################################################################
Command help section - To see this text (the help section) via the program - run the command (in the open shell): classification_models.py -h
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////
usage: classification_models.py [-h] WORKING_MODE ... DATA_PATH
Description: Analyse data with data mining tools.
positional arguments:
DATA_PATH Path directory of dataset files.
example: C:/Users/user/Desktop/data
or
./resources
options:
-h, --help show this help message and exit
run modes:
run modes define which mode the system should run in the current execution.
example: classification_models.py all train.csv test.csv C:/Users/user/Desktop/data
or
classification_models.py preprocessing train.csv ./resources
WORKING_MODE run mode help
preprocessing (p, pp)
in this mode only preprocessing is applied
build_model (bm) in this mode only the model build is done
run_model (rm, r) in this mode the only operation done is running a model on test data
all (ALL, a, A) in this mode the whole program will be executed
Made by Israel Avihail.
For bugs & issues: bilbisli@gmail.com
////////////////////////////////////////////////////////////////////////////////////////////////////////////////// end command help section
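For readers curious how a command line of this shape can be declared, here is a reduced argparse sketch. It covers only two of the run modes and no optional flags, and the destination names (mode, training_file, test_file, data_path) are assumptions; the real classification_models.py may be organized differently.

```python
import argparse


def build_parser():
    """Build a reduced parser with two run modes and the trailing DATA_PATH."""
    parser = argparse.ArgumentParser(prog="classification_models.py")
    modes = parser.add_subparsers(dest="mode", metavar="WORKING_MODE", required=True)

    # Each run mode is a sub-command; the aliases give the short spellings.
    preprocessing = modes.add_parser("preprocessing", aliases=["p", "pp"])
    preprocessing.add_argument("training_file", metavar="TRAINING_FILE_NAME")

    run_all = modes.add_parser("all", aliases=["ALL", "a", "A"])
    run_all.add_argument("training_file", metavar="TRAINING_FILE_NAME")
    run_all.add_argument("test_file", metavar="TEST_FILE_NAME")

    # Declared after the sub-commands, so it takes the last positional value.
    parser.add_argument("data_path", metavar="DATA_PATH")
    return parser


args = build_parser().parse_args(["all", "train.csv", "test.csv", "./resources"])
short = build_parser().parse_args(["pp", "train.csv", "./resources"])
```

Note that when an alias is used, the stored mode is the alias itself ("pp" here), so a program written this way has to map aliases back to the full mode name on its own.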
Run All mode help section - To see this text (the help section) via the program - run the command (in the open shell): classification_models.py a -h
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////
usage: classification_models.py all [-h] [--fill FILL_BLANKS_TYPE] [--normalization] [--no-normalization]
[--discretization DISCRETIZATION_TYPE] [--bins BIN_NUMBER [BIN_NUMBER ...]]
[--algorithm ALGORITHM_TYPE] [--implementation IMPLEMENTATION_TYPE]
[--result_name PREDICTION_RESULT_FILE_NAME]
TRAINING_FILE_NAME TEST_FILE_NAME
positional arguments:
TRAINING_FILE_NAME Training dataset file name. example: train.csv
TEST_FILE_NAME Test dataset file name. example: test.csv
options:
-h, --help show this help message and exit
--fill FILL_BLANKS_TYPE
Fill blank cells parameter. example: --fill all
--normalization Apply normalization. example: --normalization
--no-normalization Do not apply normalization. example: --no-normalization
--discretization DISCRETIZATION_TYPE
Discretization type. example: --discretization equal_width
--bins BIN_NUMBER [BIN_NUMBER ...]
Number of bins (intervals) the continuous data will be divided into. example: --bins=5
--algorithm ALGORITHM_TYPE
Model algorithm type. example: --algorithm algorithm_type
options: naive_bayes, decision_tree, k_neighbors, k_means
--implementation IMPLEMENTATION_TYPE
Apply built-in/own implementations of classifying/discretization algorithms (if they exist).
example: --implementation own
--result_name PREDICTION_RESULT_FILE_NAME
Prediction result file name to save. example: --result_name test_prediction_DecisionTree_1.csv
////////////////////////////////////////////////////////////////////////////////////////////////////////////////// end Run All Mode help section
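As a rough picture of what the --normalization step could do to one numeric column, the sketch below standardizes values to zero mean and unit variance, which is the transformation StandardScaler applies. Whether the program normalizes exactly this way is an assumption, and the function name standardize is made up for this example.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Rescale a numeric column to zero mean and unit (population) variance."""
    mean = fmean(values)
    spread = pstdev(values)
    if spread == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(value - mean) / spread for value in values]


scaled = standardize([2.0, 4.0, 6.0])
```

After scaling, the middle value sits at zero and the other two are symmetric around it, so columns measured in different units become comparable.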
Pre-Processing mode help section - To see this text (the help section) via the program - run the command (in the open shell): classification_models.py pp -h
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////
usage: classification_models.py preprocessing [-h] [--fill FILL_BLANKS_TYPE] [--normalization] [--no-normalization] [--discretization DISCRETIZATION_TYPE]
[--bins BIN_NUMBER [BIN_NUMBER ...]] [--implementation IMPLEMENTATION_TYPE] [--save_name FILE_NAME]
TRAINING_FILE_NAME
positional arguments:
TRAINING_FILE_NAME Training dataset file name. example: train.csv
options:
-h, --help show this help message and exit
--fill FILL_BLANKS_TYPE
Fill blank cells parameter. example: --fill all
--normalization Apply normalization. example: --normalization
--no-normalization Do not apply normalization. example: --no-normalization
--discretization DISCRETIZATION_TYPE
Discretization type. example: --discretization equal_width
--bins BIN_NUMBER [BIN_NUMBER ...]
Number of bins (intervals) the continuous data will be divided into. example: --bins 5
--implementation IMPLEMENTATION_TYPE
Apply built-in/own implementations of classifying/discretization algorithms (if they exist).
example: --implementation own
--save_name FILE_NAME
The name of the file to be saved after processing. example: name.csv
////////////////////////////////////////////////////////////////////////////////////////////////////////////////// end preprocessing help section
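The effect of "--discretization equal_width" together with "--bins" can be pictured with the sketch below. It assumes that equal_width means splitting the value range into intervals of the same length; the helper names equal_width_edges and assign_bin are hypothetical and the project's own implementation may differ.

```python
def equal_width_edges(values, bins):
    """Return the bins + 1 boundaries that split the value range evenly."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    return [low + index * width for index in range(bins + 1)]


def assign_bin(value, edges):
    """Return the 0-based index of the interval that contains value."""
    for index in range(len(edges) - 2):
        if value < edges[index + 1]:
            return index
    # The last interval is closed on the right so the maximum is included.
    return len(edges) - 2


column = [0, 3, 10]
edges = equal_width_edges(column, bins=5)
binned = [assign_bin(value, edges) for value in column]
```

With 5 bins over the range 0 to 10, each interval is 2 units wide, so the value 3 falls in the second interval and the maximum falls in the last one.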
Build Model mode help section - To see this text (the help section) via the program - run the command (in the open shell): classification_models.py bm -h
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////
usage: classification_models.py build_model [-h] [--algorithm ALGORITHM_TYPE] [--implementation IMPLEMENTATION_TYPE]
[--model_name MODEL_NAME]
POST_PREPROCESSED_FILE_NAME
positional arguments:
POST_PREPROCESSED_FILE_NAME
Training dataset file name (already undergone preprocessing). example: train_clean.csv
options:
-h, --help show this help message and exit
--algorithm ALGORITHM_TYPE
Model algorithm type. example: --algorithm algorithm_type.
options: naive_bayes, decision_tree, k_neighbors, k_means
--implementation IMPLEMENTATION_TYPE
Model implementation type. example: --implementation built_in
--model_name MODEL_NAME
The name of the model to be saved (as pickle). example: --model_name decision_tree_model_1
////////////////////////////////////////////////////////////////////////////////////////////////////////////////// end Build Model help section
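Saving a model "as pickle" and loading it again later can be sketched as follows. The helper names, the ".pkl" extension, and the dictionary standing in for a trained model are all assumptions made for illustration.

```python
import os
import pickle


def save_model(model, directory, model_name):
    """Pickle model into directory, creating the folder when it is missing."""
    if not os.path.exists(directory):
        os.mkdir(directory)
    path = os.path.join(directory, model_name + ".pkl")  # extension is assumed
    with open(path, "wb") as handle:
        pickle.dump(model, handle)
    return path


def load_model(path):
    """Load a previously pickled model; only open pickle files you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

A typical round trip would be path = save_model(model, "models", "decision_tree_model_1") followed later by load_model(path). Loading a pickle can execute arbitrary code, so model files from unknown sources should not be loaded.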
Run Model mode help section - To see this text (the help section) via the program - run the command (in the open shell): classification_models.py rm -h
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////
positional arguments:
TEST_FILE_NAME Test dataset file name. example: test.csv
options:
-h, --help show this help message and exit
--model_name TEST_FILE_NAME
Model file name that is already saved (as pickle). example: --model_name decision_tree_model_1
--result_name PREDICTION_RESULT_FILE_NAME
Prediction result file name to save. example: --result_name test_prediction_DecisionTree_1.csv
////////////////////////////////////////////////////////////////////////////////////////////////////////////////// end Run Model mode help section
##################################################################################################################### end How to Run