Add dataset #1
Conversation
ProFatXuanAll left a comment
Need to add a dataset class, see s2s.dset for examples.
```python
df['tgt'].apply(str).apply(self.__class__.preprocess).to_list()
)

@staticmethod
```
For translation, one should use evaluation metrics other than exact match (i.e., requiring generated words to be the same words, in the same order, as the target sequence, which is what accuracy_score measures).
Why is exact match bad?
Consider the example translation pair I like apple and 我喜歡蘋果; 蘋果我喜歡 should also be an acceptable answer.
Thus, one should use an evaluation metric with fuzzy matching, i.e., it is okay to be not perfectly accurate as long as the meaning is preserved (swapped order, synonyms, etc.).
Nowadays people mostly use the BLEU score as the evaluation metric for translation tasks.
Go find some Python package which calculates the BLEU score for you.
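For instance, nltk ships a BLEU implementation. A minimal sketch using the review's own example pair (Chinese tokenized per character here, which is a simplification):

```python
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

# Target sequence and a reordered but acceptable prediction.
reference = [list("我喜歡蘋果")]
candidate = list("蘋果我喜歡")

# Smoothing avoids a zero score when some n-gram order has no overlap.
smooth = SmoothingFunction().method1
score = sentence_bleu(reference, candidate, smoothing_function=smooth)

# Exact match on this pair scores 0 under accuracy, but BLEU still
# rewards the overlapping n-grams with a nonzero score below 1.
```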
```python
from s2s.dset._base import BaseDset
from s2s.path import DATA_PATH


class Eng2ChiDset(BaseDset):
```
Add a docstring for the Eng2ChiDset class, including a reference to the source of the dataset.
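A minimal docstring sketch; the base class here is a stub standing in for s2s.dset._base.BaseDset, and the source line is a placeholder to fill in:

```python
class BaseDset:
    """Stub standing in for s2s.dset._base.BaseDset."""


class Eng2ChiDset(BaseDset):
    """English-to-Chinese translation dataset.

    Each sample is an (English source, Chinese target) sentence pair used
    for sequence-to-sequence training.  Cite the origin of the data here,
    e.g. the corpus homepage or the paper that released it.
    """
```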
Pipfile (Outdated)
```toml
tqdm = "4.49.0"
sklearn = "0.0"
tensorboard = "2.3.0"
nltk = "*"
```
Specify the dependency version.
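A pinned entry would follow the style of the other Pipfile lines; the version shown is purely illustrative, not a recommendation:

```toml
nltk = "3.5"
```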
run_eval_model.py (Outdated)
```python
    batch_tgt=dset.all_tgt(),
    batch_pred=all_pred,
))
print(DSET_OPTS[args.dset_name].bleu_score(
```
This method only works on the translation dataset, not on the arithmetic dataset.
Overriding batch_eval in each dataset class can do the trick.
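A sketch of per-dataset evaluation via overriding; the class names and the batch_eval signature mirror the review discussion but are assumptions, not the repository's actual code:

```python
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu


class BaseDset:
    @staticmethod
    def batch_eval(batch_tgt, batch_pred):
        # Default metric: exact-match accuracy (suits the arithmetic dataset).
        match = sum(tgt == pred for tgt, pred in zip(batch_tgt, batch_pred))
        return match / len(batch_tgt)


class Eng2ChiDset(BaseDset):
    @staticmethod
    def batch_eval(batch_tgt, batch_pred):
        # Translation metric: average sentence-level BLEU with smoothing,
        # so the caller can invoke batch_eval uniformly on any dataset.
        smooth = SmoothingFunction().method1
        scores = [
            sentence_bleu([list(tgt)], list(pred), smoothing_function=smooth)
            for tgt, pred in zip(batch_tgt, batch_pred)
        ]
        return sum(scores) / len(scores)
```

The evaluation script then calls batch_eval without knowing which metric is behind it, which is exactly what the single bleu_score call above cannot offer.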
s2s/dset/_eng2chi.py (Outdated)
```python
from s2s.dset._base import BaseDset
from s2s.path import DATA_PATH


try:
```
Users need to install dependencies before running the code.
So you don't need to check whether a dependency is missing or not.
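In other words, instead of wrapping the import in try/except, declare nltk in the Pipfile and import it directly, so a missing dependency fails loudly at startup:

```python
# Discouraged here: silently degrading when the dependency is absent.
# try:
#     import nltk
# except ImportError:
#     nltk = None

# Preferred: import directly; the Pipfile guarantees availability.
import nltk
```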
Overload batch_eval function to compute BLEU score
Add docstring to _eng2chi.py
Accidentally changed the file run_train_tknzr.py
add english to chinese dataset