Sourcery refactored master branch #15
base: master
Conversation
     # Write the word counts to result.txt (iterate over the dictionary)
     for (k, v) in num_dict.items():
-        open('data/result.txt', 'a+').write(str(k) + ' ' + str(v) + '\n')  # convert k and v to str
+        open('data/result.txt', 'a+').write(f'{str(k)} {str(v)}' + '\n')
Function com_tf refactored with the following changes:
- Use f-string instead of string concatenation [×2] (use-fstring-for-concatenation)

This removes the following comments (why?):
# convert k and v to str
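For readers unfamiliar with the use-fstring-for-concatenation rule, here is a minimal sketch of the three equivalent spellings. The names k and v are hypothetical and not tied to num_dict in this repository.

```python
# Illustrative only: hypothetical values, not from this project.
k, v = "python", 42

# String concatenation forces explicit str() calls:
line_concat = str(k) + ' ' + str(v) + '\n'

# "%"-interpolation separates the template from the values:
line_percent = "%s %s\n" % (k, v)

# An f-string keeps the values inline and converts them implicitly:
line_fstring = f"{k} {v}\n"

assert line_concat == line_percent == line_fstring
```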
-    path1 = path + 'data/title_and_abs/'
-    newpath = path + "data/pro_keyword/"
+    path1 = f'{path}data/title_and_abs/'
+    newpath = f"{path}data/pro_keyword/"
Lines 20-21 refactored with the following changes:
- Use f-string instead of string concatenation [×2] (use-fstring-for-concatenation)
-    data_source = open(file_import_url, 'r')
-    data = data_source.readline()
-    word_in_afile_stat = {}
-    word_in_allfiles_stat = {}
-    files_num = 0
-    while data != "":  # process the file pro_res.txt
-        data_temp_1 = data.strip("\n").split("\t")  # file name and key words of a file
-        data_temp_2 = data_temp_1[1].split(",")  # key words of a file
-        file_name = data_temp_1[0]
-        data_temp_len = len(data_temp_2)
-        files_num += 1
-        data_dict = {}
-        data_dict.clear()
-        for word in data_temp_2:
-            if word not in word_in_allfiles_stat:
-                word_in_allfiles_stat[word] = 1
-                data_dict[word] = 1
-            else:
-                if word not in data_dict:  # if this word has not appeared in this file before
+    with open(file_import_url, 'r') as data_source:
+        data = data_source.readline()
+        word_in_afile_stat = {}
+        word_in_allfiles_stat = {}
+        files_num = 0
+        while data != "":  # process the file pro_res.txt
+            data_temp_1 = data.strip("\n").split("\t")  # file name and key words of a file
+            data_temp_2 = data_temp_1[1].split(",")  # key words of a file
+            file_name = data_temp_1[0]
+            data_temp_len = len(data_temp_2)
+            files_num += 1
+            data_dict = {}
+            data_dict.clear()
+            for word in data_temp_2:
+                if word not in word_in_allfiles_stat:
+                    word_in_allfiles_stat[word] = 1
+                    data_dict[word] = 1
+                elif word not in data_dict:  # if this word has not appeared in this file before
+                    word_in_allfiles_stat[word] += 1
+                    data_dict[word] = 1

-                    if not word_in_afile_stat.has_key(file_name):
-                        word_in_afile_stat[file_name] = {}
-                    if not word_in_afile_stat[file_name].has_key(word):
-                        word_in_afile_stat[file_name][word] = []
-                        word_in_afile_stat[file_name][word].append(data_temp_2.count(word))
-                        word_in_afile_stat[file_name][word].append(data_temp_len)
-        data = data_source.readline()
-    data_source.close()
+                if not word_in_afile_stat.has_key(file_name):
+                    word_in_afile_stat[file_name] = {}
+                if not word_in_afile_stat[file_name].has_key(word):
+                    word_in_afile_stat[file_name][word] = [data_temp_2.count(word), data_temp_len]
+            data = data_source.readline()
     # filelist = os.listdir(newpath2)  # get all files under the current path
     TF_IDF_last_result = []
     if (word_in_afile_stat) and (word_in_allfiles_stat) and (files_num != 0):
-        for filename in word_in_afile_stat.keys():
+        for filename, value in word_in_afile_stat.items():
             TF_IDF_result = {}
             TF_IDF_result.clear()
-            for word in word_in_afile_stat[filename].keys():
+            for word in value.keys():
Function TF_IDF_Compute refactored with the following changes:
- Use with when opening file to ensure closure [×2] (ensure-file-closed)
- Merge else clause's nested if statement into elif (merge-else-if-into-elif)
- Merge append into list declaration [×2] (merge-list-append)
- Use items() to directly unpack dictionary values (use-dict-items)
- Remove unnecessary call to keys() (remove-dict-keys)
- Replace a[0:x] with a[:x] and a[x:len(a)] with a[x:] (remove-redundant-slice-index)
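A minimal sketch of the ensure-file-closed and use-dict-items patterns, with a hypothetical file name (words.txt) and counting logic that are not taken from this repository. Note that the refactored hunk above still calls dict.has_key, which exists only in Python 2; under Python 3 the equivalent membership test is the in operator, shown in the last lines of the sketch.

```python
from collections import defaultdict

counts = defaultdict(int)

# The with-block closes the file even if an exception is raised inside it.
with open("words.txt", "r", encoding="utf-8") as handle:
    for line in handle:
        for word in line.split():
            counts[word] += 1

# items() yields (key, value) pairs directly, avoiding a second
# lookup such as counts[word] inside the loop body.
for word, count in counts.items():
    print(f"{word}\t{count}")

# Python 3 replacement for the Python-2-only dict.has_key(k):
if "example" in counts:
    print("seen it")
```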
-    path = base_path + 'data/computer/'  # raw data
-    path1 = base_path + 'data/title_and_abs/'  # processed titles and abstracts
-    newpath = base_path + 'data/pro_keyword/'
-    newpath2 = base_path + 'data/keyword/'
+    path = f'{base_path}data/computer/'
+    path1 = f'{base_path}data/title_and_abs/'
+    newpath = f'{base_path}data/pro_keyword/'
+    newpath2 = f'{base_path}data/keyword/'
Lines 17-20 refactored with the following changes:
- Use f-string instead of string concatenation [×4] (use-fstring-for-concatenation)

This removes the following comments (why?):
# processed titles and abstracts
# raw data
-            # print b
             if b is None or b.string is None:
                 continue
-            else:
-                abstracts.extend(soup.title.stripped_strings)
-                s = b.string
-                abstracts.extend(s.encode('utf-8'))
-                f = open(path1 + filename + ".txt", "w+")  # write to a txt file
+            abstracts.extend(soup.title.stripped_strings)
+            s = b.string
+            abstracts.extend(s.encode('utf-8'))
+            with open(path1 + filename + ".txt", "w+") as f:
                 for i in abstracts:
                     f.write(i)
-                f.close()
-                abstracts = []
+            abstracts = []
Function get_text refactored with the following changes:
- Remove unnecessary else after guard condition (remove-unnecessary-else)
- Use with when opening file to ensure closure [×2] (ensure-file-closed)

This removes the following comments (why?):
# write to a txt file
# put the resulting unprocessed text into the pro_keyword folder
# print b
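To illustrate remove-unnecessary-else: once a guard condition returns or continues, the else adds nothing but nesting. A minimal sketch with hypothetical helpers (not from this repository):

```python
def process_with_else(record):
    if record is None:
        return None
    else:                      # the else only adds a level of nesting
        cleaned = record.strip()
        return cleaned.lower()

def process_with_guard(record):
    if record is None:         # guard condition: bail out early
        return None
    cleaned = record.strip()   # the happy path stays at the top level
    return cleaned.lower()

assert process_with_else("  ABC  ") == process_with_guard("  ABC  ") == "abc"
```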
-    features = [text_len, isHasSH]
-    return features
+    return [text_len, isHasSH]
Function get_feature refactored with the following changes:
- Inline variable that is immediately returned (inline-immediately-returned-variable)
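A quick sketch of inline-immediately-returned-variable, mirroring the shape of get_feature above with hypothetical inputs:

```python
def get_feature_verbose(text):
    features = [len(text), "#" in text]   # bound to a name, then returned
    return features

def get_feature_inline(text):
    return [len(text), "#" in text]       # same value, no intermediate name

assert get_feature_verbose("a#b") == get_feature_inline("a#b") == [3, True]
```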
-    print(X[0:10])
-    print(Y[0:10])
+    print(X[:10])
+    print(Y[:10])
Function load_data refactored with the following changes:
- Replace a[0:x] with a[:x] and a[x:len(a)] with a[x:] [×2] (remove-redundant-slice-index)
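Slicing defaults make the explicit endpoints redundant; an illustrative list (not project data):

```python
data = list(range(20))

assert data[0:10] == data[:10]            # the leading 0 can be dropped
assert data[10:len(data)] == data[10:]    # the trailing len(data) can be dropped
```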
-if __name__ == '__main__':
-    pass
+pass
Lines 48-49 refactored with the following changes:
- Remove redundant conditional (remove-redundant-if)
     data_size = len(data)
     num_batches_per_epoch = int(len(data)/batch_size) + 1
-    for epoch in range(num_epochs):
+    for _ in range(num_epochs):
Function batch_iter refactored with the following changes:
- Replace unused for index with underscore (for-index-underscore)
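When the loop variable is never read, the underscore signals that intent. A tiny sketch with a hypothetical epoch count:

```python
results = []
for _ in range(3):            # hypothetical num_epochs = 3; the index is unused
    results.append("one pass over the shuffled data")

assert len(results) == 3
```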
-        raise ValueError("Linear is expecting 2D arguments: %s" % str(shape))
+        raise ValueError(f"Linear is expecting 2D arguments: {str(shape)}")
     if not shape[1]:
-        raise ValueError("Linear expects shape[1] of arguments: %s" % str(shape))
+        raise ValueError(f"Linear expects shape[1] of arguments: {str(shape)}")
Function linear refactored with the following changes:
- Replace interpolated string formatting with f-string [×2] (replace-interpolation-with-fstring)
         pooled_outputs = []
         for filter_size, num_filter in zip(filter_sizes, num_filters):
-            with tf.name_scope("conv-maxpool-%s" % filter_size):
+            with tf.name_scope(f"conv-maxpool-{filter_size}"):
Function TextCNN.__init__ refactored with the following changes:
- Replace interpolated string formatting with f-string (replace-interpolation-with-fstring)
| print("\nParameters:") | ||
| for attr, value in sorted(FLAGS.__flags.iteritems()): | ||
| print("{}={}".format(attr.upper(), value)) | ||
| print(f"{attr.upper()}={value}") |
Lines 36-46 refactored with the following changes:
- Replace call to format with f-string (use-fstring-for-formatting)
-    f2 = open('%s.txt' % item, 'a+')
-    for (k, v) in data_dict.items():
-        f2.write(v + ',' + k + ' ' + '\n')
-    f2.close()
+    with open(f'{item}.txt', 'a+') as f2:
+        for (k, v) in data_dict.items():
+            f2.write(v + ',' + k + ' ' + '\n')
Function get_text refactored with the following changes:
- Use with when opening file to ensure closure (ensure-file-closed)
- Replace interpolated string formatting with f-string (replace-interpolation-with-fstring)
-        # print (files)
-        f = open(base_path + files, 'r')
-        text = (f.read().decode('GB2312', 'ignore').encode('utf-8'))
-        salt = ''.join(random.sample(string.ascii_letters + string.digits, 8))  # generate a random file name
-        f2 = open("C:/Users/kaifun/Desktop/ass_TIP/TextInfoExp/Part2_Text_Classify/test3/" + salt + '.txt', 'w')
-        f2.write(text)
-        f3.write(salt + ' ' + 'e' + '\n')
-        f.close()
+        with open(base_path + files, 'r') as f:
+            text = (f.read().decode('GB2312', 'ignore').encode('utf-8'))
+        salt = ''.join(random.sample(string.ascii_letters + string.digits, 8))  # generate a random file name
+        f2 = open(
+            f"C:/Users/kaifun/Desktop/ass_TIP/TextInfoExp/Part2_Text_Classify/test3/{salt}.txt",
+            'w',
+        )
+        f2.write(text)
+        f3.write(f'{salt} e' + '\n')
Function trans_text refactored with the following changes:
- Use with when opening file to ensure closure (ensure-file-closed)
- Use f-string instead of string concatenation [×4] (use-fstring-for-concatenation)

This removes the following comments (why?):
# print (files)
-        f.write(str(test_name[i]) + ' ' + str(result[i]) + '\n')
+        f.write(f'{str(test_name[i])} {str(result[i])}' + '\n')
Function get_classify refactored with the following changes:
- Use f-string instead of string concatenation [×2] (use-fstring-for-concatenation)
-        if judgement != "":
-            return 4, judgement
-
-        return 0, ""
+        return (4, judgement) if judgement != "" else (0, "")
Function DictClassifier.__analyse_word refactored with the following changes:
- Lift code into else after jump in control flow (reintroduce-else)
- Replace if statement with if expression (assign-if-exp)
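A small sketch of assign-if-exp, standing in for the return shape of __analyse_word above (the function name and input are hypothetical):

```python
def analyse(judgement):
    # The multi-line if/return pair collapses into one conditional expression.
    return (4, judgement) if judgement != "" else (0, "")

assert analyse("negative") == (4, "negative")
assert analyse("") == (0, "")
```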
-        if match is not None:
-            pattern = {"key": "要的是…给的是…", "value": 1}
-            return pattern
-        return ""
+        return {"key": "要的是…给的是…", "value": 1} if match is not None else ""
Function DictClassifier.__is_clause_pattern1 refactored with the following changes:
- Lift code into else after jump in control flow (reintroduce-else)
- Replace if statement with if expression (assign-if-exp)
- Inline variable that is immediately returned (inline-immediately-returned-variable)
-            conjunction = {"key": the_word, "value": self.__conjunction_dict[the_word]}
-            return conjunction
+            return {"key": the_word, "value": self.__conjunction_dict[the_word]}
Function DictClassifier.__is_word_conjunction refactored with the following changes:
- Inline variable that is immediately returned (inline-immediately-returned-variable)
-            punctuation = {"key": the_word, "value": self.__punctuation_dict[the_word]}
-            return punctuation
+            return {"key": the_word, "value": self.__punctuation_dict[the_word]}
Function DictClassifier.__is_word_punctuation refactored with the following changes:
- Inline variable that is immediately returned (inline-immediately-returned-variable)
-            output += "Sub-clause" + str(i) + ": "
-            clause = comment_analysis["su-clause" + str(i)]
+            output += f"Sub-clause{str(i)}: "
+            clause = comment_analysis[f"su-clause{str(i)}"]
Function DictClassifier.__output_analysis refactored with the following changes:
- Use f-string instead of string concatenation [×3] (use-fstring-for-concatenation)
-        if match is not None and len(self.__split_sentence(match.group(2))) <= 2:
-            to_delete = []
-            for i in range(len(the_clauses)):
-                if the_clauses[i] in match.group(2):
-                    to_delete.append(i)
-            if len(to_delete) > 0:
-                for i in range(len(to_delete)):
-                    the_clauses.remove(the_clauses[to_delete[0]])
-                the_clauses.insert(to_delete[0], match.group(2))
+        if match is not None and len(self.__split_sentence(match[2])) <= 2:
+            if to_delete := [
+                i for i in range(len(the_clauses)) if the_clauses[i] in match[2]
+            ]:
+                for item in to_delete:
+                    the_clauses.remove(the_clauses[to_delete[0]])
+                the_clauses.insert(to_delete[0], match[2])

         # detect the hypothetical-mood pattern "要是|如果……就好了" ("if only ... it would be fine/perfect")
         pattern = re.compile(r"([,%。、!;??,!~~.… ]*)([\u4e00-\u9fa5]*?(如果|要是|"
                              r"希望).+就[\u4e00-\u9fa5]+(好|完美)了[,。;!%、??,!~~.… ]+)")
         match = re.search(pattern, the_sentence.strip())
-        if match is not None and len(self.__split_sentence(match.group(2))) <= 3:
+        if match is not None and len(self.__split_sentence(match[2])) <= 3:
             to_delete = []
             for i in range(len(the_clauses)):
-                if the_clauses[i] in match.group(2):
+                if the_clauses[i] in match[2]:
                     to_delete.append(i)
-            if len(to_delete) > 0:
-                for i in range(len(to_delete)):
+            if to_delete:
+                for item_ in to_delete:
                     the_clauses.remove(the_clauses[to_delete[0]])
-                the_clauses.insert(to_delete[0], match.group(2))
+                the_clauses.insert(to_delete[0], match[2])
Function DictClassifier.__divide_sentence_into_clauses refactored with the following changes:
- Replace index in for loop with direct reference [×2] (for-index-replacement)
- Replace m.group(x) with m[x] for re.Match objects [×6] (use-getitem-for-re-match-groups)
- Use named expression to simplify assignment and conditional (use-named-expression)
- Convert for loop into list comprehension (list-comprehension)
- Simplify sequence length comparison [×2] (simplify-len-comparison)
- Replace unused for index with underscore [×2] (for-index-underscore)
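A minimal sketch combining two of the rules above: the named expression (walrus operator, Python 3.8+) and match[x] as shorthand for match.group(x) (Python 3.6+). The pattern and sentence are illustrative; the real clause-splitting regex is shown in the diff above.

```python
import re

pattern = re.compile(r"(\w+) likes (\w+)")
sentence = "alice likes pears"

# match[2] is shorthand for match.group(2) on re.Match objects.
if (match := pattern.search(sentence)) is not None:
    # The named expression binds `found` and tests its truthiness in one step,
    # replacing the separate "build a list, then check len() > 0" pattern.
    if found := [i for i, word in enumerate(sentence.split()) if word in match[2]]:
        print(match[2], found)   # prints: pears [2]
```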
-        clauses = [''.join(x) for x in zip(split_clauses, punctuations)]
-
-        return clauses
+        return [''.join(x) for x in zip(split_clauses, punctuations)]
Function DictClassifier.__split_sentence refactored with the following changes:
- Inline variable that is immediately returned (inline-immediately-returned-variable)
-        with open(self.__root_filepath + "phrase_dict.txt", "r", encoding="utf-8") as f:
+        with open(f"{self.__root_filepath}phrase_dict.txt", "r", encoding="utf-8") as f:
Function DictClassifier.__get_phrase_dict refactored with the following changes:
- Use f-string instead of string concatenation (use-fstring-for-concatenation)
- Remove unnecessary else after guard condition (remove-unnecessary-else)
| f.write("%s" % info) | ||
| f.write(f"{info}") |
Function DictClassifier.__write_runout_file refactored with the following changes:
- Replace interpolated string formatting with f-string (replace-interpolation-with-fstring)
-        sorted_distances = distances.argsort()
-
-        return sorted_distances
+        return distances.argsort()
Function KNNClassifier.__get_sorted_distances refactored with the following changes:
- Inline variable that is immediately returned (inline-immediately-returned-variable)
     a = WaimaiCorpus()
     a = Waimai2Corpus()
     a = HotelCorpus()
-    pass
Function test_corpus refactored with the following changes:
- Remove redundant pass statement (remove-redundant-pass)
-    pass


 if __name__ == "__main__":
     pass
Lines 177-177 refactored with the following changes:
- Remove redundant pass statement (remove-redundant-pass)
-            return [word for word in words[:num]]
-        else:
-            return [word[0] for word in words[:num]]
+        return list(words[:num]) if need_score else [word[0] for word in words[:num]]
Function ChiSquare.best_words refactored with the following changes:
- Replace if statement with if expression (assign-if-exp)
- Replace identity comprehension with call to collection constructor (identity-comprehension)
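A comprehension that only copies each element is an identity comprehension; the list() constructor says the same thing more directly. The sample word/score pairs below are hypothetical:

```python
words = [("good", 0.9), ("bad", 0.8), ("ok", 0.1)]
num = 2

assert [word for word in words[:num]] == list(words[:num])
assert [word[0] for word in words[:num]] == ["good", "bad"]
```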
-        if type(self.k) == int:
-            k = "%s" % self.k
-        else:
-            k = "-".join([str(i) for i in self.k])
-
+        k = f"{self.k}" if type(self.k) == int else "-".join([str(i) for i in self.k])
         print("KNNClassifier")
         print("---" * 45)
-        print("Train num = %s" % self.train_num)
-        print("Test num = %s" % self.test_num)
-        print("K = %s" % k)
+        print(f"Train num = {self.train_num}")
+        print(f"Test num = {self.test_num}")
+        print(f"K = {k}")
Function Test.test_knn refactored with the following changes:
- Replace if statement with if expression (assign-if-exp)
- Replace interpolated string formatting with f-string [×4] (replace-interpolation-with-fstring)
- Move assignment closer to its usage within a block (move-assign-in-block)
- Convert for loop into list comprehension (list-comprehension)
| print("BayesClassifier is testing ...") | ||
| for data in self.test_data: | ||
| classify_labels.append(bayes.classify(data)) | ||
| classify_labels = [bayes.classify(data) for data in self.test_data] |
Function Test.test_bayes refactored with the following changes:
- Replace interpolated string formatting with f-string [×2] (replace-interpolation-with-fstring)
- Move assignment closer to its usage within a block (move-assign-in-block)
- Convert for loop into list comprehension (list-comprehension)
This is an automatic holiday reply from QQ Mail.
Hello, your message has been received; I will reply to you as soon as possible.
Sourcery Code Quality Report

✅ Merging this PR will increase code quality in the affected files by 0.57%.

Here are some functions in these files that still need a tune-up:

Legend and Explanation

The emojis denote the absolute quality of the code. The 👍 and 👎 indicate whether the quality has improved or gotten worse with this pull request. Please see our documentation here for details on how these metrics are calculated. We are actively working on this report - lots more documentation and extra metrics to come! Help us improve this quality report!
Branch master refactored by Sourcery.

If you're happy with these changes, merge this Pull Request using the Squash and merge strategy. See our documentation here.

Run Sourcery locally
Reduce the feedback loop during development by using the Sourcery editor plugin.

Review changes via command line
To manually merge these changes, make sure you're on the master branch, then run:

Help us improve this pull request!