Change filtering criteria: DO NOT filter out pathogenic germline variants. Always report TERT. Genes meeting either of these two conditions will be reported with highest priority in the PRONTO report. These "rescued" variants should have a separate column at the far right of the table to verify this: a "Filter rescued" column with values "Yes" or empty.
… should not appear in the tables on the right side of the report slides; this column should only be printed in the summary table on slide 8.
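A minimal sketch of the rescue rule described above, assuming each variant is a dict of strings with hypothetical keys `gene`, `origin` and `classification`, and assuming "highest priority" means rescued rows are listed first. The issue text does not say whether "Filter rescued" should also be "Yes" for a rescued variant that would have passed the normal filters anyway; this sketch marks every variant that meets a rescue condition.

```python
from typing import Callable, Dict, List

Variant = Dict[str, str]

# Genes that are always reported (TERT, per the issue text).
ALWAYS_REPORT_GENES = {"TERT"}


def is_rescued(variant: Variant) -> bool:
    """True if the variant meets one of the two rescue conditions.

    The keys "gene", "origin" and "classification" are hypothetical
    column names; adapt them to the real variant table.
    """
    if variant.get("gene") in ALWAYS_REPORT_GENES:
        return True
    return (
        variant.get("origin") == "germline"
        and variant.get("classification", "").lower() == "pathogenic"
    )


def filter_with_rescue(
    variants: List[Variant], passes_filters: Callable[[Variant], bool]
) -> List[Variant]:
    """Keep variants that pass the normal filters or are rescued.

    Rescued variants get "Filter rescued" = "Yes" and are moved to the top;
    all other kept variants get an empty string in that column.
    """
    kept = []
    for variant in variants:
        rescued = is_rescued(variant)
        if rescued or passes_filters(variant):
            row = dict(variant)
            # Added last, so it becomes the right-most column when written out.
            row["Filter rescued"] = "Yes" if rescued else ""
            kept.append(row)
    # sorted() is stable: original order is preserved within each group.
    return sorted(kept, key=lambda row: row["Filter rescued"] != "Yes")
```

For the tables on the right of the slides, the column could be dropped again before writing (for example `{k: v for k, v in row.items() if k != "Filter rescued"}`), keeping it only for the slide 8 summary table.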
@marrip I just got a further request for this issue and have updated issue 90 accordingly. More new code is coming soon.
OK, then I will hold off on the review until you tell me to start.
… rescued variants which are not included in Filter0-3 (last table in the report).
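The last table described above amounts to a set difference: rescued variants minus everything already listed in Filter0-3. A minimal sketch, assuming variants are dicts and that the hypothetical columns `chrom`, `pos`, `ref` and `alt` identify a variant uniquely:

```python
from typing import Dict, Iterable, List, Tuple

Variant = Dict[str, str]


def variant_key(variant: Variant) -> Tuple[str, str, str, str]:
    """Identity of a variant; these column names are hypothetical."""
    return (variant["chrom"], variant["pos"], variant["ref"], variant["alt"])


def rescued_not_in_filters(
    rescued: Iterable[Variant], filter_tables: Iterable[Iterable[Variant]]
) -> List[Variant]:
    """Return rescued variants that do not appear in any of the filter tables."""
    # Build the set of keys already reported in Filter0-3, then keep the rest.
    reported = {variant_key(v) for table in filter_tables for v in table}
    return [v for v in rescued if variant_key(v) not in reported]
```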
…to develop_issue90
Will start tomorrow at the latest 🙂
marrip left a comment
Hey Xiaoli! I have a couple of questions and a suggestion. I am also working on refactoring some of the parts, but I need your input first.
```diff
 table8_column_width = [0.54, 0.96, 0.96, 0.51, 0.73, 1.12, 2.26, 0.79, 0.81, 0.53, 0.53]
 table_max_rows_per_slide = int(cfg.get("INPUT", "table_max_rows_per_slide"))
-insert_table_to_ppt(slide8_table_data_file,slide8_table_ppSlide,slide8_table_name,slide8_header_left,slide8_header_top,slide8_header_width,slide8_table_left,slide8_table_top,slide8_table_width,slide8_table_height,slide8_table_font_size,slide8_table_header,output_ppt_file,if_print_rowNo,table8_column_width,table_max_rows_per_slide)
+slide8_table_nrows = insert_table_to_ppt(slide8_table_data_file,slide8_table_ppSlide,slide8_table_name,slide8_header_left,slide8_header_top,slide8_header_width,slide8_table_left,slide8_table_top,slide8_table_width,slide8_table_height,slide8_table_font_size,slide8_table_header,output_ppt_file,if_print_rowNo,table8_column_width,table_max_rows_per_slide)
```
It looks like this variable is not used; we can probably just omit it:
```diff
-slide8_table_nrows = insert_table_to_ppt(slide8_table_data_file,slide8_table_ppSlide,slide8_table_name,slide8_header_left,slide8_header_top,slide8_header_width,slide8_table_left,slide8_table_top,slide8_table_width,slide8_table_height,slide8_table_font_size,slide8_table_header,output_ppt_file,if_print_rowNo,table8_column_width,table_max_rows_per_slide)
+_ = insert_table_to_ppt(slide8_table_data_file,slide8_table_ppSlide,slide8_table_name,slide8_header_left,slide8_header_top,slide8_header_width,slide8_table_left,slide8_table_top,slide8_table_width,slide8_table_height,slide8_table_font_size,slide8_table_header,output_ppt_file,if_print_rowNo,table8_column_width,table_max_rows_per_slide)
```
```python
output_table_file_config = output_file_preMTB_table_path + "_" + output_table + ".txt"
if(',' in filter_column):
    for column in filter_column.split(','):
        all_data = read_tsv(data_file_small_variant_table,column,key_word)
```
Here it looks like you always overwrite `all_data` on each iteration, so only the `read_tsv` result for the last item in `filter_column` survives. Is that the desired behavior?
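If the intent is to keep rows matching any of the listed columns, the results can be accumulated instead of overwritten. A minimal sketch, where `read_matching_rows` is a hypothetical stand-in for `read_tsv` (assumed to return data rows whose `column` contains `key_word`), and duplicates are dropped because one row can match on several columns:

```python
import csv
from typing import List


def read_matching_rows(path: str, column: str, key_word: str) -> List[List[str]]:
    """Hypothetical stand-in for read_tsv: rows whose `column` contains key_word."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        idx = header.index(column)
        return [row for row in reader if key_word in row[idx]]


def read_all_columns(path: str, filter_column: str, key_word: str) -> List[List[str]]:
    """Collect matching rows for every comma-separated column, keeping each row once."""
    all_data: List[List[str]] = []
    seen = set()
    # split(",") also handles a single column, so no separate branch is needed.
    for column in filter_column.split(","):
        for row in read_matching_rows(path, column.strip(), key_word):
            key = tuple(row)
            if key not in seen:
                seen.add(key)
                all_data.append(row)
    return all_data
```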
```python
if(filter_section == "0"):
    all_data_filter = []
    top_filter = int(cfg.get("INPUT", "top_filter")) + 1
    for top_filter_num in range(1,top_filter):
```
I would like to refactor this section to make it easier to read.
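One possible shape for that refactor, as a sketch only: read the option with `ConfigParser.getint`, loop over an explicit inclusive range, and flatten the per-filter tables with `itertools.chain` instead of `sum(..., [])`. `read_filter_table` is a hypothetical callable standing in for whatever the loop body does for each filter number.

```python
import configparser
from itertools import chain
from typing import Callable, List

Row = List[str]


def collect_filter_tables(
    cfg: configparser.ConfigParser, read_filter_table: Callable[[int], List[Row]]
) -> List[Row]:
    """Read filters 1..top_filter and return all their rows as one flat list."""
    # getint() parses the option directly, and range(1, n + 1) makes the
    # inclusive upper bound explicit instead of adding 1 to the config value.
    top_filter = cfg.getint("INPUT", "top_filter")
    per_filter = [read_filter_table(n) for n in range(1, top_filter + 1)]
    # chain.from_iterable flattens in linear time; sum(list_of_lists, [])
    # copies the accumulated list on every addition.
    return list(chain.from_iterable(per_filter))
```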
```python
clear_blank_line(output_table_file_config_pre,output_table_file_config)
all_data_filter.append(all_data)

all_data_filter = sum(all_data_filter, [])
```
```python
if(len(all_data_filter[i]) < header_length):
    count = header_length - len(all_data_filter[i])
    all_data_filter[i] = [[item.replace('\n', '') for item in cell] for cell in all_data_filter[i]]
    all_data_filter[i].pop()
    for j in range(1, count):
        all_data_filter[i].append(' \t')
    all_data_filter[i].append('\n')
```
Could you explain what this section does? It removes every `\n`, drops the last item, pads the row with empty fields and finishes it off with `\n`. Why is this necessary?
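Assuming the intent is to make every row as wide as the header so the combined tables line up, the same result can be had by working with lists of cells and letting the `csv` module handle tabs and line endings. A minimal sketch; `write_padded_tsv` is a hypothetical helper, not part of the PR:

```python
import csv
from typing import List, TextIO


def write_padded_tsv(header: List[str], rows: List[List[str]], handle: TextIO) -> None:
    """Write a TSV where rows shorter than the header are right-padded with empty fields."""
    width = len(header)
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        # Strip stray newlines inside cells, then pad to the header width.
        # Rows that are already wide enough are written unchanged.
        cells = [cell.replace("\n", "") for cell in row]
        writer.writerow(cells + [""] * (width - len(cells)))
```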
Looking at the remaining changes, it seems that many of the fixes handle different column numbers in the combined tables, replace tabs with line breaks or vice versa, and make the data unique. I suggest we rework this using pandas, which would make reading, filtering, combining and writing to file a lot easier. What do you think?
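A minimal sketch of what the pandas version of the combine step might look like, assuming the inputs are tab-separated files with a header row. `pd.concat` aligns columns by name, so tables with different column sets stack without manual padding, and `drop_duplicates` replaces the hand-written de-duplication. `combine_tables` is a hypothetical name.

```python
from typing import Iterable, List, Optional

import pandas as pd


def combine_tables(sources: Iterable, key_columns: Optional[List[str]] = None) -> pd.DataFrame:
    """Read several TSV tables (paths or file-like objects), stack them, drop duplicates.

    Columns missing from a table become NaN after concat and are written
    as empty strings; sort=False keeps columns in order of first appearance.
    """
    frames = [pd.read_csv(src, sep="\t", dtype=str) for src in sources]
    combined = pd.concat(frames, ignore_index=True, sort=False)
    combined = combined.drop_duplicates(subset=key_columns, keep="first")
    return combined.fillna("").reset_index(drop=True)
```

Writing the result back out would then be a single `combined.to_csv(path, sep="\t", index=False)`.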
Change filtering criteria:
Genes meeting either of these two conditions will be reported with highest priority in the PRONTO report.
These "rescued" variants should have a separate column in the tables to verify this: a "Filter rescued" column with values "Yes" or empty.