Updates for issue 90 #105

Open
xiaoliz0 wants to merge 7 commits into main from develop_issue90
Conversation

@xiaoliz0
Contributor

Change filtering criteria:

  1. Do NOT filter out pathogenic germline variants.
  2. Always report TERT.

A gene that meets either of these two conditions is reported with the highest priority in the PRONTO report.
These "rescued" variants need a separate column in the tables so that the rescue can be verified: a column named "Filter rescued" whose value is "Yes" or empty.
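
The two criteria above could be sketched roughly as follows. This is only an illustration, not the code in this PR: the column names `Gene`, `Origin` and `Classification`, the helper names, and the choice to flag every row that meets a condition (whether or not it would have passed the normal filter) are all assumptions.

```python
# Hypothetical column names; the real PRONTO tables may use different ones.
GENE_COL = "Gene"
ORIGIN_COL = "Origin"
CLASS_COL = "Classification"
RESCUE_COL = "Filter rescued"


def is_rescued(row):
    """Return True if the row meets either rescue condition."""
    is_tert = row[GENE_COL].strip().upper() == "TERT"
    is_pathogenic_germline = (
        row[ORIGIN_COL].strip().lower() == "germline"
        and row[CLASS_COL].strip().lower() == "pathogenic"
    )
    return is_tert or is_pathogenic_germline


def apply_filter(rows, passes_filter):
    """Keep rows that pass the normal filter or meet a rescue condition.

    Rows meeting a rescue condition get "Yes" in the rescue column and
    are moved to the top of the result (highest priority).
    """
    kept = []
    for row in rows:
        rescued = is_rescued(row)
        if rescued or passes_filter(row):
            flagged = dict(row)
            # Added last, so the column ends up at the far right.
            flagged[RESCUE_COL] = "Yes" if rescued else ""
            kept.append(flagged)
    # Stable sort: flagged rows first, original order otherwise preserved.
    kept.sort(key=lambda r: r[RESCUE_COL] != "Yes")
    return kept
```

Here `passes_filter` stands in for whatever the existing filter logic decides for a row.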

Change filtering criteria: do NOT filter out pathogenic germline variants; always report TERT.
Genes that meet either of these two conditions are reported with the highest priority in the PRONTO report.
These "rescued" variants need a separate column (at the far right of the table) so that the rescue can be verified: a column named "Filter rescued" whose value is "Yes" or empty.
xiaoliz0 linked an issue on Apr 22, 2026 that may be closed by this pull request
… should not appear in the tables on the right of the report slides; this column is printed only in the summary table on slide 8.
xiaoliz0 requested review from marrip and tonjegul on April 30, 2026 11:09
@xiaoliz0
Contributor Author

@marrip I just received a further request for this issue and have updated issue 90 accordingly. More new code is coming soon.

@marrip
Collaborator

marrip commented Apr 30, 2026

> @marrip I just received a further request for this issue and have updated issue 90 accordingly. More new code is coming soon.

OK, then I will hold off on the review until you tell me to start ☺️

@xiaoliz0
Contributor Author

xiaoliz0 commented May 4, 2026

> > @marrip I just received a further request for this issue and have updated issue 90 accordingly. More new code is coming soon.
>
> OK, then I will hold off on the review until you tell me to start ☺️

The new commits implement the further request. Feel free to review the code, @marrip :)

@marrip
Collaborator

marrip commented May 4, 2026

Will start tomorrow at the latest 🙂

Collaborator

marrip left a comment

Hey Xiaoli! I have a couple of questions and a suggestion. I am also refactoring some parts of the code but need your input first ☺️ Will continue tomorrow.

Comment thread Script/PRONTO.py
table8_column_width = [0.54, 0.96, 0.96, 0.51, 0.73, 1.12, 2.26, 0.79, 0.81, 0.53, 0.53]
table_max_rows_per_slide = int(cfg.get("INPUT", "table_max_rows_per_slide"))
- insert_table_to_ppt(slide8_table_data_file,slide8_table_ppSlide,slide8_table_name,slide8_header_left,slide8_header_top,slide8_header_width,slide8_table_left,slide8_table_top,slide8_table_width,slide8_table_height,slide8_table_font_size,slide8_table_header,output_ppt_file,if_print_rowNo,table8_column_width,table_max_rows_per_slide)
+ slide8_table_nrows = insert_table_to_ppt(slide8_table_data_file,slide8_table_ppSlide,slide8_table_name,slide8_header_left,slide8_header_top,slide8_header_width,slide8_table_left,slide8_table_top,slide8_table_width,slide8_table_height,slide8_table_font_size,slide8_table_header,output_ppt_file,if_print_rowNo,table8_column_width,table_max_rows_per_slide)
Collaborator

It looks like this variable is not used, so we can probably just omit it:

Suggested change
- slide8_table_nrows = insert_table_to_ppt(slide8_table_data_file,slide8_table_ppSlide,slide8_table_name,slide8_header_left,slide8_header_top,slide8_header_width,slide8_table_left,slide8_table_top,slide8_table_width,slide8_table_height,slide8_table_font_size,slide8_table_header,output_ppt_file,if_print_rowNo,table8_column_width,table_max_rows_per_slide)
+ _ = insert_table_to_ppt(slide8_table_data_file,slide8_table_ppSlide,slide8_table_name,slide8_header_left,slide8_header_top,slide8_header_width,slide8_table_left,slide8_table_top,slide8_table_width,slide8_table_height,slide8_table_font_size,slide8_table_header,output_ppt_file,if_print_rowNo,table8_column_width,table_max_rows_per_slide)

Comment thread Script/PRONTO.py
output_table_file_config = output_file_preMTB_table_path + "_" + output_table + ".txt"
if(',' in filter_column):
    for column in filter_column.split(','):
        all_data = read_tsv(data_file_small_variant_table,column,key_word)
Collaborator

Here it looks like all_data is overwritten on every iteration, so only the read_tsv result for the last item in filter_column is kept. Is that the desired behavior?
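
To illustrate the concern, here is a minimal sketch of the difference between overwriting and accumulating inside the loop. `read_rows` is a hypothetical stand-in for `read_tsv` (whose real signature and return type are not shown in this thread), and whether accumulation is actually the intended behavior is an open question for the author.

```python
def read_rows(table, column, key_word):
    """Hypothetical stand-in for read_tsv: rows whose `column` contains key_word."""
    return [row for row in table if key_word in row.get(column, "")]


def filter_overwriting(table, filter_column, key_word):
    """Mirrors the pattern in the diff: each pass replaces the previous result."""
    all_data = []
    for column in filter_column.split(","):
        all_data = read_rows(table, column, key_word)  # earlier matches are lost
    return all_data


def filter_accumulating(table, filter_column, key_word):
    """Alternative: collect matches from every column, skipping duplicates."""
    all_data = []
    for column in filter_column.split(","):
        for row in read_rows(table, column, key_word):
            if row not in all_data:  # a row may match in several columns
                all_data.append(row)
    return all_data
```

With the overwriting variant, a row that matches only in the first listed column never reaches the output.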

Comment thread Script/PRONTO.py
if(filter_section == "0"):
    all_data_filter = []
    top_filter = int(cfg.get("INPUT", "top_filter")) + 1
    for top_filter_num in range(1,top_filter):
Collaborator

I would like to refactor this section to make it easier to read ☺️

Comment thread Script/PRONTO.py
clear_blank_line(output_table_file_config_pre,output_table_file_config)
all_data_filter.append(all_data)

all_data_filter = sum(all_data_filter, [])
Collaborator

What does this do?
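
For reference, `sum(list_of_lists, [])` flattens a list of lists by one level: it starts from the empty list and concatenates each inner list with `+`. The small example below uses made-up data to show this next to `itertools.chain.from_iterable`, which gives the same result without building an intermediate list at every step.

```python
from itertools import chain

# Made-up nested data standing in for all_data_filter before flattening.
nested = [["row1\n", "row2\n"], [], ["row3\n"]]

# Equivalent to [] + nested[0] + nested[1] + nested[2].
flat_with_sum = sum(nested, [])

# Same result, but linear in the total number of items.
flat_with_chain = list(chain.from_iterable(nested))
```

Both variants leave the inner items untouched; only the outer level of nesting is removed.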

Comment thread Script/PRONTO.py
Comment on lines +1412 to +1418
if(len(all_data_filter[i]) < header_length):
    count = header_length - len(all_data_filter[i])
    all_data_filter[i] = [[item.replace('\n', '') for item in cell] for cell in all_data_filter[i]]
    all_data_filter[i].pop()
    for j in range(1, count):
        all_data_filter[i].append(' \t')
    all_data_filter[i].append('\n')
Collaborator

Could you explain what this section does? It strips every \n, removes the last item, appends empty fields to the row, and finishes with \n. Why is this necessary?
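
If the goal is to pad rows from a narrower table to the width of the combined header (an assumption, since the intent is exactly what is being asked here), the same effect might be expressed more directly. The sketch below treats a row as a plain list of cell strings, which may not match the actual data structure in `PRONTO.py`; `pad_row` and `to_tsv_line` are hypothetical helpers.

```python
def pad_row(cells, header_length, filler=""):
    """Return a copy of `cells` without trailing newlines, padded to header_length."""
    cleaned = [cell.rstrip("\n") for cell in cells]
    missing = header_length - len(cleaned)
    # Rows that are already wide enough are returned unpadded.
    return cleaned + [filler] * max(missing, 0)


def to_tsv_line(cells):
    """Join cells with tabs and terminate the line exactly once."""
    return "\t".join(cells) + "\n"
```

Keeping the newline handling in one place (`to_tsv_line`) avoids having to strip and re-append `\n` on individual cells.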

@marrip
Collaborator

marrip commented May 6, 2026

Looking at the remaining changes, it seems that many of the fixes handle differing column counts in the combined tables, convert between tabs and line breaks, and deduplicate data. I suggest we rework this with pandas, which would make reading, filtering, combining, and writing to file much easier. What do you think?
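
As a rough idea of what the pandas approach could look like, here is a minimal sketch. The table contents, column names, and the `AF >= 0.1` threshold are invented for illustration and are not taken from PRONTO.

```python
import io

import pandas as pd

# Two made-up tab-separated tables with different column sets.
table_a = io.StringIO(
    "Gene\tVariant\tAF\n"
    "TP53\tp.R175H\t0.41\n"
    "KRAS\tp.G12D\t0.08\n"
)
table_b = io.StringIO(
    "Gene\tVariant\tFilter rescued\n"
    "TERT\tc.-124C>T\tYes\n"
    "TP53\tp.R175H\t\n"
)

# Read everything as strings and keep empty cells as "" instead of NaN.
df_a = pd.read_csv(table_a, sep="\t", dtype=str, keep_default_na=False)
df_b = pd.read_csv(table_b, sep="\t", dtype=str, keep_default_na=False)

# Filtering is a boolean mask (hypothetical threshold).
df_a = df_a[df_a["AF"].astype(float) >= 0.1]

# concat aligns columns by name; cells missing from one table become NaN,
# which are replaced with empty strings here.
combined = pd.concat([df_a, df_b], ignore_index=True, sort=False).fillna("")

# Make rows unique per gene and variant, keeping the first occurrence.
combined = combined.drop_duplicates(subset=["Gene", "Variant"], keep="first")
combined = combined.reset_index(drop=True)

# Write back as tab-separated text (a file path works the same way).
out = io.StringIO()
combined.to_csv(out, sep="\t", index=False)
result = out.getvalue()
```

Differing column counts, tab and newline handling, and deduplication are each covered by a single call here, which is the main appeal of the suggestion.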

Development

Successfully merging this pull request may close these issues.

National request for data filter (big change)

2 participants