Conversation
…er consistent with dataset+define+library bilder.
|
@DmitryMK there is a merge conflict--can you resolve it so I can review. |
RamilCDISC
left a comment
There was a problem hiding this comment.
Could you please resolve the conflict on the branch too?
| right_on="define_variable_name", | ||
| how="left", | ||
| how="outer", | ||
| ).fillna("") |
There was a problem hiding this comment.
If we make the Nan's a empty string "" we lose information if the columns was missing or it is just empty. I think better would be to handle this in a different way. Please let me know if I am not understanding the real intent here.
There was a problem hiding this comment.
Hi Ramil, I did it for consistency with metadata+define+library dataset builder, as it had fillna("") in it.
What is the difference between missing and empty column? In my understanding it either comes from dataset(metadata) or define. So the missing values logically appear only when it is present in define only. I might be wrong here.
|
@SFJohnson24 @RamilCDISC I have merged the main branch and updated the tests accordingly. |
@DmitryMK this issue is resolved now |
RamilCDISC
left a comment
There was a problem hiding this comment.
For manual validation I ran the rule in the connected ticket description against negative dataset in the Sharepoint. The builder had an error. The trace is below.
python core.py validate -s sdtmig -v 3.4 -lr rule.yml -dp "/Users/ramilabbasov/Library/CloudStorage/OneDrive-CDISC/Documents - CORE Rules/unitTesting/SDTMIG/CG0015/negative/01/data/unit-test-coreid-CG0015-negative1.xlsx" -dxp "/Users/ramilabbasov/Library/CloudStorage/OneDrive-CDISC/Documents - CORE Rules/unitTesting/SDTMIG/CG0015/negative/01/data/define_CG0015-negative.xml"
Traceback (most recent call last):
File "/Users/ramilabbasov/Desktop/ISSUES/venv/lib/python3.12/site-packages/pandas/io/parsers/base_parser.py", line 838, in _cast_types
values = astype_array(values, cast_type, copy=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ramilabbasov/Desktop/ISSUES/venv/lib/python3.12/site-packages/pandas/core/dtypes/astype.py", line 183, in astype_array
values = _astype_nansafe(values, dtype, copy=copy)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ramilabbasov/Desktop/ISSUES/venv/lib/python3.12/site-packages/pandas/core/dtypes/astype.py", line 134, in _astype_nansafe
return arr.astype(dtype, copy=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: could not convert string to float: '2021-04-27'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/ramilabbasov/Desktop/ISSUES/ISS-1655/cdisc-rules-engine/core.py", line 1094, in <module>
cli()
File "/Users/ramilabbasov/Desktop/ISSUES/venv/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ramilabbasov/Desktop/ISSUES/venv/lib/python3.12/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/Users/ramilabbasov/Desktop/ISSUES/venv/lib/python3.12/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ramilabbasov/Desktop/ISSUES/venv/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ramilabbasov/Desktop/ISSUES/venv/lib/python3.12/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ramilabbasov/Desktop/ISSUES/venv/lib/python3.12/site-packages/click/decorators.py", line 33, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ramilabbasov/Desktop/ISSUES/ISS-1655/cdisc-rules-engine/core.py", line 669, in validate
run_validation(
File "/Users/ramilabbasov/Desktop/ISSUES/ISS-1655/cdisc-rules-engine/scripts/run_validation.py", line 180, in run_validation
).get_data_service(args.dataset_paths)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ramilabbasov/Desktop/ISSUES/ISS-1655/cdisc-rules-engine/cdisc_rules_engine/services/data_services/data_service_factory.py", line 89, in get_data_service
return self.get_service(
^^^^^^^^^^^^^^^^^
File "/Users/ramilabbasov/Desktop/ISSUES/ISS-1655/cdisc-rules-engine/cdisc_rules_engine/services/data_services/data_service_factory.py", line 147, in get_service
return self._registered_services_map.get(service_name).get_instance(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ramilabbasov/Desktop/ISSUES/ISS-1655/cdisc-rules-engine/cdisc_rules_engine/services/data_services/excel_data_service.py", line 51, in get_instance
service = cls(
^^^^
File "/Users/ramilabbasov/Desktop/ISSUES/ISS-1655/cdisc-rules-engine/cdisc_rules_engine/services/data_services/excel_data_service.py", line 39, in __init__
super(ExcelDataService, self).__init__(
File "/Users/ramilabbasov/Desktop/ISSUES/ISS-1655/cdisc-rules-engine/cdisc_rules_engine/services/data_services/base_data_service.py", line 120, in __init__
self._initialize_datasets_metadata(**kwargs)
File "/Users/ramilabbasov/Desktop/ISSUES/ISS-1655/cdisc-rules-engine/cdisc_rules_engine/services/data_services/excel_data_service.py", line 178, in _initialize_datasets_metadata
dataset = self.__get_dataset(dataset_filename)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ramilabbasov/Desktop/ISSUES/ISS-1655/cdisc-rules-engine/cdisc_rules_engine/services/data_services/excel_data_service.py", line 96, in __get_dataset
dataframe = pd.read_excel(
^^^^^^^^^^^^^^
File "/Users/ramilabbasov/Desktop/ISSUES/venv/lib/python3.12/site-packages/pandas/io/excel/_base.py", line 517, in read_excel
data = io.parse(
^^^^^^^^^
File "/Users/ramilabbasov/Desktop/ISSUES/venv/lib/python3.12/site-packages/pandas/io/excel/_base.py", line 1629, in parse
return self._reader.parse(
^^^^^^^^^^^^^^^^^^^
File "/Users/ramilabbasov/Desktop/ISSUES/venv/lib/python3.12/site-packages/pandas/io/excel/_base.py", line 931, in parse
raise err
File "/Users/ramilabbasov/Desktop/ISSUES/venv/lib/python3.12/site-packages/pandas/io/excel/_base.py", line 918, in parse
output[asheetname] = parser.read(nrows=nrows)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ramilabbasov/Desktop/ISSUES/venv/lib/python3.12/site-packages/pandas/io/parsers/readers.py", line 1748, in read
) = self._engine.read( # type: ignore[attr-defined]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ramilabbasov/Desktop/ISSUES/venv/lib/python3.12/site-packages/pandas/io/parsers/python_parser.py", line 289, in read
conv_data = self._convert_data(data)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ramilabbasov/Desktop/ISSUES/venv/lib/python3.12/site-packages/pandas/io/parsers/python_parser.py", line 360, in _convert_data
return self._convert_to_ndarrays(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ramilabbasov/Desktop/ISSUES/venv/lib/python3.12/site-packages/pandas/io/parsers/base_parser.py", line 593, in _convert_to_ndarrays
cvals = self._cast_types(cvals, cast_type, c)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ramilabbasov/Desktop/ISSUES/venv/lib/python3.12/site-packages/pandas/io/parsers/base_parser.py", line 840, in _cast_types
raise ValueError(
ValueError: Unable to convert column SVENDTC to type float64 (sheet: sv.xpt)
((venv) ) ramilabbasov@Ramils-MacBook-Pro cdisc-rules-engine %
Addressing #1655
Updated 2 builders to keep define.xml information:
I have changed VariablesMetadataWithDefineDatasetBuilder to be consistent with the VariablesMetadataWithDefineAndLibraryDatasetBuilder and added a test for this builder:
@gerrycampion @SFJohnson24 I have 2 questions: