Skip to content

Conversation

@ged-steponavicius
Copy link

Description:

fixes issue #259

@lr4d
Copy link
Collaborator

lr4d commented Mar 19, 2020

The black code formatter is causing tests to be red. I suggests you install the pre-commit hooks or run black on the code files

Copy link
Collaborator

@fjetter fjetter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution!

Overall LGTM. Can you add an entry to the changelog, please? see https://github.com/JDASoftwareGroup/kartothek/blob/master/CHANGES.rst

@codecov
Copy link

codecov bot commented Mar 22, 2020

Codecov Report

Merging #260 into master will increase coverage by 0.11%.
The diff coverage is 94.73%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #260      +/-   ##
==========================================
+ Coverage   89.72%   89.84%   +0.11%     
==========================================
  Files          39       39              
  Lines        3729     3741      +12     
  Branches      906      909       +3     
==========================================
+ Hits         3346     3361      +15     
+ Misses        224      223       -1     
+ Partials      159      157       -2     
Impacted Files Coverage Δ
kartothek/core/testing.py 75.00% <ø> (+1.56%) ⬆️
kartothek/io/eager.py 77.09% <ø> (ø)
kartothek/core/common_metadata.py 95.17% <93.33%> (-0.22%) ⬇️
kartothek/core/index.py 85.63% <100.00%> (+0.91%) ⬆️
kartothek/io_components/metapartition.py 91.41% <0.00%> (+0.13%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5bc0a53...d2faed5. Read the comment docs.

@ged-steponavicius
Copy link
Author

@lr4d @fjetter

Tests are now passing for pyarrow 0.15 and 0.16

I have done an investigation into earlier pyarrow versions, how they store schema internally appears to be significantly different from later versions. I could not pinpoint it to the changelog entry.

key = "some_uuid/some_table/_common_metadata"
pq_file = pq.ParquetFile(store.open(key))

0.13.0 does not store timezone information:

pq_file.schema #pyarrow==0.13.0
<pyarrow._parquet.ParquetSchema object at 0x000001F0B544E608>
datetime64: INT64 TIMESTAMP_MICROS
datetime64_tz: INT64 TIMESTAMP_MICROS
datetime64_utc: INT64 TIMESTAMP_MICROS

0.14.0 aligns to UTC:

pq_file.schema #pyarrow==0.14.0
<pyarrow._parquet.ParquetSchema object at 0x000001FCE4087148>
datetime64: INT64 Timestamp(isAdjustedToUTC=true, timeUnit=microseconds)
datetime64_tz: INT64 Timestamp(isAdjustedToUTC=true, timeUnit=microseconds)
datetime64_utc: INT64 Timestamp(isAdjustedToUTC=true, timeUnit=microseconds)

I'm not sure what's our the best approach, any suggestions?

PS. sorry for multiple commits, this is my first PR

fjetter and others added 3 commits March 23, 2020 10:06
ged-steponavicius and others added 3 commits April 8, 2020 09:26
Co-Authored-By: Marco Neumann <57095103+marco-neumann-jdas@users.noreply.github.com>
Copy link
Collaborator

@fjetter fjetter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the github diff is a bit confused due to the many merge commits. We probably should squash this on merge.

Comment on lines +139 to +141
keys = np.array(list(self.index_dct.keys()))
labeled_array = pa.array(keys, type=self.dtype)
return np.array(labeled_array.to_pandas(date_as_object=date_as_object))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should already be on master. Is github fooling me or what's happening here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have pulled the latest master before committing, I think GitHub is confused

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the rebase got a bit messed up. @ged-steponavicius could you perhaps try rebasing again on master?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

6 participants