Replace many Pandas operations with NumPy #198
Congratulations on making your first pull request to Data Morph! Please familiarize yourself with the contributing guidelines, if you haven't already.
Thanks for the PR, @JCGoran! As I'm sure you've seen, I have a backlog to get through 😄 I hope to get to this in the next few weeks.
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #198   +/-   ##
=======================================
  Coverage   98.53%   98.53%
=======================================
  Files          58       58
  Lines        1907     1915    +8
  Branches      114      114
=======================================
+ Hits         1879     1887    +8
  Misses         25       25
  Partials        3        3
stefmolin
left a comment
Let's start by pulling the LineCollection changes into a separate PR.
The branch was force-pushed from 298870b to e708f6c.
Bump, this is more or less ready for review as-is.
I haven't forgotten 😄 I'm going to work through the PyCon Taiwan sprint PRs first since I couldn't get to them all at the event, and I want to think more about the design of the internals here. I'm traveling right now and will have very limited time for the next couple of weeks.
A dataset with columns x and y.
x : Iterable[Number]
    The ``x`` value of the dataset.
x, y = (
    start_shape.df['x'].to_numpy(copy=True),
    start_shape.df['y'].to_numpy(copy=True),
)
Can't we use the _x and _y from the Dataset.__init__() changes?
- x, y = (
-     start_shape.df['x'].to_numpy(copy=True),
-     start_shape.df['y'].to_numpy(copy=True),
- )
+ x, y = start_shape._x, start_shape._y
I'm also wondering if we need to copy here, when we copy in the loop.
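A rough way to reason about the copy question: if every loop iteration copies the array before perturbing it, the starting arrays are never written to, so an up-front copy adds nothing. The sketch below assumes a toy DataFrame and a hypothetical `perturb()` helper; neither is Data Morph's actual code.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1.0, 2.0, 3.0], 'y': [4.0, 5.0, 6.0]})

# No copy up front: this may be a view of the DataFrame's data.
start_x = df['x'].to_numpy()


def perturb(values, index, delta):
    """Hypothetical helper: copy first, then modify only the copy."""
    new_values = values.copy()  # the "copy in the loop"
    new_values[index] += delta
    return new_values


new_x = perturb(start_x, 0, 0.5)

# The copy made inside perturb() owns its data, so the source is untouched.
independent = not np.shares_memory(new_x, start_x)
source_value = df['x'].iloc[0]
```

If any code path writes to the starting arrays in place, this reasoning no longer holds and the initial copy is still needed.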
self._x = self.df['x'].to_numpy()
self._y = self.df['y'].to_numpy()
- self._x = self.df['x'].to_numpy()
- self._y = self.df['y'].to_numpy()
+ self._x, self._y = self.df[['x', 'y']].to_numpy().T
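For reference, a small sketch of how the one-line form behaves compared with the column-by-column form, using a made-up DataFrame. One thing to keep in mind: converting both columns at once yields a single 2-D array with one common dtype, so a mixed int/float pair gets upcast.

```python
import numpy as np
import pandas as pd

# Toy data with an integer column and a float column (an assumption for
# illustration; the real dataset columns may both be floats already).
df = pd.DataFrame({'x': [1, 2, 3], 'y': [0.5, 1.5, 2.5]})

# Suggested form: convert once, then unpack the two rows of the transpose.
x, y = df[['x', 'y']].to_numpy().T

# Column-by-column form.
x_single = df['x'].to_numpy()
y_single = df['y'].to_numpy()

# The values agree either way.
same_values = np.array_equal(x, x_single) and np.array_equal(y, y_single)

# The dtypes can differ: the 2-D conversion upcasts x to float here.
x_was_upcast = x.dtype != x_single.dtype
```

When both columns are already floats, the two forms give the same values and dtypes.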
y1 : Iterable[Number]
    The original value of ``y``.
x2 : Iterable[Number]
    The perturbed value of ``x``.
Extra space:
- The perturbed value of ``x``.
+ The perturbed value of ``x``.
self._x = self.df['x'].to_numpy()
self._y = self.df['y'].to_numpy()
Should these be properties? If we change the DataFrame, these will no longer match.
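A minimal sketch of the property idea, using a hypothetical `DatasetSketch` class rather than the real `Dataset`. The arrays are derived from the DataFrame on each access, so they cannot drift out of sync with it.

```python
import pandas as pd


class DatasetSketch:
    """Hypothetical stand-in, not Data Morph's actual Dataset class."""

    def __init__(self, df):
        self.df = df

    @property
    def x(self):
        # Derived on every access, so it always reflects the current self.df.
        return self.df['x'].to_numpy()

    @property
    def y(self):
        return self.df['y'].to_numpy()


dataset = DatasetSketch(pd.DataFrame({'x': [1.0, 2.0], 'y': [3.0, 4.0]}))

# Swapping out the DataFrame is picked up automatically.
dataset.df = pd.DataFrame({'x': [10.0, 20.0], 'y': [30.0, 40.0]})
first_x = dataset.x[0]
```

The trade-off is that each access repeats the conversion, so a hot loop would probably bind the arrays to local variables once before iterating.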
morphed_data = perturbed_data
if self._is_close_enough(x, y, *perturbed_data):
    x, y = perturbed_data
    morphed_data = pd.DataFrame({'x': x, 'y': y})
Building the DataFrame inside the loop isn't necessary now that we've switched to NumPy. We can have _record_frames() create the DataFrame only when we need to save the CSV. The plot() function can be reworked to use NumPy, and the DataFrame that this method returns can be built once, after the loop, instead of thousands of times inside it.
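Roughly what that restructuring could look like, as a sketch only. `morph_sketch()` and its random perturbation are hypothetical stand-ins for the real morphing loop, and the acceptance check is omitted.

```python
import numpy as np
import pandas as pd


def morph_sketch(x, y, iterations, seed=0):
    """Hypothetical loop: perturb one point per iteration using NumPy only."""
    rng = np.random.default_rng(seed)
    # Work on copies so the caller's arrays are left alone.
    x, y = x.copy(), y.copy()
    for _ in range(iterations):
        index = rng.integers(len(x))
        x[index] += rng.normal(scale=0.1)
        y[index] += rng.normal(scale=0.1)
    # Build the DataFrame once, after the loop, rather than per iteration.
    return pd.DataFrame({'x': x, 'y': y})


start = pd.DataFrame({'x': [0.0, 1.0, 2.0], 'y': [0.0, 1.0, 2.0]})
result = morph_sketch(
    start['x'].to_numpy(), start['y'].to_numpy(), iterations=1000
)
```

Frames that need to be saved as CSV could construct a DataFrame at that point only, which keeps the common path free of pandas overhead.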
Describe your changes
Perf before:
Perf after:
which is more or less in line with the circular shapes.
Checklist