This repository was archived by the owner on Jul 13, 2025. It is now read-only.

Integration of Hamilton DAG flow for feature engineering #100

@Nikronic

Intro

As you can guess from the notebook file and from vizard.data.preprocessor, the whole feature engineering part is a mess! The Hamilton library seems like a valid solution.

Note: If this library resolves the aforementioned challenges, I think it will become our go-to for any project involving feature engineering.

Description

To fix the mess, we want a cleaner, systematic approach that does not require "looking and running cell-by-cell". It seems (I am not sure; otherwise I would have made this HIGH PRIORITY to get it done ASAP) that the Hamilton library covers all of our challenges, including:

  • tracking dependencies of columns on each other: sometimes feature C is produced by first modifying A and B (i.e., A, B -> C)
  • testing cell by cell
  • easier integration of code and feature engineering inside the library (i.e., vizard)
  • lightweight and human-readable
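To make the dependency-tracking point concrete, here is a minimal plain-Python sketch of the convention Hamilton is built around: each function is named after the column it produces, and its parameter names are the columns it depends on. This is not Hamilton's API. The `resolve` helper, the column names, and the transforms are all hypothetical and only illustrate the idea using the standard library.

```python
import inspect
from typing import Callable, Dict, List, Optional

Column = List[float]


# Each function defines one column; its parameter names are its dependencies.
def a_scaled(a: Column) -> Column:
    """Modified A (hypothetical transform: scale by 10)."""
    return [x * 10 for x in a]


def b_filled(b: List[Optional[float]]) -> Column:
    """Modified B (hypothetical transform: replace missing values with 0.0)."""
    return [0.0 if x is None else x for x in b]


def c(a_scaled: Column, b_filled: Column) -> Column:
    """Feature C, produced from the modified A and B (A, B -> C)."""
    return [x + y for x, y in zip(a_scaled, b_filled)]


def resolve(target: str, nodes: Dict[str, Callable], inputs: dict, cache: Optional[dict] = None):
    """Compute `target` by recursively computing the columns it depends on."""
    cache = {} if cache is None else cache
    if target in inputs:  # raw input column, nothing to compute
        return inputs[target]
    if target not in cache:
        fn = nodes[target]
        deps = inspect.signature(fn).parameters  # parameter names = dependencies
        cache[target] = fn(**{d: resolve(d, nodes, inputs, cache) for d in deps})
    return cache[target]


NODES = {fn.__name__: fn for fn in (a_scaled, b_filled, c)}
result = resolve("c", NODES, {"a": [1.0, 2.0], "b": [None, 3.0]})
print(result)
```

Because every column is an ordinary function, each one can be unit-tested on its own (for example, calling `c([1.0], [2.0])` directly), which would replace the cell-by-cell checks in the notebook.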

Note: For now I am the only assignee, until @aliinreallife lets me know whether he wants to join.

Prerequisites:

  • does @aliinreallife want to join this?
  • does this library actually solve the problem? (requires discussion and some minor tests before using it directly on the code base)

Metadata

Labels

  • data: Issues related to data collection, parsing, and processing. Might share with `Model`
  • enhancement: New feature or request
  • priority=Medium: Needs above average amount of the attention!
