Since the current implementation is a directed acyclic graph, it should be possible to perform a topological sort over the hidden layers and build an optimized set of stages/tensors for better performance and resource utilization. Every node in a given stage depends only on nodes in earlier stages, so each stage can be evaluated as a single batched operation. This would also open the door to GPU parallelization via compute shaders or CUDA (though I haven't researched CUDA in depth yet).
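
To make the staging idea concrete, here is a minimal sketch of grouping a DAG's nodes into stages with Kahn's algorithm. This is written in Rust against hypothetical types (`Graph`, `outputs`, and `build_stages` are all illustrative names, not the project's actual API); the point is only that each resulting stage could map onto one batched tensor op or one GPU dispatch.

```rust
/// One network graph; edges point from a node to the nodes
/// that consume its output.
struct Graph {
    /// Adjacency list: outputs[i] holds the indices of nodes fed by node i.
    outputs: Vec<Vec<usize>>,
}

/// Group nodes into stages via Kahn's algorithm. Every node in a stage
/// depends only on nodes in earlier stages, so all nodes within a stage
/// can be evaluated together.
fn build_stages(graph: &Graph) -> Vec<Vec<usize>> {
    let n = graph.outputs.len();

    // Count unresolved inputs for every node.
    let mut in_degree = vec![0usize; n];
    for edges in &graph.outputs {
        for &to in edges {
            in_degree[to] += 1;
        }
    }

    // Stage 0 is every node with no inputs (e.g. the input layer).
    let mut current: Vec<usize> = (0..n).filter(|&i| in_degree[i] == 0).collect();
    let mut stages = Vec::new();

    while !current.is_empty() {
        let mut next = Vec::new();
        // Resolving a stage may free up nodes for the next stage.
        for &node in &current {
            for &to in &graph.outputs[node] {
                in_degree[to] -= 1;
                if in_degree[to] == 0 {
                    next.push(to);
                }
            }
        }
        stages.push(current);
        current = next;
    }
    stages
}
```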
However, because the structure of these networks changes so frequently, and because I want to preserve the DAG model at this project's core, rebuilding these optimized structures every generation is a real performance concern. Depending on the fitness function and how many predictions it runs per generation, the cost of recompiling could outweigh the gains. I therefore believe the original scheduling system should be preserved, with this optimized structure additionally offered to the end user as an opt-in feature whose use is encouraged.
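
As a rough sketch of what that opt-in path might look like (again with hypothetical names, building on the `Graph` and `build_stages` sketch above): the DAG stays the source of truth, and the compiled plan is only built when the caller explicitly asks for it, so code that mutates the network every generation pays nothing unless it opts in.

```rust
/// Hypothetical precompiled evaluation plan: stages of node indices
/// produced once from the DAG, then reused across many predictions.
struct CompiledPlan {
    stages: Vec<Vec<usize>>,
}

impl Graph {
    /// Opt-in compilation. Callers whose networks change every generation
    /// can skip this and keep using the original scheduler; callers that
    /// run many predictions per generation can amortize the build cost.
    fn compile(&self) -> CompiledPlan {
        CompiledPlan { stages: build_stages(self) }
    }
}
```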