Hey there! I'm Shay. I lead AI & Data at the Adanim Institute, applying AI, data, and operational methods to social welfare systems. Alongside that work, I advise organizations on data science strategy, AI implementation, team building, and applied ML systems.
I co-founded and help lead DataHack, a nonprofit promoting diversity in data science and AI in Israel and supporting social-good data work through programs such as DataCoach, DataNights, DataTalks, and Kaggle-IL. I also teach Deep Learning, Text Mining, and Data Visualization at Tel Aviv University's Business School, and Intro to Machine Learning at The Academic College of Tel Aviv-Yaffo.
My open-source work spans Python data and ML tooling, AI workflow utilities, Hebrew NLP/OCR/audio dataset infrastructure, and coding-agent support tools.

| Project | Description |
| --- | --- |
| cachier | Persistent caching decorators for Python functions |
| pdpipe | Composable pipelines for pandas DataFrames |
| pulearn | Positive-unlabeled learning with Python |
| skift | scikit-learn wrappers for Python fastText |
| birch | Hierarchical config for Python packages |
| awesome-twitter-data | Twitter/X datasets and resources |

| Project | Description |
| --- | --- |
| foldermix | LLM-friendly folder packing CLI |
| pr-agent-context | GitHub Actions PR context for coding agents |
| SynthBanshee | Synthetic Hebrew audio dataset pipeline |
| hocrgen | Hebrew OCR dataset operations tooling |
| leadforge | Synthetic CRM and go-to-market datasets |
| splendor | Local-first, git-native knowledge compiler |

- Emerging Techniques & Breakthroughs in LLMs: A case study through the DeepSeek family of models
  A DataNights GenAI lecture on recent LLM techniques and the DeepSeek model family.
- LLMs Fundamentals: From BERT to GPT-4o
  A two-lecture overview of LLM theory, architecture, and training.
- No (working) hands! Engineer-less Data/ML Eng. for pure DS teams
  An MDLIOps 2024 talk on rebuilding analytics and ML infrastructure with minimal engineering support.

- Data Science Project Flow for Startups
  A data scientist's take on our process.
- Peer Reviewing Data Science Projects
  Making your work more error-proof using peer scrutiny.
- Document Embedding Techniques
  A review of notable literature on the topic.
- Inferring causality in time series data
  A concise review of the major approaches.
- Publishing your own Python package
  A practical guide to packaging Python code.