4 changes: 4 additions & 0 deletions .gitignore
@@ -22,3 +22,7 @@ __pycache__/
# data packaging ignores
5.data_packaging/packaged
5.data_packaging/images

*.txt
*.csv
*.parquet
Comment on lines +26 to +28

⚠️ Potential issue | 🟠 Major

Global ignore patterns are too broad and will exclude important files.

Adding *.txt, *.csv, and *.parquet as global patterns will ignore these file types across the entire repository, including:

  • Documentation files (README.txt, CHANGELOG.txt, LICENSE.txt)
  • Requirements or configuration files (requirements.txt)
  • Schema files (ironically, this PR includes .arrow.schema.txt files)
  • Important data or metadata files

This could prevent critical files from being tracked and cause confusion for contributors.
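To make the scope of the problem concrete, here is a minimal Python sketch of how a slash-free gitignore pattern behaves: it is compared against the file name at every directory level. This is only an approximation of git's matching (it ignores negation, directory-only patterns, and anchored patterns), and every path in the list except `5.data_packaging/example.py` is hypothetical. Running `git check-ignore -v <path>` in the repository remains the authoritative check.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# The three global patterns added in this PR.
GLOBAL_PATTERNS = ["*.txt", "*.csv", "*.parquet"]


def matches_global_pattern(path: str, patterns=GLOBAL_PATTERNS) -> bool:
    """Approximate a slash-free gitignore pattern: match on the file name only."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


# Hypothetical repository paths, used for illustration only.
candidate_paths = [
    "requirements.txt",
    "docs/LICENSE.txt",
    "5.data_packaging/packaged/output.parquet",
    "5.data_packaging/example.py",
]

# Collect every path the global patterns would hide from git.
ignored = [path for path in candidate_paths if matches_global_pattern(path)]
print(ignored)
```

Under these assumptions, the requirements and license files are caught alongside the generated data, which is the behavior the directory-scoped patterns are meant to avoid.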

✏️ Suggested fix: Scope patterns to specific directories

Replace the global patterns with directory-scoped patterns:

-
-*.txt
-*.csv
-*.parquet
+
+# data packaging artifacts
+5.data_packaging/**/*.txt
+5.data_packaging/**/*.csv
+5.data_packaging/**/*.parquet
+# exclude schema files from ignore
+!5.data_packaging/schema/**/*.txt

Or scope them to the specific subdirectories where these artifacts are generated:

-
-*.txt
-*.csv
-*.parquet
+
+# data packaging artifacts (generated data only)
+5.data_packaging/packaged/**/*.txt
+5.data_packaging/packaged/**/*.csv
+5.data_packaging/packaged/**/*.parquet
+5.data_packaging/images/**/*.txt
+5.data_packaging/images/**/*.csv
+5.data_packaging/images/**/*.parquet
🤖 Prompt for AI Agents
In @.gitignore around lines 26 - 28, Remove the broad global ignore entries
(*.txt, *.csv, *.parquet) from .gitignore and replace them with directory-scoped
ignore rules so only generated/artifact files are excluded (for example restrict
to build/, tmp/, artifacts/, or the specific data output directories used by
your tooling). Locate the three entries in .gitignore and change them to scoped
patterns that reference the exact folders where those generated files appear,
ensuring important repo files like README.txt or requirements.txt remain
tracked.

323 changes: 323 additions & 0 deletions 5.data_packaging/example.ipynb

Large diffs are not rendered by default.

42 changes: 42 additions & 0 deletions 5.data_packaging/example.py
@@ -0,0 +1,42 @@
# ---
# jupyter:
# jupytext:
# text_representation:
# extension: .py
# format_name: light
# format_version: '1.5'
# jupytext_version: 1.18.1
# kernelspec:
# display_name: Python 3 (ipykernel)
# language: python
# name: python3
# ---

# # Example for using packaged data
#
# This notebook shows how to use the packaged data from this project.

# +
import lancedb
from serpula_rasa.image import show_images_from_lance

# set up a connection to lancedb
ldb = lancedb.connect(uri=(lance_dir := "packaged/lancedb/mitocheck_data"))
# -

# show all table names
# note: all table names include numbers corresponding to the step they came from.
ldb.table_names(limit=20)

ldb.open_table(name="5.data_packaging.location_and_ch5_frame_image_data").to_pandas().head()

show_images_from_lance(
    db_path=lance_dir,
    table_name="5.data_packaging.location_and_ch5_frame_image_data",
    col_name="ome-arrow_original",
    max_images=20,
    pick="first",
    cmap="gray",
    base_size=10,
    cols=1,
)
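The note in `example.py` says table names carry the number of the step that produced them. A small sketch of how that convention could be used to filter the list returned by `ldb.table_names()` follows. It assumes the step number is everything before the first dot, which is inferred from the single table name shown in the example; the helper names and the second table name are hypothetical.

```python
def step_of(table_name: str) -> str:
    """Return the text before the first dot, assumed to be the step number."""
    step, _, _ = table_name.partition(".")
    return step


def tables_for_step(table_names, step) -> list:
    """Keep only the table names whose assumed step prefix equals `step`."""
    return [name for name in table_names if step_of(name) == str(step)]


names = [
    "5.data_packaging.location_and_ch5_frame_image_data",  # from example.py
    "0.hypothetical_step.hypothetical_table",  # hypothetical, for illustration
]

# Select the tables produced by step 5.
step_five_tables = tables_for_step(names, 5)
print(step_five_tables)
```

In the notebook, `names` would instead come from `ldb.table_names(limit=20)`.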