Skip to content

Conversation

@karthick-rn
Copy link
Contributor

@karthick-rn karthick-rn commented Dec 11, 2025

Description: There is no xml file along with .hdf file for modis dataset anymore. Updated the code accordingly, also for STAC item creation it expects the metadata which is now extracted from the .hdf file.

These changes have been tested by rebuilding the image with the new commit hash from the fork and it ingested the STAC item successfully. This is the link to the ingested item into Open PC - https://planetarycomputer.microsoft.com/explore?c=-134.4470%2C55.3130&z=4.32&v=2&d=modis-43A4-061&m=cql%3A5f2d6e100cd82009329bc4df3e51de56&r=Natural+color&s=false%3A%3A100%3A%3Atrue&sr=desc&ae=0

PR checklist:

  • Code is formatted (run scripts/format).
  • Code lints properly (run scripts/lint).
  • Tests pass (run scripts/test).
  • Documentation has been updated to reflect changes, if applicable.
  • Example STAC Catalog has been updated to reflect changes, if applicable.
  • Changes are added to the CHANGELOG.

@gadomski gadomski self-requested a review December 11, 2025 15:23
@gadomski
Copy link
Contributor

@karthick-rn don't worry about the CI errors yet, this package has bit-rotted pretty hard, I'll clean it up.

Copy link
Contributor

@gadomski gadomski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a simple test function that exercises this functionality against https://github.com/stactools-packages/modis/blob/main/tests/data-files/MOD10A2.A2022033.h09v05.061.2022042050729.hdf? You can copy it to a temp directory to ensure there's no alongside XML asset.

Also needs a changelog entry.

@karthick-rn
Copy link
Contributor Author

karthick-rn commented Dec 11, 2025

Can you add a simple test function that exercises this functionality against https://github.com/stactools-packages/modis/blob/main/tests/data-files/MOD10A2.A2022033.h09v05.061.2022042050729.hdf? You can copy it to a temp directory to ensure there's no alongside XML asset.

@gadomski, Added the test function.

$ python -m pytest tests/test_stac.py::test_create_item_from_hdf_without_xml  
============================================ test session starts ============================================
platform darwin -- Python 3.12.12, pytest-9.0.2, pluggy-1.6.0
rootdir: /Users/karthicknarendran/Downloads/modis/modis
configfile: pyproject.toml
collected 1 item                                                                                            

tests/test_stac.py .                                                                                  [100%]

============================================= warnings summary ==============================================
src/stactools/modis/fragments/__init__.py:4
  /Users/karthicknarendran/Downloads/modis/modis/src/stactools/modis/fragments/__init__.py:4: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
    import pkg_resources

src/stactools/modis/file.py:11
  /Users/karthicknarendran/Downloads/modis/modis/src/stactools/modis/file.py:11: DeprecationWarning: stactools.modis.file is deprecated and will be removed in v0.4.0
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
======================================= 1 passed, 2 warnings in 0.47s =======================================

Note: The MOD10A2 (snow cover) product doesn't have the QAPERCENTNOTPRODUCEDCLOUD tag in its HDF metadata, so I also made that field optional in from_cog_tags() to handle this case gracefully.

Also needs a changelog entry.

done

@gadomski
Copy link
Contributor

Note: The MOD10A2 (snow cover) product doesn't have the QAPERCENTNOTPRODUCEDCLOUD tag in its HDF metadata, so I also made that field optional in from_cog_tags() to handle this case gracefully.

This is a bit of a flag — do we know if there are other tags that will be missing for other products? We should probably test against every product type, even if they aren't automated (e.g. the "external data" tests).

@karthick-rn
Copy link
Contributor Author

This is a bit of a flag — do we know if there are other tags that will be missing for other products? We should probably test against every product type, even if they aren't automated (e.g. the "external data" tests).

@gadomski, I manually verified all 53 XML test files (which contain the same metadata as HDF files):

  • All required tags present in all 53 products
  • QAPERCENTNOTPRODUCEDCLOUD missing in 21 products (snow, fire, LAI, ET, GPP/NPP, burned area) - already handled as optional
  • TileID present in all products

The only missing tag is QAPERCENTNOTPRODUCEDCLOUD, which is already made optional. No other tags are missing.

Copy link
Contributor

@gadomski gadomski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting closer, a couple of small tweaks to the test setup and one question about a possibly-optional attribute.

item.validate()


def test_create_item_from_hdf_without_xml() -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pytest provides a fixture for a temporary directory, let's use that instead.

Suggested change
def test_create_item_from_hdf_without_xml() -> None:
def test_create_item_from_hdf_without_xml(tmp_path: Path) -> None:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 211 to 215
"""Test that an item can be created from an HDF file when XML is not available.
This tests the fallback to extracting metadata directly from the HDF file
when the accompanying XML metadata file is not present.
"""
Copy link
Contributor

@gadomski gadomski Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm generally 👎🏼 on docstrings for tests, as I prefer to let the test function name and content speak for themselves.

Suggested change
"""Test that an item can be created from an HDF file when XML is not available.
This tests the fallback to extracting metadata directly from the HDF file
when the accompanying XML metadata file is not present.
"""

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they're auto-generated comments and I have removed them.

Comment on lines 220 to 231
# Copy only the HDF file (not the XML) to ensure XML is not available
temp_hdf_path = os.path.join(temporary_directory, hdf_file)
shutil.copyfile(source_hdf_path, temp_hdf_path)

# Verify XML does not exist in temp directory
temp_xml_path = f"{temp_hdf_path}.xml"
assert not os.path.exists(temp_xml_path), "XML file should not exist"

# Create item from HDF only - should extract metadata from HDF
item = stactools.modis.stac.create_item(temp_hdf_path)

# Verify item was created with correct metadata
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These look like auto-gen comments, again I generally prefer to let the code speak for itself.

Suggested change
# Copy only the HDF file (not the XML) to ensure XML is not available
temp_hdf_path = os.path.join(temporary_directory, hdf_file)
shutil.copyfile(source_hdf_path, temp_hdf_path)
# Verify XML does not exist in temp directory
temp_xml_path = f"{temp_hdf_path}.xml"
assert not os.path.exists(temp_xml_path), "XML file should not exist"
# Create item from HDF only - should extract metadata from HDF
item = stactools.modis.stac.create_item(temp_hdf_path)
# Verify item was created with correct metadata
temp_hdf_path = os.path.join(temporary_directory, hdf_file)
shutil.copyfile(source_hdf_path, temp_hdf_path)
temp_xml_path = f"{temp_hdf_path}.xml"
assert not os.path.exists(temp_xml_path), "XML file should not exist"
item = stactools.modis.stac.create_item(temp_hdf_path)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

assert item is not None
assert item.id.startswith("MOD10A2.A2022033.h09v05")
assert "hdf" in item.assets
assert "metadata" not in item.assets # XML asset should not be present
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert "metadata" not in item.assets # XML asset should not be present
assert "metadata" not in item.assets

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 279 to 282
with warnings.catch_warnings():
warnings.simplefilter("ignore", category=NotGeoreferencedWarning)
with rasterio.open(read_href) as dataset:
hdf_tags = dataset.tags()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to catch this warning? Feels like we do want to propagate this up to the user.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the suppression. It is now visible in the output.

- Revert TileID to required (present in all products)
- Remove NotGeoreferencedWarning suppression
- Use pytest tmp_path fixture in test
- Clean up test comments
@karthick-rn
Copy link
Contributor Author

Getting closer, a couple of small tweaks to the test setup and one question about a possibly-optional attribute.

@gadomski, Thanks for reviewing. I have updated the code based on your comments and also tested it, all looks good. When you get a moment, could you review and approve it?

@gadomski gadomski self-requested a review December 15, 2025 17:27
@gadomski gadomski merged commit 8854ceb into stactools-packages:main Dec 15, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants