8 changes: 6 additions & 2 deletions _quarto.yml
@@ -12,6 +12,8 @@ website:
text: Home
- href: tutorials/progressive_globe.qmd
text: Interactive Explorer
- href: how-to-use.qmd
text: How to Use
- href: tutorials/index.qmd
text: Tutorials
- href: about.qmd
@@ -55,9 +57,11 @@ website:
- section: "About"
contents:
- href: about.qmd
text: Goals
text: About iSamples
- href: how-to-use.qmd
text: How to Use
- href: people.qmd
text: People
text: Contributors
- section: "Information Architecture"
contents:
- design/index.qmd
108 changes: 79 additions & 29 deletions about.qmd
@@ -1,53 +1,103 @@
---
title: "About iSamples"
subtitle: "A multi-disciplinary cyberinfrastructure for material samples"
number-sections: false
---

# Project Objectives
## Objectives {.unnumbered}

1. Design and develop iSamples infrastructure (iSamples in a Box and distributed data systems);
2. Build four initial implementations of iSamples for adoption and use case testing (Open Context, GEOME, SESAR, and Smithsonian Institution);
3. Conduct outreach and community engagement to developers, individual researchers, and international organizations concerned with material samples.
1. Design and develop iSamples infrastructure (iSamples in a Box and distributed data systems)
2. Build four initial implementations of iSamples for adoption and use case testing (Open Context, GEOME, SESAR, and Smithsonian Institution)
3. Conduct outreach and community engagement to developers, individual researchers, and international organizations concerned with material samples

## Current Data Access
::: {.callout-note collapse="true"}
### Technical Perspective

**Note**: iSamples Central is currently unavailable. The project has transitioned to a **geoparquet-based approach** for data access and analysis:
The iSamples project will:

- **Primary Data Source**: Comprehensive geoparquet files containing millions of sample records
- **Analysis Platform**: Browser-based tools using DuckDB-WASM and Observable
- **Coverage**: Complete datasets from SESAR, OpenContext, GEOME, and Smithsonian collections
* Create a flexible and scalable architecture to ensure broad adoption and implementation by diverse stakeholders.
* Build upon existing identifier infrastructure such as IGSNs (International Generic Sample Numbers) and ARKs (Archival Resource Keys), while remaining agnostic to identifier type.
* Encourage a high-level metadata standard for natural history samples (across biosciences, geosciences, and archaeology), while supporting community-developed metadata standards in specialist domains.
* Extend existing capabilities, enhance consistency, and expand their reach to serve science and society much more broadly through integration with established discipline-specific infrastructure at SESAR (geoscience), CyVerse (bioscience), Open Context (archaeology), and the Smithsonian Institution.

![iSamples diagram](assets/iSamplesArchitecture.png)
**Current data access**: The project now uses **geoparquet files + DuckDB-WASM** for efficient, browser-based data access and analysis. See the [Interactive Explorer](/tutorials/progressive_globe.html) for a live demo.

![iSamples Architecture](assets/iSamplesArchitecture.png)
:::

# Background
## Team {.unnumbered}

Research frequently uses material samples as a basic element for reference, study, and experimentation in many scientific disciplines, especially in the natural and environmental sciences, material sciences, agriculture, physical anthropology, archaeology, and biomedicine. Observations made on samples collected in the field and in the laboratory constitute a critical data resource for research that addresses grand challenges of our planet’s future sustainability, from environmental change; to food, energy, and water resources; to natural hazards and their mitigation; to public health. The large investments of public funds being made to curate huge volumes of samples acquired over decades or even centuries, and to collect and analyze new samples demand these samples to be openly accessible, easily discoverable, and documented with sufficient information to make them reusable. The current ecosystem of sample and sample data management in the U.S. and globally is highly fragmented across stakeholders, including museums, federal agencies, academic institutions, and individual researchers, with a multitude of institutional and discipline-specific catalogs, practices for sample identification, and protocols for describing samples.
### Principal Investigators {.unnumbered}

The iSamples project is a multi-disciplinary collaboration that will develop a national digital infrastructure that will provide services for globally unique, consistent, and convenient identification of material samples; metadata about them; and linking them to other samples, derived data, and research results published in the literature. iSamples builds on previous initiatives to achieve this by providing material samples with globally unique, persistent identifiers that reliably link to landing pages with metadata describing the sample and its provenance, and which allow unambiguously linking samples with data and publications.
* [Kerstin Lehnert](https://orcid.org/0000-0001-7036-1977), Columbia University
* [Andrea Thomer](https://orcid.org/0000-0001-6238-3498), University of Arizona
* [Neil Davies](https://orcid.org/0000-0001-8085-5014), The Regents of the University of California, Berkeley
* [David Vieglais](https://orcid.org/0000-0002-6513-4996), University of Kansas Biodiversity Institute

Leveraging significant national investments, iSamples provides the missing link among:
::: {.callout-note collapse="true"}
### Contributors

1. physical collections (e.g., natural history museums, herbaria, biobanks),
2. field stations, marine laboratories, long-term ecological research sites, and observatories, and
3. data repositories and cyberinfrastructure. iSamples delivers enhanced infrastructure for STEM research and education, decision-makers, and the general public.
:::: {.columns}
::: {.column width="34%"}
* Cao, Sean
* Choe, Saebyl
* Cui, Hong
* Davies, Neil (PI)
* Deck, John
* Kansa, Eric C
* Kansa, Sarah Whitcher
:::
::: {.column width="34%"}
* Kunze, John
* Lehnert, Kerstin (PI)
* Mandel, Danny
* Meyer, Christopher
* Ramdeen, Sarah
* Raia, Natalie
* Richard, Steve
:::
::: {.column width="32%"}
* Robinson, Erin
* Snyder, Rebecca
* Song, Lu-lin
* Thomer, Andrea (PI)
* Vieglais, Dave (PI)
* Walls, Ramona L
* Yee, Raymond
:::
::::
:::

iSamples benefits national security and resource management by offering a means to assure sample provenance, improving scientific reproducibility and demonstrating compliance with ethical standards, national regulations, and international treaties, (e.g., automated audits of sensitive archaeological specimens, endangered species, or specimens containing controlled substances).
## Photo Gallery {.unnumbered}

# Technical perspective
::: {layout-ncol=3}

The iSamples project will:
![RCN Workshop, NYU](assets/RCN_workshop_NYU.jpg){group="gallery"}

* Create a flexible and scalable architecture to ensure broad adoption and implementation by diverse stakeholders.
* Build upon on existing identifier infrastructure such as IGSNs (Global Sample Number;) and ARKs (Archival Resource Keys), but is agnostic to identifier type.
* Encourage a high-level metadata standard for natural history samples (across biosciences, geosciences, and archaeology), while supporting community-developed metadata standards in specialist domains.
* Extend existing capabilities, enhance consistency, and expand their reach to serve science and society much more broadly through integration with established discipline-specific infrastructure at the System for Earth Sample Registration SESAR (geoscience), CyVerse (bioscience), Open Context (archaeology), and the Smithsonian Institution.
![Workshop, Tucson](assets/workshop_Tucson.jpg){group="gallery"}

![Workshop, Smithsonian](assets/workshop_SI.jpg){group="gallery"}

# Principal Investigators
![Outside Smithsonian MSC](assets/outside_SI_MSC.jpg){group="gallery"}

* [Kerstin Lehnert](https://orcid.org/0000-0001-7036-1977), Columbia University
* [Andrea Thomer](https://orcid.org/0000-0001-6238-3498), University of Arizona
* [Neil Davies](https://orcid.org/0000-0001-8085-5014), The Regents of the University of California, Berkeley
* [David Vieglais](https://orcid.org/0000-0002-6513-4996), University of Kansas Biodiversity Institute
![Tour, Smithsonian MSC](assets/tour_SI_MSC.jpg){group="gallery"}

![Workshop, Moorea](assets/workshop_Moorea.jpg){group="gallery"}

:::

## Background & History {.unnumbered}

Research frequently uses material samples as a basic element for reference, study, and experimentation in many scientific disciplines, especially in the natural and environmental sciences, material sciences, agriculture, physical anthropology, archaeology, and biomedicine. Observations made on samples collected in the field and in the laboratory constitute a critical data resource for research that addresses grand challenges of our planet's future sustainability, from environmental change; to food, energy, and water resources; to natural hazards and their mitigation; to public health.

Large investments of public funds are being made to curate huge volumes of samples acquired over decades or even centuries, and to collect and analyze new ones. These investments demand that the samples be openly accessible, easily discoverable, and documented with sufficient information to make them reusable.

The iSamples project is a multi-disciplinary collaboration that developed a national digital infrastructure providing services for globally unique, consistent, and convenient identification of material samples; metadata about them; and linking them to other samples, derived data, and research results published in the literature.

Leveraging significant national investments, iSamples provides the missing link among:

1. Physical collections (e.g., natural history museums, herbaria, biobanks)
2. Field stations, marine laboratories, long-term ecological research sites, and observatories
3. Data repositories and cyberinfrastructure

iSamples benefits national security and resource management by offering a means to assure sample provenance, improving scientific reproducibility and demonstrating compliance with ethical standards, national regulations, and international treaties.
55 changes: 55 additions & 0 deletions how-to-use.qmd
@@ -0,0 +1,55 @@
---
title: "How to Use iSamples"
subtitle: "Get started exploring 6.7 million scientific samples"
number-sections: false
---

## Quick Start {.unnumbered}

1. **Open the [Interactive Explorer](/tutorials/progressive_globe.html)** — a 3D globe loads with clustered sample data
2. **Zoom in** — clusters break into finer detail as you zoom (resolution 4 → 6 → 8 → individual samples)
3. **Filter by source** — use the checkboxes to show/hide data from SESAR, OpenContext, GEOME, or Smithsonian
4. **Click a cluster** — see sample count and nearby samples with links to source records
5. **Click an individual sample** — view metadata and follow the "View at source" link to the original repository
6. **Share your view** — copy the URL to share your exact position, zoom level, and selected sample
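
Step 6 implies that the view state is serialized into the URL. The sketch below shows one way that could work. The hash layout (`lat`, `lng`, `alt`, `pid`) and the function names `encodeViewHash` and `decodeViewHash` are assumptions for illustration, not the Explorer's actual format.

```javascript
// Hypothetical hash layout: #lat=..&lng=..&alt=..&pid=..
// Serialize camera position (degrees, metres) and an optional selected sample id.
function encodeViewHash({ lat, lng, alt, pid }) {
  const params = new URLSearchParams();
  params.set("lat", lat.toFixed(4));
  params.set("lng", lng.toFixed(4));
  params.set("alt", String(Math.round(alt)));
  if (pid) params.set("pid", pid); // URLSearchParams percent-encodes ":" and "/"
  return "#" + params.toString();
}

// Parse a hash back into view state; missing or malformed numbers become null.
function decodeViewHash(hash) {
  const params = new URLSearchParams(hash.replace(/^#/, ""));
  const num = (key) => {
    const raw = params.get(key);
    if (raw === null || raw === "") return null;
    const value = Number(raw);
    return Number.isFinite(value) ? value : null;
  };
  return { lat: num("lat"), lng: num("lng"), alt: num("alt"), pid: params.get("pid") };
}
```

Returning `null` for anything that fails to parse lets the caller fall back to a default view instead of flying the camera to an invalid position.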

## What's in the Data? {.unnumbered}

| Source | Samples | Focus |
|--------|---------|-------|
| **SESAR** | 4.6M | Earth science — rocks, minerals, sediments, soils |
| **OpenContext** | 1M | Archaeology — artifacts, excavation materials |
| **GEOME** | 605K | Biology — genomic and tissue specimens |
| **Smithsonian** | 322K | Natural history — museum collections |

## No Installation Required {.unnumbered}

Everything runs in your browser using:

- **DuckDB-WASM** — a fast analytical database running client-side
- **HTTP range requests** — only the data you need is downloaded (typically < 1 MB to start)
- **Cesium** — 3D globe visualization

Works in Chrome, Firefox, Edge, Safari, and Brave. No plugins, no downloads, no accounts.
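
To illustrate why range requests keep the initial download small: a Parquet file ends with a 4-byte little-endian footer length followed by the magic bytes `PAR1`, so a reader can locate the metadata by fetching only the last 8 bytes. The sketch below shows that idea with plain `fetch`. It is not DuckDB-WASM's internal code, and the function names are hypothetical.

```javascript
// Parse the final 8 bytes of a Parquet file:
// 4-byte little-endian footer length, then the ASCII magic "PAR1".
function parseParquetTail(tail) {
  if (tail.byteLength !== 8) throw new Error("expected exactly 8 bytes");
  const magic = String.fromCharCode(...tail.subarray(4, 8));
  if (magic !== "PAR1") throw new Error("not a Parquet file");
  const view = new DataView(tail.buffer, tail.byteOffset, tail.byteLength);
  return { footerLength: view.getUint32(0, true) };
}

// Fetch only the tail of a remote file using a suffix byte range.
// The caller supplies the URL; the server must answer 206 Partial Content.
async function fetchParquetFooterLength(url) {
  const res = await fetch(url, { headers: { Range: "bytes=-8" } });
  if (res.status !== 206) {
    throw new Error(`range request not honored (HTTP ${res.status})`);
  }
  const bytes = new Uint8Array(await res.arrayBuffer());
  return parseParquetTail(bytes).footerLength;
}
```

A second range request for the footer itself would then reveal which byte ranges hold the columns a query needs, which is how a client can skip most of a large file.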

## For Developers {.unnumbered}

All code is visible and foldable on tutorial pages. Want to build your own analysis?

- **[Tutorials](/tutorials/)** — step-by-step guides from basic exploration to advanced analysis
- **[Deep-Dive Analysis](/tutorials/zenodo_isamples_analysis.html)** — statistical exploration with Observable Plot
- **[GitHub](https://github.com/isamplesorg/)** — all source code and data pipelines
- **[Zenodo](https://zenodo.org/communities/isamples)** — archived datasets for reproducible research

## Data Files {.unnumbered}

All data is hosted on Cloudflare R2 with HTTP range request support:

| File | Size | Description |
|------|------|-------------|
| Wide format (H3-indexed) | ~292 MB | 20M rows, all entity types with H3 spatial indices |
| H3 summary (res4) | ~70 KB | Pre-aggregated cluster counts for instant globe load |
| H3 summary (res6) | ~200 KB | Mid-zoom cluster detail |
| H3 summary (res8) | ~600 KB | Fine-zoom cluster detail |
| Samples lite | ~150 MB | Individual sample points with coordinates |
| Facet summaries | 2 KB | Pre-computed filter counts (source, material, context, specimen type) |
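
The H3 summary files are described as pre-aggregated cluster counts. As a rough sketch of what such an aggregation involves, the function below groups rows by cell and records the total and the most frequent source. It assumes each row already carries a precomputed cell id; the field names `cell`, `source`, and `dominant_source` are hypothetical and may not match the actual file schema.

```javascript
// rows: [{ cell: "<H3 cell id>", source: "SESAR" }, ...]
// Returns one summary record per cell, in first-seen order.
function summarizeByCell(rows) {
  const cells = new Map();
  for (const { cell, source } of rows) {
    let entry = cells.get(cell);
    if (!entry) {
      entry = { cell, count: 0, bySource: {} };
      cells.set(cell, entry);
    }
    entry.count += 1;
    entry.bySource[source] = (entry.bySource[source] || 0) + 1;
  }
  return [...cells.values()].map(({ cell, count, bySource }) => {
    // Highest count wins; ties are broken alphabetically so output is stable.
    const ranked = Object.entries(bySource).sort(
      (a, b) => b[1] - a[1] || a[0].localeCompare(b[0])
    );
    return { cell, count, dominant_source: ranked[0][0] };
  });
}
```

Computing these summaries ahead of time means the globe can draw one circle per cell from a small file rather than scanning every sample row.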
125 changes: 122 additions & 3 deletions tutorials/progressive_globe.qmd
@@ -1,10 +1,10 @@
---
title: "Progressive Globe: Instant H3 → Detail on Demand"
title: "Interactive Explorer"
subtitle: "Search and explore 6.7 million material samples"
categories: [parquet, spatial, h3, performance, isamples]
sidebar: false
---

Explore **6.7 million material samples** from iSamples — the globe loads instantly with H3 hexagonal aggregates, then refines as you zoom down to individual samples.

::: {.callout-note collapse="true"}
## How It Works

@@ -92,9 +92,49 @@ Circle size = log(sample count). Color = dominant data source.
}
.share-btn:hover { background: #0d47a1; }
.share-toast { font-size: 12px; color: #2e7d32; opacity: 0; transition: opacity 0.3s; }
.search-bar { display: flex; gap: 6px; margin-bottom: 12px; }
.search-bar input {
flex: 1; padding: 8px 12px; border: 1px solid #ccc; border-radius: 4px;
font-size: 14px; outline: none;
}
.search-bar input:focus { border-color: #1565c0; box-shadow: 0 0 0 2px rgba(21,101,192,0.15); }
.search-bar button {
background: #1565c0; color: white; border: none; padding: 8px 16px;
border-radius: 4px; cursor: pointer; font-size: 14px; white-space: nowrap;
}
.search-bar button:hover { background: #0d47a1; }
.search-results { font-size: 12px; color: #666; padding: 4px 0; }
.filter-section { border-top: 1px solid #eee; padding-top: 8px; margin-top: 8px; }
.filter-header {
font-size: 12px; font-weight: 600; color: #555; cursor: pointer;
display: flex; justify-content: space-between; align-items: center;
padding: 4px 0; user-select: none;
}
.filter-header:hover { color: #1565c0; }
.filter-body { padding: 4px 0; }
.filter-body label { display: block; font-size: 12px; padding: 2px 0; cursor: pointer; }
.filter-body label:hover { color: #1565c0; }
</style>

::: {.callout-note collapse="true"}
## How It Works

1. **Instant** (<1s): Pre-aggregated H3 res4 summary (580 KB) → 38K colored circles
2. **Zoom in**: Automatically switches to res6 (112K) then res8 (176K) clusters
3. **Zoom deeper** (<120 km): Individual sample points from 60 MB lite parquet
4. **Click**: Cluster info or individual sample card with full metadata
5. **Search**: Find samples by name; the globe flies to the first match, and clicking any result flies to that sample

Circle size = log(sample count). Color = dominant data source.
:::
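
The zoom tiers and the circle sizing rule above can be sketched as two small functions. Only the 120 km threshold for individual points comes from the description; the other altitude cut-offs, the pixel constants, and the function names are assumptions chosen for illustration.

```javascript
// Ordered from closest to farthest. Only the 120 km value is stated in the text;
// the 500 km and 2500 km cut-offs are assumed.
const TIERS = [
  { maxAltKm: 120, tier: "points" },
  { maxAltKm: 500, tier: "res8" },
  { maxAltKm: 2500, tier: "res6" },
];

// Pick the data tier for a camera altitude in kilometres.
function tierForAltitude(altKm) {
  for (const t of TIERS) {
    if (altKm < t.maxAltKm) return t.tier;
  }
  return "res4"; // whole-globe view
}

// Circle radius grows with log10(sample count) and is clamped to maxPx.
function circleRadiusPx(count, minPx = 3, maxPx = 20) {
  const radius = minPx + 4 * Math.log10(Math.max(1, count));
  return Math.min(maxPx, radius);
}
```

A logarithmic scale keeps a cell with a million samples from visually swamping its neighbours while still ranking cells by size.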

<!-- Static layout: globe + side panel. Updated via DOM, not OJS reactivity. -->
<div class="search-bar">
<input type="text" id="sampleSearch" placeholder="Search samples (e.g., basalt, pottery, coral...)" />
<button id="searchBtn">Search</button>
</div>
<div id="searchResults" class="search-results"></div>

<div class="globe-layout">
<div id="cesiumContainer"></div>
<div class="side-panel">
@@ -911,6 +951,85 @@ zoomWatcher = {
});
}

// --- Search handler ---
const searchBtn = document.getElementById('searchBtn');
const searchInput = document.getElementById('sampleSearch');
const searchResults = document.getElementById('searchResults');

async function doSearch() {
const term = searchInput.value.trim();
if (!term || term.length < 2) {
searchResults.textContent = 'Type at least 2 characters';
return;
}
searchResults.textContent = 'Searching...';
try {
const escaped = term.replace(/'/g, "''");
const results = await db.query(`
SELECT pid, label, source, latitude, longitude, place_name
FROM read_parquet('${lite_url}')
WHERE label ILIKE '%${escaped}%'
${sourceFilterSQL('source')}
LIMIT 50
`);
if (results.length === 0) {
searchResults.textContent = `No results for "${term}"`;
return;
}
searchResults.textContent = `${results.length}${results.length === 50 ? '+' : ''} results for "${term}"`;

// Show results in the samples panel
const sampEl = document.getElementById('samplesSection');
if (sampEl) {
// Escape values before interpolating them into innerHTML; labels and search terms are untrusted text.
const esc = (v) => String(v).replace(/[&<>"']/g, (c) => ({ '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' }[c]));
let h = `<h4>Search: "${esc(term)}" (${results.length})</h4>`;
for (const s of results) {
const color = SOURCE_COLORS[s.source] || '#666';
const name = SOURCE_NAMES[s.source] || s.source;
const sUrl = sourceUrl(s.pid);
h += `<div class="sample-row" style="cursor: pointer;" data-lat="${s.latitude}" data-lng="${s.longitude}" data-pid="${esc(s.pid)}">
<div style="display: flex; align-items: center; gap: 6px;">
${sUrl ? `<a class="sample-label" href="${esc(sUrl)}" target="_blank" rel="noopener noreferrer" style="color: #1565c0; text-decoration: none;">${esc(s.label || s.pid)}</a>` : `<span class="sample-label">${esc(s.label || s.pid)}</span>`}
<span class="source-badge" style="background: ${color}; font-size: 10px;">${esc(name)}</span>
</div>
</div>`;
}
sampEl.innerHTML = h;

// Click search result → fly to it
sampEl.querySelectorAll('.sample-row[data-lat]').forEach(row => {
row.addEventListener('click', (e) => {
if (e.target.tagName === 'A') return; // let links work
const lat = parseFloat(row.dataset.lat);
const lng = parseFloat(row.dataset.lng);
const pid = row.dataset.pid;
if (!isNaN(lat) && !isNaN(lng)) {
viewer.camera.flyTo({
destination: Cesium.Cartesian3.fromDegrees(lng, lat, 50000),
duration: 1.5
});
}
});
});
}

// Fly to the first result (compare against null so a coordinate of exactly 0 still counts)
if (results[0].latitude != null && results[0].longitude != null) {
viewer.camera.flyTo({
destination: Cesium.Cartesian3.fromDegrees(results[0].longitude, results[0].latitude, 200000),
duration: 1.5
});
}
} catch(err) {
console.error("Search failed:", err);
searchResults.textContent = `Search error: ${err.message}`;
}
}

if (searchBtn) searchBtn.addEventListener('click', doSearch);
if (searchInput) searchInput.addEventListener('keydown', (e) => {
if (e.key === 'Enter') doSearch();
});

// --- Deep-link: restore selection from initial hash ---
const ih = viewer._initialHash;
if (ih.pid) {