41 changes: 41 additions & 0 deletions README.md
@@ -43,6 +43,7 @@ know what they are looking for.
| Go modules | `go` | `go.sum`, `go.mod` |
| RubyGems | `rubygems` | `Gemfile.lock`, installed `*.gemspec` |
| Composer | `packagist` | `composer.lock`, `vendor/composer/installed.json` |
| Conda / pixi | `conda` | `<env>/conda-meta/<name>-<version>-<build>.json` install records |
| MCP | `mcp` | JSON host configs: `mcp.json`, `.mcp.json`, `claude_desktop_config.json`, `mcp_config.json`, `mcp_settings.json`, `cline_mcp_settings.json`, plus `~/.gemini/settings.json` (Gemini CLI / Code Assist). Non-JSON configs (Codex `config.toml`, Continue YAML) are not parsed in v0.1. |
| Editor extensions | `editor-extension` | VS Code, Cursor, Windsurf, VSCodium manifests |
| Browser extensions | `browser-extension` | Chromium-family (`manifest.json`) and Firefox (`extensions.json`) per profile |
@@ -195,6 +196,46 @@ Package record:

</details>

<details>
<summary>Example conda package record</summary>

```json
{
  "record_type": "package",
  "record_id": "package:...",
  "schema_version": "0.1.0",
  "scanner_name": "bumblebee",
  "scanner_version": "v0.1.1",
  "run_id": "3a8c7d1e9f0b2a4c6d8e0f1a2b3c4d5e",
  "scan_time": "2026-05-15T18:22:01.482Z",
  "endpoint": {
    "hostname": "alex-mbp",
    "os": "darwin",
    "arch": "arm64",
    "username": "alex",
    "uid": "501"
  },
  "profile": "baseline",
  "ecosystem": "conda",
  "package_name": "conda-build",
  "normalized_name": "conda-build",
  "version": "3.21.9",
  "project_path": "/opt/homebrew/anaconda3",
  "root_kind": "homebrew_root",
  "package_manager": "conda",
  "source_type": "conda-meta",
  "source_file": "/opt/homebrew/anaconda3/conda-meta/conda-build-3.21.9-py39h6e9494a_1.json",
  "has_lifecycle_scripts": false,
  "confidence": "high"
}
```

`package_manager` is `pip` instead of `conda` when the record represents
a pip-installed package that conda has recorded under the same
`conda-meta/` directory. Such records carry either `schannel: "pypi"` or
`channel: "pypi"`.

</details>

`confidence`:

- `high` — exact identity and version came from canonical metadata.
4 changes: 2 additions & 2 deletions cmd/bumblebee/main.go
@@ -131,8 +131,8 @@ func registerScanFlags(fs *flag.FlagSet, o *scanOpts) {
		"scan profile: baseline (bounded known package/tool roots), project (configured developer/project roots), or deep (incident-response exposure scan; may include user home roots)")
	fs.Var(&o.roots, "root", "directory to scan (repeatable or comma-separated; unrelated to running as root). Required for deep; optional for baseline/project.")
	fs.Var(&o.excludes, "exclude", "additional directory name or suffix path to exclude (repeatable)")
	fs.Var(&o.ecosystems, "ecosystem", "limit scanning to emitted ecosystem values (repeatable or comma-separated): npm,pypi,go,rubygems,packagist,mcp,editor-extension,browser-extension")
	fs.Int64Var(&o.maxFileSize, "max-file-size", 5*1024*1024, "max bytes to read from any single metadata file")
	fs.Var(&o.ecosystems, "ecosystem", "limit scanning to emitted ecosystem values (repeatable or comma-separated): npm,pypi,go,rubygems,packagist,conda,mcp,editor-extension,browser-extension")
	fs.Int64Var(&o.maxFileSize, "max-file-size", 16*1024*1024, "max bytes to read from any single metadata file. The default accommodates conda-meta install records, which can run to several MB on packages that ship many files (vs <100KB for most npm/pypi metadata)")
	fs.DurationVar(&o.maxDuration, "max-duration", 0, "max wall-clock duration for the whole scan (0 = unbounded)")
	fs.IntVar(&o.concurrency, "concurrency", 4, "number of concurrent file parsers")

24 changes: 23 additions & 1 deletion cmd/bumblebee/roots.go
@@ -222,6 +222,17 @@ func baselineHomeCandidates(home string) []scanner.Root {
	}
	add(filepath.Join(home, ".local", "share", "pipx", "venvs"), model.RootKindUserPackage)

	// Conda/mamba/pixi install prefixes. Each prefix contains a base
	// environment under conda-meta/ and additional named environments
	// under envs/<name>/conda-meta/. filterExistingRoots drops any
	// prefix that is not present on this host.
	add(filepath.Join(home, ".pixi"), model.RootKindUserPackage)
	add(filepath.Join(home, "miniconda3"), model.RootKindUserPackage)
	add(filepath.Join(home, "anaconda3"), model.RootKindUserPackage)
	add(filepath.Join(home, "miniforge3"), model.RootKindUserPackage)
	add(filepath.Join(home, "mambaforge"), model.RootKindUserPackage)
	add(filepath.Join(home, "micromamba"), model.RootKindUserPackage)

	// Editor extension trees.
	for _, seg := range []string{
		".vscode/extensions",
@@ -289,11 +300,22 @@ func projectHomeCandidates(home string) []scanner.Root {
func systemRoots() []scanner.Root {
	switch runtime.GOOS {
	case "darwin":
		return []scanner.Root{
		roots := []scanner.Root{
			{Path: "/opt/homebrew/lib", Kind: model.RootKindHomebrew},
			{Path: "/usr/local/lib", Kind: model.RootKindHomebrew},
			{Path: "/Library/Python", Kind: model.RootKindHomebrew},
		}
		// Homebrew anaconda casks install to /opt/homebrew/anaconda3
		// (Apple Silicon) or /usr/local/anaconda3 (Intel), outside
		// /opt/homebrew/lib. Globbed so versioned variants
		// (`anaconda3-2024.02`, etc.) are also picked up; absent
		// prefixes are dropped by filterExistingRoots.
		for _, pattern := range []string{"/opt/homebrew/anaconda*", "/usr/local/anaconda*"} {
			for _, p := range globExisting(pattern) {
				roots = append(roots, scanner.Root{Path: p, Kind: model.RootKindHomebrew})
			}
		}
		return roots
	case "linux":
		roots := []scanner.Root{{Path: "/usr/local/lib", Kind: model.RootKindGlobalPackage}}
		for _, pattern := range []string{"/usr/lib/python*"} {
16 changes: 9 additions & 7 deletions cmd/bumblebee/selftest.go
@@ -23,12 +23,14 @@ var selftestFS embed.FS

// expectedSelftestFindings is the count of catalog-matched findings the
// embedded fixtures must produce. One npm package-lock.json entry, one
// PyPI dist-info METADATA file, and one MCP config naming a pinned
// docker image — each matched against the embedded catalog: three
// findings. The MCP fixture guards against regressions in the MCP
// parser/scanner integration (basename dispatch, docker tag split,
// catalog matching for the mcp ecosystem).
const expectedSelftestFindings = 3
// PyPI dist-info METADATA file, one MCP config naming a pinned docker
// image, and one conda-meta install record — each matched against the
// embedded catalog: four findings. The MCP fixture guards against
// regressions in the MCP parser/scanner integration (basename dispatch,
// docker tag split, catalog matching for the mcp ecosystem); the conda
// fixture guards against regressions in conda-meta path detection and
// the conda-ecosystem catalog-matching path.
const expectedSelftestFindings = 4

// runSelftest extracts the embedded fixture tree to a temp directory,
// runs the scanner with the embedded exposure catalog, and asserts the
@@ -87,7 +89,7 @@ func runSelftest(args []string) int {
	cfg := scanner.Config{
		Profile:     model.ProfileProject,
		Roots:       []scanner.Root{{Path: tmp, Kind: model.RootKindProject}},
		MaxFileSize: 5 * 1024 * 1024,
		MaxFileSize: 16 * 1024 * 1024,
		Concurrency: 2,
		Catalog:     catalog,
		BaseRecord:  base,
9 changes: 9 additions & 0 deletions cmd/bumblebee/selftest/catalog.json
@@ -28,6 +28,15 @@
      "versions": ["0.0.0"],
      "severity": "critical",
      "source": "bumblebee selftest"
    },
    {
      "id": "selftest-conda-evil",
      "name": "bumblebee selftest fixture (conda)",
      "ecosystem": "conda",
      "package": "bumblebee-selftest-evil",
      "versions": ["0.0.0"],
      "severity": "critical",
      "source": "bumblebee selftest"
    }
  ]
}
@@ -0,0 +1,9 @@
{
  "name": "bumblebee-selftest-evil",
  "version": "0.0.0",
  "build": "py_0",
  "build_number": 0,
  "channel": "https://conda.anaconda.org/bumblebee-selftest/",
  "subdir": "noarch",
  "depends": []
}
64 changes: 59 additions & 5 deletions docs/inventory-sources.md
@@ -8,10 +8,12 @@ by recent supply-chain incidents — see the [Why these ecosystems](#why-these-e
section at the bottom for the reporting that informed it.

The `ecosystem` field on every record matches OSV ecosystem identifiers
where one exists (`npm`, `pypi`, `go`, `rubygems`, `packagist`, ...). `mcp`
and `editor-extension` are project-local values for execution surfaces that
do not map cleanly to a package registry; both are emitted without resolved
package versions.
where one exists (`npm`, `pypi`, `go`, `rubygems`, `packagist`, ...). For
ecosystems OSV does not yet enumerate, the value matches the
[Package URL](https://github.com/package-url/purl-spec) `type` convention
instead (`conda`). `mcp` and `editor-extension` are project-local values
for execution surfaces that do not map cleanly to a package registry;
both are emitted without resolved package versions.

## `ecosystem` vs source toolchain

@@ -29,7 +31,7 @@ Each scan profile reads from a different slice of the sources below:

| Profile | Sources walked |
|-------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `baseline` | Homebrew lib prefixes; `/Library/Python`; Linux system Python (`/usr/lib/python*`, plus `/usr/local/lib`); user Python (`~/.local/lib/python*`, `~/.local/share/pipx/venvs`, `pyenv`); language version managers (`asdf`, `nvm`, `rbenv`, `rvm`); `~/.cargo`; `~/go`; editor-extension trees; MCP config locations; per-profile browser-extension trees (Chromium-family + Firefox-family, including common snap/flatpak paths). No project trees. |
| `baseline` | Homebrew lib prefixes; `/Library/Python`; Linux system Python (`/usr/lib/python*`, plus `/usr/local/lib`); user Python (`~/.local/lib/python*`, `~/.local/share/pipx/venvs`, `pyenv`); language version managers (`asdf`, `nvm`, `rbenv`, `rvm`); `~/.cargo`; `~/go`; conda/pixi prefixes (`~/.pixi`, `~/miniconda3`, `~/anaconda3`, `~/miniforge3`, `~/mambaforge`, `~/micromamba`); editor-extension trees; MCP config locations; per-profile browser-extension trees (Chromium-family + Firefox-family, including common snap/flatpak paths). No project trees. |
| `project` | Configured developer/project roots (`~/code`, `~/src`, `~/Developer`, `~/Projects`, `~/workspace`, and any explicit `--root`). All ecosystem parsers below apply within those trees. |
| `deep` | Operator-supplied roots, typically a bare home directory during a campaign. Same ecosystem parsers; recommended only in combination with `--exposure-catalog` to emit `record_type=finding` records. |

@@ -235,6 +237,58 @@ References:
- `composer.lock` format: <https://getcomposer.org/doc/01-basic-usage.md#commit-your-composer-lock-file-to-version-control>
- `vendor/composer/installed.json` (Composer v2): <https://getcomposer.org/doc/articles/plugins.md>

## Conda / pixi

Files read:

- `<env>/conda-meta/<name>-<version>-<build>.json` — the per-package
  install record that conda, mamba, micromamba, and pixi all write when a
  package is linked into an environment prefix. Each record carries the
  exact `name`, `version`, `build`, and channel fields emitted by the
  package builder, so a parsed record is high-confidence evidence that
  the named package version is currently linked into the surrounding
  environment.

The walker matches any `*.json` file whose immediate parent directory is
`conda-meta`; the sibling `history` text file is not JSON and is not
matched. The `conda-meta` schema has many additional fields
(`files`, `paths_data`, `link`, `depends`, `constrains`, ...) which are
deliberately not decoded; only `name`, `version`, `channel`, and
`schannel` are read.

The environment prefix that owns the record (the parent of `conda-meta/`)
is recorded as `project_path` so receivers can group records by
environment. `package_manager` is `conda` for records installed from a
conda channel and `pip` for pip-installed packages that conda has
recorded under the same `conda-meta/` directory; this preserves the
ability to tell pip and conda installs apart inside a shared env. The
channel-extraction rules (canonical `schannel`, fallback to the first
path segment of the `channel` URL, bare-string `"pypi"` / `"<unknown>"`)
are documented exhaustively in the `channelFromURL` doc-comment in
[`internal/ecosystem/conda/conda.go`](../internal/ecosystem/conda/conda.go).
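
The rules in this paragraph can be approximated with the sketch below. It
is a simplified illustration, not the project's `channelFromURL`:
`channelName`, `packageManager`, and `envPrefix` are hypothetical helpers,
and the assumption that every non-`pypi` channel (including `"<unknown>"`)
maps to `conda` is made here only for the example.

```go
package main

import (
	"fmt"
	"net/url"
	"path/filepath"
	"strings"
)

// channelName prefers schannel when present. Otherwise it takes the first
// path segment of a channel URL. Bare strings such as "pypi" or
// "<unknown>" are not URLs and are returned unchanged.
func channelName(schannel, channel string) string {
	if schannel != "" {
		return schannel
	}
	u, err := url.Parse(channel)
	if err != nil || u.Scheme == "" || u.Host == "" {
		return channel // bare string, not a URL
	}
	segments := strings.Split(strings.Trim(u.Path, "/"), "/")
	return segments[0]
}

// packageManager returns "pip" for records on the pypi channel and
// "conda" otherwise (an assumption of this sketch).
func packageManager(schannel, channel string) string {
	if channelName(schannel, channel) == "pypi" {
		return "pip"
	}
	return "conda"
}

// envPrefix returns the parent of the conda-meta directory that holds
// sourceFile, which is the value recorded as project_path.
func envPrefix(sourceFile string) string {
	return filepath.Dir(filepath.Dir(sourceFile))
}

func main() {
	fmt.Println(channelName("", "https://conda.anaconda.org/conda-forge/osx-arm64"))
	fmt.Println(packageManager("", "pypi"))
	fmt.Println(envPrefix("/opt/homebrew/anaconda3/conda-meta/conda-build-3.21.9-py39h6e9494a_1.json"))
}
```

On a Unix-like host the three lines printed are `conda-forge`, `pip`, and
`/opt/homebrew/anaconda3`, the last matching the `project_path` in the
README example record.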

The `pixi.lock` project lockfile and `pixi.toml` / `pyproject.toml`
`[tool.pixi]` manifests are NOT parsed in this release. `conda-meta` is
the authoritative installed-state source and is shared by every
conda-compatible package manager, so it carries the highest-signal
inventory of what is actually linked into an environment right now.

The baseline profile adds `~/.pixi`, `~/miniconda3`, `~/anaconda3`,
`~/miniforge3`, `~/mambaforge`, and `~/micromamba` as user-package roots
when present; project-tree scans pick up `.pixi/envs/*/conda-meta/`
inside walked workspaces automatically.
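
To make the project-tree case concrete, the sketch below lists the pixi
environment prefixes under a workspace. It uses `filepath.Glob` purely for
illustration; the scanner itself walks the tree and matches `conda-meta`
directories wherever they appear, and `pixiEnvPrefixes` is a hypothetical
name.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// pixiEnvPrefixes returns the environment prefixes found under
// <workspace>/.pixi/envs/, identified by the presence of a conda-meta
// directory. It returns an empty result when nothing matches.
func pixiEnvPrefixes(workspace string) []string {
	pattern := filepath.Join(workspace, ".pixi", "envs", "*", "conda-meta")
	matches, err := filepath.Glob(pattern)
	if err != nil {
		// Only possible when the workspace path itself contains
		// malformed glob syntax.
		return nil
	}
	prefixes := make([]string, 0, len(matches))
	for _, m := range matches {
		prefixes = append(prefixes, filepath.Dir(m))
	}
	return prefixes
}

func main() {
	// Prints one line per pixi environment in the current directory,
	// or nothing when the directory is not a pixi workspace.
	for _, p := range pixiEnvPrefixes(".") {
		fmt.Println(p)
	}
}
```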

The emitted `ecosystem` value is `conda`. OSV does not yet define a
conda ecosystem identifier; the value matches the
[Package URL](https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst#conda)
`pkg:conda/<name>@<version>` convention.
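
A receiver that wants a Package URL for a conda record could build one
roughly as follows. This is a minimal sketch: `condaPURL` is a
hypothetical helper, and the optional PURL qualifiers (such as build or
channel) are deliberately left out.

```go
package main

import (
	"fmt"
	"net/url"
)

// condaPURL formats name and version using the pkg:conda/<name>@<version>
// convention, percent-encoding each component.
func condaPURL(name, version string) string {
	return "pkg:conda/" + url.PathEscape(name) + "@" + url.PathEscape(version)
}

func main() {
	fmt.Println(condaPURL("conda-build", "3.21.9")) // pkg:conda/conda-build@3.21.9
}
```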

References:

- conda-meta record schema (per-package JSON): <https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/pkg-specs.html>
- Pixi environments and lockfile layout: <https://pixi.sh/latest/reference/project_configuration/>
- PURL `conda` type: <https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst#conda>

## MCP server configs

Files read (JSON only):