Add web search UI #4

Open

welsberr wants to merge 15 commits into timschmidt:main from welsberr:add-web-search-ui
Conversation

welsberr (Contributor) commented May 5, 2026

A web search UI for Wolfe.

Establishes a locally served page with a search-phrase input, a field for the number of top matches, and a submit-search button. Matches are displayed by filename and can also incorporate metadata (additional scripts handle the metadata analysis). The search phrase is embedded with a local Jina model.

Also adds a 'context' match function that returns a flattened text result containing the N chunks preceding and following each match.
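The context-flattening idea can be sketched roughly as follows (a hypothetical Python illustration; `chunks`, `match_indices`, and `n` are illustrative names, not identifiers from the PR):

```python
def flatten_with_context(chunks, match_indices, n):
    """Return a flattened text result that includes the n chunks
    before and after each matched chunk, de-duplicating overlaps."""
    keep = set()
    for i in match_indices:
        # widen each match to a window of n prior and n succeeding chunks
        lo = max(0, i - n)
        hi = min(len(chunks) - 1, i + n)
        keep.update(range(lo, hi + 1))
    # emit the kept chunks in document order as one flat string
    return "\n".join(chunks[i] for i in sorted(keep))
```

For example, `flatten_with_context(["a", "b", "c", "d", "e"], [2], 1)` yields `"b\nc\nd"`; overlapping windows from adjacent matches are merged rather than duplicated.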

gemini-code-assist (bot) left a comment

Code Review

This pull request significantly expands the Wolfe search tool by introducing a local web UI, a persistent embedding service, and a comprehensive corpus enrichment pipeline for extracting bibliographic metadata, references, and concept phrases. It also adds support for remote OpenAI-compatible embedding endpoints, a lexical match context search mode, and improved CUDA device management. Feedback focused on removing a hardcoded row limit in index retrieval to prevent data truncation, implementing atomic file writes for metadata catalogs to ensure reliability, and reducing the network timeout for remote embedding requests.

Comment thread: src/main.rs

    let table = open_table(connection, table_name)
        .await
        .ok_or("search table does not exist")?;
    let mut results = table.query().limit(1_000_000).execute().await?;
Severity: high

Hardcoding a limit of 1,000,000 rows in all_index_rows may cause data truncation for large corpora. Consider removing the limit or making it configurable.

Suggested change:

    - let mut results = table.query().limit(1_000_000).execute().await?;
    + let mut results = table.query().execute().await?;

Comment thread: src/main.rs

            fs::create_dir_all(parent)?;
        }
    }
    let catalog_file = fs::File::create(&args.metadata_catalog)?;
Severity: medium

Writing directly to the catalog file is not atomic. If the process fails, the file may be left in a corrupted state. Consider writing to a temporary file and renaming it to the final destination.
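The code under review is Rust, but the suggested write-then-rename pattern is language-agnostic; a minimal sketch in Python (file and function names are illustrative, not from the PR):

```python
import json
import os
import tempfile

def write_catalog_atomically(path, catalog):
    """Write the catalog to a temp file in the same directory, then
    rename it over the destination, so readers never observe a
    partially written file."""
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(catalog, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # ensure bytes hit disk before rename
        os.replace(tmp_path, path)  # atomic replacement on POSIX and Windows
    except BaseException:
        os.unlink(tmp_path)  # leave no stray temp file behind
        raise
```

The temp file must live in the same directory (and thus the same filesystem) as the destination, since rename is only atomic within one filesystem; the analogous Rust calls would write to a temporary path and `std::fs::rename` it into place.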

Comment thread: scripts/embed.py

    request = urllib.request.Request(embedding_url, data=payload, headers=headers, method="POST")
    try:
        with urllib.request.urlopen(request, timeout=300) as response:
Severity: medium

A 300-second timeout for an HTTP request is quite long and may cause the process to hang if the embedding service is unresponsive. Consider a shorter timeout.

Suggested change:

    - with urllib.request.urlopen(request, timeout=300) as response:
    + with urllib.request.urlopen(request, timeout=30) as response:
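A shorter timeout is usually paired with retries so a transient stall does not fail the whole batch; a hedged sketch of one way to do that (`post_with_retry` and its parameters are hypothetical, not names from embed.py):

```python
import socket
import time
import urllib.request

def post_with_retry(request, timeout=30, retries=3, backoff=2.0,
                    opener=urllib.request.urlopen):
    """Send a request with a bounded timeout, retrying on timeouts
    with linearly increasing backoff; re-raises the last timeout
    error if every attempt fails."""
    last_error = None
    for attempt in range(retries):
        try:
            return opener(request, timeout=timeout)
        except (socket.timeout, TimeoutError) as err:
            last_error = err
            if attempt + 1 < retries:
                # linear backoff: 2s, 4s, ... with backoff=2.0
                time.sleep(backoff * (attempt + 1))
    raise last_error
```

Passing the opener as a parameter keeps the retry logic testable without a live embedding service; in production the default `urllib.request.urlopen` is used unchanged.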
