Add web search UI #4
Code Review
This pull request significantly expands the Wolfe search tool by introducing a local web UI, a persistent embedding service, and a comprehensive corpus enrichment pipeline for extracting bibliographic metadata, references, and concept phrases. It also adds support for remote OpenAI-compatible embedding endpoints, a lexical match context search mode, and improved CUDA device management. Feedback focused on removing a hardcoded row limit in index retrieval to prevent data truncation, implementing atomic file writes for metadata catalogs to ensure reliability, and reducing the network timeout for remote embedding requests.
let table = open_table(connection, table_name)
    .await
    .ok_or("search table does not exist")?;
let mut results = table.query().limit(1_000_000).execute().await?;
        fs::create_dir_all(parent)?;
    }
}
let catalog_file = fs::File::create(&args.metadata_catalog)?;
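The review summary asks for atomic writes of the metadata catalog. A minimal sketch of the usual write-to-temp-then-rename pattern follows, written in Python to match the embedding client reviewed in this PR; the same idea applies to the Rust `fs::File::create` call above. The function name `write_catalog_atomically` and the JSON catalog format are assumptions for illustration, not taken from the PR.

```python
import json
import os
import tempfile


def write_catalog_atomically(path, records):
    """Write records as JSON to path via a temp file plus rename (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    # Create the temp file in the destination directory so the final
    # rename never crosses a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".catalog-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(records, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        # os.replace swaps the new file into place in one step, so readers
        # see either the old catalog or the new one, never a partial file.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, drop the temp file and leave the old catalog untouched.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

If serialization fails partway through, the existing catalog is left as it was, which is the reliability property the review is after.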
request = urllib.request.Request(embedding_url, data=payload, headers=headers, method="POST")
try:
    with urllib.request.urlopen(request, timeout=300) as response:
A 300-second timeout for an HTTP request is quite long and may cause the process to hang if the embedding service is unresponsive. Consider a shorter timeout.
Suggested change:
-     with urllib.request.urlopen(request, timeout=300) as response:
+     with urllib.request.urlopen(request, timeout=30) as response:
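A shorter timeout is most useful when the caller also handles the resulting error. The sketch below shows one way to do that, assuming an OpenAI-compatible embeddings endpoint (request body with `model` and `input`, response with a `data` list of `embedding` entries). The function name `request_embeddings` and the default model name are hypothetical and do not come from the PR.

```python
import json
import urllib.error
import urllib.request


def request_embeddings(embedding_url, texts, model="example-embedding-model", timeout=30):
    """POST texts to an OpenAI-compatible embeddings endpoint (hypothetical helper)."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    request = urllib.request.Request(embedding_url, data=payload, headers=headers, method="POST")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = json.load(response)
    except OSError as exc:
        # urllib.error.URLError and socket timeouts are both OSError subclasses,
        # so connect failures and stalled reads end up here.
        raise RuntimeError(f"embedding request to {embedding_url} failed: {exc}") from exc
    return [item["embedding"] for item in body["data"]]
```

Note that `urlopen`'s `timeout` bounds individual blocking operations such as the connection attempt and each read, not the total duration of the request, so a slow but steadily responding server can still take longer than the configured value.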
A web search UI for Wolfe.
Establishes a locally served page with a search phrase input, a field for the number of top matches, and a submit button. Matches are displayed by filename and can also incorporate metadata (additional scripts are included for the metadata analysis). The search phrase is embedded locally with Jina.
Also adds a 'context' match function that returns a flattened text result containing each match together with the N chunks before and after it.
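To illustrate the 'context' match idea, here is a minimal sketch that expands each matched chunk by N neighbors on either side and flattens the result to text. How Wolfe actually stores chunks, and whether it merges overlapping windows, is not stated in this PR; the function name `context_windows`, the list-of-strings chunk representation, and the merging behavior are assumptions.

```python
def context_windows(chunks, match_indices, n):
    """Return flattened text for each match plus n chunks on either side.

    Overlapping or touching windows are merged so no chunk is repeated.
    """
    spans = []
    for i in sorted(set(match_indices)):
        if not 0 <= i < len(chunks):
            continue  # ignore indices that fall outside the document
        start = max(0, i - n)
        end = min(len(chunks), i + n + 1)  # half-open interval [start, end)
        if spans and start <= spans[-1][1]:
            # This window overlaps or touches the previous one: extend it.
            spans[-1][1] = max(spans[-1][1], end)
        else:
            spans.append([start, end])
    return [" ".join(chunks[s:e]) for s, e in spans]


chunks = ["a", "b", "c", "d", "e", "f", "g"]
print(context_windows(chunks, [1, 5], 1))  # ['a b c', 'e f g']
print(context_windows(chunks, [2, 3], 1))  # ['b c d e']
```

Merging keeps the flattened output free of duplicated chunks when two matches sit close together in the same document.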