
Commit bd830d2

Add lazy-loading and LRU cache for context-based lookups (#62)
"Headless" LSP clients such as Claude Code don't actually open files when doing lookups or changing them. This means that some of our tricks for looking things up via the context of the files breaks. This PR adds lazy disk-fallback to the document store so LSP clients that don't drive the full `didOpen`/`didChange`/`didClose` lifecycle (e.g. Claude Code) can still query definitions, hovers, references, and other text-dependent handlers. Use-injected lookups in particular: `lookupThroughUse`, `lookupThroughUseOf`. Alias merging also previously returned `nil` for any file the editor hadn't explicitly opened. The goal of these changes is enablement work for a proper Claude Code plugin: #59. ## What changed - **`DocumentStore.GetOrLoad(uri)`**: lazy disk read for any `file://` URI that isn't already in the store. Disk-loaded entries are marked `transient: true` and tracked in a Least Recently Used (LRU) cache. Non-`file://` URIs (e.g. `untitled:`) return `(false)` without touching disk to avoid the `uriToPath` panic. - **LRU eviction**: transient entries are capped (default 50). Editor-owned buffers (`Set` via `didOpen`) are never counted, never reordered, never evicted. `Set` cleanly promotes a transient entry to editor-owned, dropping its LRU bookkeeping. - **Cap is configurable**: new `maxTransientDocuments` initialization option (default 50, accepts integer or JSON number, clamps negatives to 0). Documented in `README.md`. - **Handler migration**: 18 read-only handlers (`Definition`, `Hover`, `References`, `Completion`, `CodeAction`, `Declaration`, `DocumentHighlight`, `DocumentSymbol`, `FoldingRanges`, `Implementation`, `PrepareRename`, `Rename`, `SignatureHelp`, `TypeDefinition`, `PrepareCallHierarchy`, …) now call `GetOrLoad`. `Formatting` deliberately keeps `Get` — it only makes sense for in-memory editor buffers. 

## Correctness notes

- File I/O happens outside the write lock; a re-check inside the lock returns the existing entry if a concurrent `Set` or `GetOrLoad` populated the URI first.
- `bumpLRU` is a no-op if the URI was promoted to editor-owned between the RLock and the Lock, so the fast-path race is safe.
- `evictTransientLocked` retires the cached tree-sitter tree before deleting the entry (the tree is closed immediately if no handler still holds a reference), so nothing leaks.
- Race-clean under `go test -race`.
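
The first note (read outside the lock, then re-check under the write lock) can be illustrated with a small sketch. It is not the store's implementation: `loader`, `getOrLoad`, and the injected `read` function are hypothetical, and `read` stands in for `os.ReadFile` so the example runs without touching disk.

```go
package main

import (
	"fmt"
	"sync"
)

// loader is a hypothetical stand-in for the document store.
type loader struct {
	mu   sync.RWMutex
	docs map[string]string
	read func(uri string) (string, error) // assumed replacement for the disk read
}

// getOrLoad returns the cached text, or reads it with no lock held and
// then re-checks under the write lock before inserting.
func (l *loader) getOrLoad(uri string) (string, bool) {
	l.mu.RLock()
	text, ok := l.docs[uri]
	l.mu.RUnlock()
	if ok {
		return text, true
	}

	loaded, err := l.read(uri) // slow I/O: no lock is held here
	if err != nil {
		return "", false
	}

	l.mu.Lock()
	defer l.mu.Unlock()
	if existing, ok := l.docs[uri]; ok {
		return existing, true // a concurrent writer got there first; keep its entry
	}
	l.docs[uri] = loaded
	return loaded, true
}

// set stores editor-provided text, overriding anything loaded from disk.
func (l *loader) set(uri, text string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.docs[uri] = text
}

func main() {
	l := &loader{docs: map[string]string{}}
	// Simulate an editor's didOpen landing while the disk read is in flight.
	l.read = func(uri string) (string, error) {
		l.set(uri, "editor text")
		return "disk text", nil
	}
	text, ok := l.getOrLoad("file:///a.ex")
	fmt.Println(text, ok) // prints: editor text true
}
```

The re-check is what lets the editor's version win: without it, the stale disk read would overwrite the buffer the editor had just opened.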
1 parent ecda0a8 commit bd830d2

4 files changed

Lines changed: 836 additions & 46 deletions

README.md

Lines changed: 1 addition & 0 deletions
@@ -499,6 +499,7 @@ Dexter reads `initializationOptions` from your editor configuration:
 - **`followDelegates`** (boolean, default: `true`): follow `defdelegate` targets on lookup.
 - **`stdlibPath`** (string): override the Elixir stdlib directory to index. Defaults to auto-detection; use this if your install is non-standard.
 - **`debug`** (boolean, default: `false`): enable verbose logging to stderr. Logs timing and resolution details for every definition, hover, references, and rename request. Can also be enabled via the `DEXTER_DEBUG=true` environment variable.
+- **`maxTransientDocuments`** (integer, default: `50`): cap on how many lazily-loaded buffers the server retains in memory. When an LSP client (e.g. Claude Code) queries a file it never opened via `didOpen`, dexter reads it from disk and caches it. Editor-owned buffers are unaffected; only disk-loaded entries are subject to LRU eviction. Set to `0` to disable transient caching.

 ## Index database location (.dexter/)
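
As a rough illustration of how the `maxTransientDocuments` option might be read from decoded `initializationOptions`, the sketch below accepts either a Go `int` or a JSON number (which `encoding/json` decodes as `float64`), clamps negatives to 0, and otherwise falls back to the default of 50. The option-parsing code is not part of the hunks shown here, so `maxTransientFromOptions` is a hypothetical helper and not the server's actual function.

```go
package main

import (
	"encoding/json"
	"fmt"
)

const defaultMaxTransient = 50

// maxTransientFromOptions is a hypothetical helper: it reads
// maxTransientDocuments from decoded initializationOptions, accepting a Go
// int or a JSON number (float64 after json.Unmarshal) and clamping
// negative values to 0. Missing or non-numeric values yield the default.
func maxTransientFromOptions(opts map[string]any) int {
	n := defaultMaxTransient
	switch v := opts["maxTransientDocuments"].(type) {
	case int:
		n = v
	case float64:
		n = int(v)
	}
	if n < 0 {
		n = 0
	}
	return n
}

func main() {
	var opts map[string]any
	if err := json.Unmarshal([]byte(`{"maxTransientDocuments": -3}`), &opts); err != nil {
		panic(err)
	}
	fmt.Println(maxTransientFromOptions(opts))             // prints 0 (negative clamped)
	fmt.Println(maxTransientFromOptions(map[string]any{})) // prints 50 (default)
}
```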

internal/lsp/documents.go

Lines changed: 248 additions & 18 deletions
@@ -1,70 +1,154 @@
 package lsp

 import (
+	"container/list"
+	"os"
+	"strings"
 	"sync"

 	tree_sitter "github.com/tree-sitter/go-tree-sitter"
 	tree_sitter_elixir "github.com/tree-sitter/tree-sitter-elixir/bindings/go"
+	"go.lsp.dev/protocol"

 	"github.com/remoteoss/dexter/internal/parser"
 )

+// defaultMaxTransient caps how many disk-loaded buffers may live in the
+// store concurrently. Editor-owned buffers (added via Set) are never counted
+// against this cap.
+const defaultMaxTransient = 50
+
 type cachedDoc struct {
 	text       string
-	tree       *tree_sitter.Tree
-	src        []byte         // source bytes the tree references must stay alive
+	tree       *refTree
+	src        []byte         // source bytes the tree references - must stay alive
 	tokens     []parser.Token // cached tokenizer output
 	tokSrc     []byte         // source bytes for tokens
 	lineStarts []int          // byte offset of each line start (from TokenizeFull)
+	// transient is true for entries loaded from disk via GetOrLoad - i.e.
+	// no editor sent didOpen for this URI. These entries are tracked in an
+	// LRU and evicted once the transient cap is reached. Editor-owned
+	// entries (created via Set) are never transient and never evicted.
+	transient bool
+}
+
+// refTree wraps a tree-sitter parse tree with refcounting so that
+// concurrent handlers walking the tree (RootNode, queries) aren't racing
+// with eviction or replacement, which would free the underlying C memory
+// via ts_tree_delete and cause a use-after-free that the Go race detector
+// cannot observe.
+//
+// Lifecycle: GetTree increments refs under the store write lock; the
+// returned release closure decrements under the same lock and frees the
+// tree only once refs==0 AND retired is set. Set/Close/CloseAll/eviction
+// don't close the tree directly - they call retireLocked, which marks the
+// tree for free and only triggers ts_tree_delete if no handler still
+// holds a reference.
+type refTree struct {
+	tree    *tree_sitter.Tree
+	refs    int
+	retired bool
+}
+
+// retireLocked marks the tree for free. Caller must hold the store write
+// lock. If no handler is currently using the tree, frees it immediately;
+// otherwise the last release closes it.
+func (rt *refTree) retireLocked() {
+	if rt == nil || rt.tree == nil {
+		return
+	}
+	rt.retired = true
+	if rt.refs == 0 {
+		rt.tree.Close()
+		rt.tree = nil
+	}
 }

 // DocumentStore tracks the text content of open buffers and caches
 // tree-sitter parse trees for each document. All access is serialized
 // through a single RWMutex: reads (Get) take RLock, writes and parsing
 // (Set, Close, GetTree) take Lock.
+//
+// In addition to editor-managed buffers (populated by Set on didOpen /
+// didChange), DocumentStore can lazily load buffers from disk via
+// GetOrLoad. Disk-loaded entries are marked transient and tracked in an
+// LRU list so that AI tools that don't drive a didOpen/didClose lifecycle
+// (e.g. Claude Code) can still query references/hover/definition without
+// causing unbounded memory growth.
 type DocumentStore struct {
 	mu     sync.RWMutex
 	docs   map[string]*cachedDoc
 	parser *tree_sitter.Parser
+
+	// LRU bookkeeping for transient (disk-loaded) entries only. The list
+	// holds URIs in access-order, newest at the front. transientIdx maps
+	// URI → its list element for O(1) move/remove.
+	transientList *list.List
+	transientIdx  map[string]*list.Element
+	maxTransient  int
 }

 func NewDocumentStore() *DocumentStore {
 	p := tree_sitter.NewParser()
 	_ = p.SetLanguage(tree_sitter.NewLanguage(tree_sitter_elixir.Language()))
 	return &DocumentStore{
-		docs:   make(map[string]*cachedDoc),
-		parser: p,
+		docs:          make(map[string]*cachedDoc),
+		parser:        p,
+		transientList: list.New(),
+		transientIdx:  make(map[string]*list.Element),
+		maxTransient:  defaultMaxTransient,
 	}
 }

+// SetMaxTransient updates the cap on disk-loaded (transient) entries and
+// evicts any excess immediately. A cap of 0 disables transient caching -
+// disk-loaded entries are inserted and immediately evicted, so the store
+// still serves the read but never retains it. Editor-owned entries are
+// never affected. Negative values are clamped to 0.
+func (ds *DocumentStore) SetMaxTransient(n int) {
+	if n < 0 {
+		n = 0
+	}
+	ds.mu.Lock()
+	defer ds.mu.Unlock()
+	ds.maxTransient = n
+	ds.evictTransientLocked()
+}
+
 func (ds *DocumentStore) Set(uri string, text string) {
 	ds.mu.Lock()
 	defer ds.mu.Unlock()
-	if doc, ok := ds.docs[uri]; ok && doc.tree != nil {
-		doc.tree.Close()
+	if doc, ok := ds.docs[uri]; ok {
+		doc.tree.retireLocked()
 	}
+	// Editor took ownership of this URI - drop any LRU tracking for it.
+	ds.removeFromLRULocked(uri)
 	ds.docs[uri] = &cachedDoc{text: text}
 }

 func (ds *DocumentStore) Close(uri string) {
 	ds.mu.Lock()
 	defer ds.mu.Unlock()
-	if doc, ok := ds.docs[uri]; ok && doc.tree != nil {
-		doc.tree.Close()
+	if doc, ok := ds.docs[uri]; ok {
+		doc.tree.retireLocked()
 	}
+	ds.removeFromLRULocked(uri)
 	delete(ds.docs, uri)
 }

-// CloseAll frees all cached trees and the shared parser.
+// CloseAll frees all cached trees and the shared parser. Trees still
+// referenced by in-flight handlers stay alive until released; the parser
+// itself is safe to close immediately because parse trees are independent
+// of the parser once produced.
 func (ds *DocumentStore) CloseAll() {
 	ds.mu.Lock()
 	defer ds.mu.Unlock()
 	for _, doc := range ds.docs {
-		if doc.tree != nil {
-			doc.tree.Close()
-		}
+		doc.tree.retireLocked()
 	}
 	ds.docs = nil
+	ds.transientList = nil
+	ds.transientIdx = nil
 	ds.parser.Close()
 }

@@ -78,21 +162,167 @@ func (ds *DocumentStore) Get(uri string) (string, bool) {
 	return doc.text, true
 }

+// GetIfOpen returns the text for the given URI, but only if the entry is
+// editor-owned (non-transient) - i.e. the editor sent a didOpen. Returns
+// ("", false) for transient entries loaded via GetOrLoad and for URIs
+// that are not in the store at all. This is an atomic single-lock check
+// distinct from calling HasOpen followed by Get, which would be a
+// TOCTOU race if Close interleaves between the two RLock acquisitions.
+func (ds *DocumentStore) GetIfOpen(uri string) (string, bool) {
+	ds.mu.RLock()
+	defer ds.mu.RUnlock()
+	doc, ok := ds.docs[uri]
+	if !ok || doc.transient {
+		return "", false
+	}
+	return doc.text, true
+}
+
+// GetOrLoad returns the text for the given URI, falling back to a disk
+// read if no editor has opened the document. Disk-loaded entries are
+// marked transient and tracked in an LRU; if the transient population
+// exceeds the cap, the least-recently-used transient entry is evicted.
+//
+// Returns ("", false) if the URI does not resolve to a readable file on
+// disk (e.g. non-file:// URIs, missing files, permission errors).
+//
+// Editor-owned entries (added via Set) are never evicted and are not
+// reordered in the LRU - only transient entries participate.
+func (ds *DocumentStore) GetOrLoad(uri string) (string, bool) {
+	// Fast path: lookup under RLock. We avoid the LRU bump here so
+	// repeated hits on editor-owned buffers don't contend on the write
+	// lock at all.
+	ds.mu.RLock()
+	if doc, ok := ds.docs[uri]; ok {
+		text := doc.text
+		isTransient := doc.transient
+		ds.mu.RUnlock()
+		if isTransient {
+			ds.bumpLRU(uri)
+		}
+		return text, true
+	}
+	ds.mu.RUnlock()
+
+	// Miss: read from disk *outside* the write lock so concurrent
+	// requests for other URIs aren't blocked behind file I/O. We only
+	// fall back to disk for file:// URIs - uri.Filename() panics on
+	// other schemes (e.g. untitled:), so guard before calling it.
+	if !strings.HasPrefix(uri, "file://") {
+		return "", false
+	}
+	path := uriToPath(protocol.DocumentURI(uri))
+	if path == "" {
+		return "", false
+	}
+	data, err := os.ReadFile(path)
+	if err != nil {
+		return "", false
+	}
+	text := string(data)
+
+	ds.mu.Lock()
+	defer ds.mu.Unlock()
+
+	// Re-check: another goroutine may have populated this URI (via Set
+	// or a concurrent GetOrLoad) while we were reading from disk. If so,
+	// prefer the existing entry - Set wins by definition; a concurrent
+	// transient load is equivalent to ours. Bump the LRU on the way out
+	// so this access registers as recency-of-use, matching the fast-path
+	// behavior above; without this, racing slow-path callers wouldn't
+	// keep the entry warm even though they just used it.
+	if existing, ok := ds.docs[uri]; ok {
+		if existing.transient {
+			if elem, ok := ds.transientIdx[uri]; ok {
+				ds.transientList.MoveToFront(elem)
+			}
+		}
+		return existing.text, true
+	}
+
+	ds.docs[uri] = &cachedDoc{text: text, transient: true}
+	ds.transientIdx[uri] = ds.transientList.PushFront(uri)
+	ds.evictTransientLocked()
+	return text, true
+}
+
+// bumpLRU moves a transient URI to the front of the LRU list. Called on
+// every hit against a transient entry so the eviction order tracks
+// recency-of-use rather than recency-of-load.
+func (ds *DocumentStore) bumpLRU(uri string) {
+	ds.mu.Lock()
+	defer ds.mu.Unlock()
+	if elem, ok := ds.transientIdx[uri]; ok {
+		ds.transientList.MoveToFront(elem)
+	}
+}
+
+// removeFromLRULocked removes a URI from LRU tracking. Caller must hold
+// the write lock. Safe to call for URIs that aren't tracked.
+func (ds *DocumentStore) removeFromLRULocked(uri string) {
+	if elem, ok := ds.transientIdx[uri]; ok {
+		ds.transientList.Remove(elem)
+		delete(ds.transientIdx, uri)
+	}
+}
+
+// evictTransientLocked drops the least-recently-used transient entry
+// while the transient population exceeds the cap. Caller must hold the
+// write lock.
+func (ds *DocumentStore) evictTransientLocked() {
+	for ds.transientList.Len() > ds.maxTransient {
+		elem := ds.transientList.Back()
+		if elem == nil {
+			return
+		}
+		victim := elem.Value.(string)
+		ds.transientList.Remove(elem)
+		delete(ds.transientIdx, victim)
+		if doc, ok := ds.docs[victim]; ok {
+			doc.tree.retireLocked()
+			delete(ds.docs, victim)
+		}
+	}
+}
+
 // GetTree returns a cached tree-sitter parse tree and its source bytes for
-// the given URI. Parses on first access and caches the result. The tree is
-// invalidated on the next Set() call. Callers must not close the returned tree.
-func (ds *DocumentStore) GetTree(uri string) (*tree_sitter.Tree, []byte, bool) {
+// the given URI. Parses on first access and caches the result.
+//
+// The returned release closure MUST be called exactly once when the caller
+// is done with the tree (typically via defer). GetTree increments the tree's
+// refcount under the store lock and release decrements it, keeping the
+// underlying C memory alive even if Set/Close/CloseAll or LRU
+// eviction concurrently retires the tree. Without this, a concurrent
+// GetOrLoad for one URI could evict and free the tree of another URI
+// while a handler is still walking it - a use-after-free on C memory that
+// the Go race detector cannot catch.
+//
+// Callers must not close the returned tree directly.
+//
+// When ok is false, release is nil and must not be called.
+func (ds *DocumentStore) GetTree(uri string) (*tree_sitter.Tree, []byte, func(), bool) {
 	ds.mu.Lock()
 	defer ds.mu.Unlock()
 	doc, ok := ds.docs[uri]
 	if !ok {
-		return nil, nil, false
+		return nil, nil, nil, false
 	}
 	if doc.tree == nil {
 		doc.src = []byte(doc.text)
-		doc.tree = ds.parser.Parse(doc.src, nil)
+		doc.tree = &refTree{tree: ds.parser.Parse(doc.src, nil)}
+	}
+	rt := doc.tree
+	rt.refs++
+	release := func() {
+		ds.mu.Lock()
+		defer ds.mu.Unlock()
+		rt.refs--
+		if rt.refs == 0 && rt.retired && rt.tree != nil {
+			rt.tree.Close()
+			rt.tree = nil
+		}
 	}
-	return doc.tree, doc.src, true
+	return rt.tree, doc.src, release, true
 }

 // GetTokens returns cached tokenizer output and source bytes for the given URI.
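
The retire-then-release lifecycle used for parse trees in this diff can be demonstrated without tree-sitter. The sketch below is an approximation under stated assumptions: `resource` is a hypothetical stand-in for the C-backed tree, and `store`, `acquire`, and `evict` are illustrative names, not the functions in `documents.go`.

```go
package main

import (
	"fmt"
	"sync"
)

// resource is a hypothetical stand-in for a tree-sitter tree backed by C memory.
type resource struct{ closed bool }

func (r *resource) Close() { r.closed = true }

type refCounted struct {
	res     *resource
	refs    int
	retired bool
}

type store struct {
	mu    sync.Mutex
	items map[string]*refCounted
}

// acquire hands out the resource plus a release func. The resource stays
// open until it is both retired and no longer referenced.
func (s *store) acquire(key string) (*resource, func(), bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	rc, ok := s.items[key]
	if !ok {
		return nil, nil, false
	}
	rc.refs++
	release := func() {
		s.mu.Lock()
		defer s.mu.Unlock()
		rc.refs--
		if rc.refs == 0 && rc.retired && rc.res != nil {
			rc.res.Close()
			rc.res = nil
		}
	}
	return rc.res, release, true
}

// evict removes the entry, closing the resource immediately only if no
// caller currently holds a reference; otherwise the last release closes it.
func (s *store) evict(key string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	rc, ok := s.items[key]
	if !ok {
		return
	}
	rc.retired = true
	if rc.refs == 0 && rc.res != nil {
		rc.res.Close()
		rc.res = nil
	}
	delete(s.items, key)
}

func main() {
	s := &store{items: map[string]*refCounted{"a": {res: &resource{}}}}
	res, release, _ := s.acquire("a")
	s.evict("a")
	fmt.Println(res.closed) // prints false: still referenced by the caller
	release()
	fmt.Println(res.closed) // prints true: the last reference was released
}
```

Because both the increment and the decrement happen under the same mutex as eviction, the decision to free is never made while another goroutine is midway through taking a reference.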
