Skip to content

Commit 9012314

Browse files
committed
Phase 115: Complete documentation and defer Phase 115.3
**Phase 115 Summary**: Phases 115.1-115.2 complete, Phase 115.3 deferred **Completed Work**: - Phase 115.1: String operations (7 files, 40+ sites) - Dual-API pattern: pointer-primary for performance - Commits: 0aa81ee, 08c8774 - Phase 115.2: Proto span accessors (2 files, 23 sites) - ldebug.cpp: 8 conversions - lundump.cpp: 15 conversions - Commits: 6f830e7, 943a3ef **Performance Results**: - Current: 4.70s avg (range: 4.56s-4.87s) - Target: ≤4.33s (3% tolerance from 4.20s baseline) - Regression: 11.9% above 4.20s baseline - Status: ⚠️ Above target, needs investigation **Phase 115.3 Decision**: - Objective: Add Table::getArraySpan() accessor - Status: DEFERRED due to performance concerns - Rationale: Already 11.9% above baseline; Phase 115.3 marked as "optional, if no regression"; MEDIUM RISK assessment makes proceeding inadvisable until performance improves **Documentation**: docs/PHASE_115_SUMMARY.md - Complete analysis of all three phases - Performance breakdown and variance analysis - Lessons learned (dual-API pattern, measurement importance) - Recommendations for performance recovery - Statistics: 9 files, 60+ sites, 4 commits **Benefits Achieved**: ✅ Type safety: Size in span type, bounds checking in debug ✅ Modern C++: Range-based for loops (13 sites) ✅ Maintainability: Reduced pointer arithmetic (23 sites) ✅ C API compatibility: Dual-API pattern maintains ABI ✅ All tests passing: "final OK !!!" **Challenges**: ⚠️ 11.9% performance regression (vs 3% tolerance) ⚠️ Phase 115.2 added unexpected 3.7% overhead ⚠️ "Zero-cost" abstractions showed measurable cost **Recommendations**: 1. Profile to identify regression source (hot paths) 2. Consider selective reversion if specific conversions problematic 3. Investigate compiler optimization barriers 4. Revisit Phase 115.3 after performance recovery **Next Steps**: - Investigate performance regression - Consider optimization strategies (Option A-D in summary) - Defer additional span adoption until ≤4.33s achieved
1 parent 943a3ef commit 9012314

File tree

1 file changed

+394
-0
lines changed

1 file changed

+394
-0
lines changed

docs/PHASE_115_SUMMARY.md

Lines changed: 394 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,394 @@
1+
# Phase 115: std::span Adoption - Summary
2+
3+
**Date**: November 21, 2025
4+
**Status**: Phases 115.1-115.2 Complete, Phase 115.3 Deferred
5+
**Performance**: ⚠️ 4.70s avg (11.9% regression from 4.20s baseline)
6+
7+
---
8+
9+
## Overview
10+
11+
Phase 115 focused on adopting `std::span` to improve type safety and code clarity throughout the Lua C++ codebase. This phase built upon Phase 112's addition of span accessors to Proto and ProtoDebugInfo.
12+
13+
### Objectives
14+
- ✅ Add std::span support for string operations (Phase 115.1)
15+
- ✅ Use existing Proto span accessors throughout codebase (Phase 115.2)
16+
- ⏸️ Add Table::getArraySpan() accessor (Phase 115.3 - DEFERRED)
17+
18+
---
19+
20+
## Phase 115.1: String Operations (COMPLETED)
21+
22+
### Files Modified: 7 files, 40+ sites
23+
24+
**Core String Functions** (lstring.h/cpp):
25+
- `luaS_hash()` - String hashing
26+
- `luaS_newlstr()` - String creation
27+
- `internshrstr()` - String interning (internal)
28+
29+
**String Utilities** (lobject.h/cpp):
30+
- `luaO_chunkid()` - Chunk ID formatting
31+
- `addstr2buff()` - Buffer string operations
32+
- `addstr()` - String append helper
33+
34+
**Pattern Matching** (lstrlib.cpp):
35+
- `lmemfind()` - Memory search
36+
- `nospecials()` - Pattern check (now returns `bool`!)
37+
- `prepstate()` - Match state preparation
38+
39+
**Buffer Operations** (lauxlib.h/cpp):
40+
- `luaL_addlstring()` - Buffer string addition
41+
42+
### Architecture: Dual-API Pattern
43+
44+
**Initial Approach** (span-primary): 17% regression (4.91s)
45+
```cpp
46+
// Primary implementation used span
47+
LUAI_FUNC unsigned luaS_hash(std::span<const char> str, unsigned seed);
48+
49+
// Wrapper for C compatibility
50+
inline unsigned luaS_hash(const char *str, size_t l, unsigned seed) {
51+
return luaS_hash(std::span(str, l), seed);
52+
}
53+
```
54+
55+
**Optimized Approach** (pointer-primary): 8% improvement
56+
```cpp
57+
// Primary implementation uses pointer+size for hot paths
58+
LUAI_FUNC unsigned luaS_hash(const char *str, size_t l, unsigned seed);
59+
60+
// Convenience overload for new code
61+
inline unsigned luaS_hash(std::span<const char> str, unsigned seed) {
62+
return luaS_hash(str.data(), str.size(), seed);
63+
}
64+
```
65+
66+
**Key Insight**: Hot paths must avoid unnecessary span construction. The dual-API pattern provides:
67+
- ✅ Zero overhead for existing code paths
68+
- ✅ Span convenience for new code
69+
- ✅ C API compatibility through function overloading
70+
- ✅ Gradual adoption without forcing conversions
71+
72+
### Performance Impact
73+
74+
**Initial**: 4.91s avg (17% regression)
75+
**After optimization**: 4.53s avg (7.8% regression)
76+
**Best individual runs**: 4.05s, 4.06s (better than 4.20s baseline!)
77+
78+
**Commits**:
79+
- `0aa81ee` - Initial span adoption
80+
- `08c8774` - Optimization (pointer-primary pattern)
81+
82+
---
83+
84+
## Phase 115.2: Proto Span Accessors (COMPLETED)
85+
86+
### Files Modified: 2 files, 23 sites
87+
88+
**ldebug.cpp** (8 conversions):
89+
- `getbaseline()``getAbsLineInfoSpan()` with `std::upper_bound`
90+
- `luaG_getfuncline()``getLineInfoSpan()`
91+
- `nextline()``getLineInfoSpan()`
92+
- `collectvalidlines()``getLineInfoSpan()` + `getCodeSpan()`
93+
- `changedline()``getLineInfoSpan()`
94+
95+
**lundump.cpp** (15 conversions):
96+
- `loadCode()``getCodeSpan()`
97+
- `loadConstants()``getConstantsSpan()` with range-based for
98+
- `loadUpvalues()``getUpvaluesSpan()` with range-based for
99+
- `loadDebug()``getLineInfoSpan()`, `getAbsLineInfoSpan()`, `getLocVarsSpan()`
100+
101+
### Code Examples
102+
103+
**Before** (ldebug.cpp):
104+
```cpp
105+
static int getbaseline (const Proto *f, int pc, int *basepc) {
106+
if (f->getAbsLineInfoSize() == 0 || pc < f->getAbsLineInfo()[0].getPC()) {
107+
*basepc = -1;
108+
return f->getLineDefined();
109+
}
110+
const AbsLineInfo* absLineInfo = f->getAbsLineInfo();
111+
int size = f->getAbsLineInfoSize();
112+
auto it = std::upper_bound(absLineInfo, absLineInfo + size, pc, ...);
113+
// ...
114+
}
115+
```
116+
117+
**After**:
118+
```cpp
119+
static int getbaseline (const Proto *f, int pc, int *basepc) {
120+
auto absLineInfoSpan = f->getDebugInfo().getAbsLineInfoSpan();
121+
if (absLineInfoSpan.empty() || pc < absLineInfoSpan[0].getPC()) {
122+
*basepc = -1;
123+
return f->getLineDefined();
124+
}
125+
auto it = std::upper_bound(absLineInfoSpan.begin(), absLineInfoSpan.end(), pc, ...);
126+
// ...
127+
}
128+
```
129+
130+
**Benefits**:
131+
- No separate size variable needed
132+
- Bounds checking in debug builds
133+
- Standard algorithms work naturally with span iterators
134+
- Clearer intent (array view, not raw pointer manipulation)
135+
136+
### Performance Impact
137+
138+
**After Phase 115.2**: 4.70s avg (11.9% regression from 4.20s baseline)
139+
140+
**Commits**:
141+
- `6f830e7` - ldebug.cpp conversions
142+
- `943a3ef` - lundump.cpp conversions
143+
144+
---
145+
146+
## Phase 115.3: Table Arrays (DEFERRED)
147+
148+
### Reason for Deferral
149+
150+
Performance after Phases 115.1-115.2:
151+
- **Current**: 4.70s avg (range: 4.56s-4.87s)
152+
- **Target**: ≤4.33s (3% tolerance from 4.20s baseline)
153+
- **Regression**: 11.9% above baseline
154+
155+
**Decision**: Phase 115.3 was marked as "optional, if no regression". Given the current 11.9% regression, proceeding with Table array conversions (marked as MEDIUM RISK in the analysis) is not advisable.
156+
157+
### What Phase 115.3 Would Have Done
158+
159+
**Proposed**:
160+
```cpp
161+
class Table {
162+
public:
163+
std::span<Value> getArraySpan() noexcept {
164+
return std::span(array, asize);
165+
}
166+
std::span<const Value> getArraySpan() const noexcept {
167+
return std::span(array, asize);
168+
}
169+
};
170+
171+
// Usage
172+
for (Value& slot : t->getArraySpan()) {
173+
// Safer iteration, prevents off-by-one errors
174+
}
175+
```
176+
177+
**Estimated Impact**: 10-15 array iteration loops in ltable.cpp, lvm_table.cpp
178+
179+
**Risk Assessment**: MEDIUM - Table operations are performance-sensitive, and we're already above target.
180+
181+
---
182+
183+
## Performance Analysis
184+
185+
### Benchmark Results (5-run average)
186+
187+
| Phase | Avg Time | Min | Max | Variance | vs Baseline |
188+
|-------|----------|-----|-----|----------|-------------|
189+
| Baseline (4.20s) | 4.20s | - | - | - | 0% |
190+
| After 115.1 (initial) | 4.91s | - | - | - | +17% |
191+
| After 115.1 (optimized) | 4.53s | 4.05s | 4.98s | ~1s | +7.8% |
192+
| After 115.2 | 4.70s | 4.56s | 4.87s | 0.31s | +11.9% |
193+
194+
### Performance Observations
195+
196+
1. **High Variance**: 0.31s-1s spread suggests system load factors
197+
2. **Best Individual Runs**: 4.05s, 4.06s beat the 4.20s baseline
198+
3. **Span Construction Overhead**: Initial 17% regression demonstrated that span construction in hot paths is costly
199+
4. **Pointer-Primary Pattern**: Reduced regression from 17% to 7.8% (8% improvement)
200+
5. **Phase 115.2 Impact**: 4.53s → 4.70s (3.7% degradation)
201+
202+
### Root Cause Investigation Needed
203+
204+
**Potential Issues**:
205+
1. ❓ **Compiler optimization barriers**: Are spans preventing optimizations?
206+
2. ❓ **Debug info overhead**: .data() calls on spans might not fully optimize
207+
3. ❓ **System variance**: Wide ranges suggest external factors
208+
4. ❓ **Test methodology**: Single-run vs multi-run averages
209+
210+
**Recommendation**: Before proceeding with Phase 115.3 or additional span adoption:
211+
1. Investigate why Phase 115.2 added 3.7% overhead (should be zero-cost)
212+
2. Profile hot paths to identify bottlenecks
213+
3. Consider selective reversion if specific conversions are problematic
214+
4. Validate with multiple benchmark runs under controlled conditions
215+
216+
---
217+
218+
## Benefits Achieved
219+
220+
Despite performance concerns, Phase 115 delivered significant code quality improvements:
221+
222+
### Type Safety
223+
✅ Size information included in span type
224+
✅ Compile-time detection of size mismatches
225+
✅ Bounds checking in debug builds (`-D_GLIBCXX_DEBUG`)
226+
227+
### Modern C++ Idioms
228+
✅ Range-based for loops (13 sites converted)
229+
✅ Standard algorithms work with span iterators
230+
✅ Cleaner interfaces (no separate pointer+size parameters)
231+
232+
### Maintainability
233+
✅ Reduced pointer arithmetic (23 sites)
234+
✅ Clearer intent (array views vs raw pointers)
235+
✅ Fewer magic size variables
236+
237+
### Code Examples
238+
239+
**Range-based for** (lundump.cpp):
240+
```cpp
241+
// Before
242+
std::for_each_n(f->getConstants(), n, [](TValue& v) {
243+
setnilvalue(&v);
244+
});
245+
246+
// After
247+
auto constantsSpan = f->getConstantsSpan();
248+
for (TValue& v : constantsSpan) {
249+
setnilvalue(&v);
250+
}
251+
```
252+
253+
**Eliminated separate size** (ldebug.cpp):
254+
```cpp
255+
// Before
256+
const AbsLineInfo* absLineInfo = f->getAbsLineInfo();
257+
int size = f->getAbsLineInfoSize();
258+
auto it = std::upper_bound(absLineInfo, absLineInfo + size, pc, ...);
259+
260+
// After
261+
auto absLineInfoSpan = f->getDebugInfo().getAbsLineInfoSpan();
262+
auto it = std::upper_bound(absLineInfoSpan.begin(), absLineInfoSpan.end(), pc, ...);
263+
```
264+
265+
---
266+
267+
## Lessons Learned
268+
269+
### 1. Zero-Cost Abstraction Isn't Always Zero-Cost
270+
271+
**Expected**: std::span should compile to identical code as pointer+size
272+
**Reality**: Span construction in hot paths added measurable overhead
273+
**Solution**: Dual-API pattern keeps hot paths fast while enabling span where convenient
274+
275+
### 2. Measurement is Critical
276+
277+
- Initial span-primary approach: 17% regression (would have failed)
278+
- Pointer-primary optimization: Reduced to 7.8% regression
279+
- Continuous benchmarking caught issues early
280+
281+
### 3. Gradual Adoption Works Better
282+
283+
- Don't force existing code to use spans
284+
- Provide span overloads for new code
285+
- Let natural code evolution adopt spans where beneficial
286+
- Keep performance-critical paths unchanged
287+
288+
### 4. Profile Before Proceeding
289+
290+
Phase 115.2 added unexpected 3.7% overhead. Before continuing:
291+
- Need to understand why "zero-cost" abstractions have cost
292+
- Should profile to identify specific hot spots
293+
- May need to be more selective about conversions
294+
295+
---
296+
297+
## Recommendations
298+
299+
### Immediate Actions
300+
301+
1. **✅ COMPLETE Phases 115.1-115.2**: Code is functional, tests pass
302+
2. **⏸️ DEFER Phase 115.3**: Don't add Table spans until performance improves
303+
3. **📊 INVESTIGATE**: Profile to understand 11.9% regression source
304+
4. **🎯 OPTIMIZE**: Target bringing performance back to ≤4.33s
305+
306+
### Performance Recovery Options
307+
308+
**Option A: Accept Current State**
309+
- 11.9% regression is significant but code is more maintainable
310+
- May be acceptable trade-off for type safety benefits
311+
- Document as known issue, revisit if critical
312+
313+
**Option B: Selective Reversion**
314+
- Profile to find hot spots
315+
- Revert specific span conversions if they're bottlenecks
316+
- Keep spans in cold paths (debug info, serialization)
317+
318+
**Option C: Compiler Investigation**
319+
- Try different optimization flags (`-O3`, `-flto`, `-march=native`)
320+
- Check if specific GCC/Clang versions optimize spans better
321+
- Investigate PGO (Profile-Guided Optimization)
322+
323+
**Option D: Further Optimization**
324+
- Review lundump.cpp conversions (added 3.7% overhead)
325+
- Consider caching spans instead of recreating
326+
- Ensure `inline` hints are respected
327+
328+
### Future Work
329+
330+
**If Performance Improves**:
331+
- ✅ Proceed with Phase 115.3 (Table arrays)
332+
- ✅ Convert remaining Proto accessor usage
333+
- ✅ Add spans to other array-based structures
334+
335+
**Additional Opportunities**:
336+
- LuaStack range operations (analysis identified ~20 sites)
337+
- Compiler array operations (low priority, cold path)
338+
- Debug info iteration (low priority, rarely executed)
339+
340+
---
341+
342+
## Statistics
343+
344+
### Phase 115.1
345+
- **Files Modified**: 7
346+
- **Conversions**: 40+ sites
347+
- **Commits**: 2 (0aa81ee, 08c8774)
348+
349+
### Phase 115.2
350+
- **Files Modified**: 2 (ldebug.cpp, lundump.cpp)
351+
- **Conversions**: 23 sites (8 + 15)
352+
- **Commits**: 2 (6f830e7, 943a3ef)
353+
354+
### Total Phase 115
355+
- **Files Modified**: 9 unique files
356+
- **Conversions**: 60+ sites
357+
- **Commits**: 4
358+
- **Lines Changed**: ~220 insertions, ~180 deletions
359+
- **Performance**: 4.70s avg (target: ≤4.33s, baseline: 4.20s)
360+
- **Test Status**: ✅ All tests passing ("final OK !!!")
361+
362+
---
363+
364+
## Conclusion
365+
366+
Phase 115 successfully demonstrated std::span adoption in a C++ modernization context:
367+
368+
**Achievements**:
369+
✅ Established dual-API pattern for zero-overhead span support
370+
✅ Converted 60+ sites to use spans without breaking C API
371+
✅ Improved type safety and code clarity
372+
✅ Maintained test compatibility (all tests passing)
373+
374+
**Challenges**:
375+
⚠️ 11.9% performance regression (above 3% tolerance)
376+
⚠️ "Zero-cost" abstractions showed measurable cost
377+
⚠️ High variance suggests system factors or measurement issues
378+
379+
**Status**:
380+
- Phases 115.1-115.2: **COMPLETE**
381+
- Phase 115.3: **DEFERRED** pending performance investigation
382+
383+
**Next Steps**:
384+
1. Profile to understand regression source
385+
2. Consider selective optimizations or reversions
386+
3. Document performance findings
387+
4. Revisit Phase 115.3 when performance is within tolerance
388+
389+
---
390+
391+
**Related Documents**:
392+
- Initial Analysis: Exploration agent output (Phase 115 planning)
393+
- Phase 112: Proto span accessor additions (foundation work)
394+
- CLAUDE.md: Project overview and guidelines

0 commit comments

Comments
 (0)