Postgres Stub layer
The pg_ext library provides a minimal reimplementation of PostgreSQL internal APIs, enabling Springtail to execute functions from PostgreSQL extensions without requiring a full PostgreSQL backend. It creates a compatible runtime environment for extension code that expects PostgreSQL's data types, memory management, and function call interface.
Location: Source files in src/pg_ext/, Headers in include/pg_ext/
Purpose:
- Provide PostgreSQL-compatible data structures (Datum, text, arrays, etc.)
- Implement function call interface (`DirectFunctionCall*()`)
- Emulate PostgreSQL memory context system
- Support extension-defined types and operators
| Source files (`src/pg_ext/`) | Header files (`include/pg_ext/`) |
|---|---|
| `fmgr.cc` | `fmgr.hh` |
| `memory.cc` | `memory.hh` |
| `string.cc` | `string.hh` |
| `array.cc` | `array.hh` |
| `numeric.cc` | `numeric.hh` |
| `date.cc` | `date.hh` |
| `jsonb.cc` | `jsonb.hh` |
| `error.cc` | `error.hh` |
| `node.cc` | `node.hh` |
| `hash.cc` | `hash.hh` |
| `list.cc` | `list.hh` |
| `parser.cc` | `parser.hh` |
| `heaptuple.cc` | `heaptuple.hh` |
| `pqformat.cc` | `pqformat.hh` |
| `bit.cc` | `bit.hh` |
| `float.cc` | `float.hh` |
| `guc.cc` | `guc.hh` |
| `extn_registry.cc` | `extn_registry.hh` |
| `extn_parser.cc` | `extn_parser.hh` |
Components:
- fmgr: Function manager (DirectFunctionCall* wrappers)
- memory: Memory contexts (palloc, pfree, TopMemoryContext)
- string: Text type and string utilities
- array: PostgreSQL array handling
- numeric: Numeric type support
- date: Date/time types
- jsonb: JSONB type support
- error: Error reporting (ereport, elog)
- node: PostgreSQL node types
- hash: Hash functions
- list: PostgreSQL list structures
- parser: libpg_query integration
- heaptuple: Heap tuple representation
- pqformat: Wire protocol formatting
- bit: Bit string operations
- float: Float type utilities
- guc: GUC (configuration) stubs
- extn_registry: Extension registry (covered separately)
- extn_parser: Extension SQL parsing
---
## Function Manager (fmgr)
### Overview
Provides macros and functions for calling PostgreSQL functions with a compatible calling convention.
**Files:** `src/pg_ext/fmgr.cc`, `include/pg_ext/fmgr.hh`
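
To make the calling convention concrete, here is a minimal, self-contained sketch of the idea: arguments are packed into a call-info structure and the callee reads them back out. Every name ending in `Sketch` or `_sketch` is hypothetical and simplified; the real types in `fmgr.hh` may differ.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-ins for the fmgr types.
using Datum = std::uintptr_t;

struct NullableDatumSketch {
    Datum value;
    bool isnull;
};

struct FcinfoSketch {
    short nargs;
    bool isnull;                  // set by the callee when the result is NULL
    NullableDatumSketch args[8];  // fixed capacity keeps the sketch simple
};

using PGFunctionSketch = Datum (*)(FcinfoSketch*);

// A callee in the PostgreSQL style: arguments arrive packed in fcinfo.
Datum int4add_sketch(FcinfoSketch* fcinfo) {
    auto a = static_cast<std::int32_t>(fcinfo->args[0].value);
    auto b = static_cast<std::int32_t>(fcinfo->args[1].value);
    return static_cast<Datum>(a + b);
}

// The direct-call wrapper packs its arguments and invokes the callee.
Datum direct_call2_sketch(PGFunctionSketch func, Datum arg1, Datum arg2) {
    FcinfoSketch fcinfo{};
    fcinfo.nargs = 2;
    fcinfo.args[0] = {arg1, false};
    fcinfo.args[1] = {arg2, false};
    return func(&fcinfo);
}
```

Calling `direct_call2_sketch(int4add_sketch, 42, 10)` routes both integers through the packed structure, which is the same shape of call an extension function expects from a PostgreSQL backend.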
### Key Macros
#### LOCAL_FCINFO()
Allocates a stack-local `FunctionCallInfo` structure:
```cpp
#define LOCAL_FCINFO(name, nargs) \
    FunctionCallInfoBaseData name##data; \
    FunctionCallInfo name = &name##data
```
Usage:
```cpp
LOCAL_FCINFO(fcinfo, 2);  // Allocate fcinfo for 2 arguments
```

#### InitFunctionCallInfoData()
Initializes a `FunctionCallInfo` structure:
```cpp
void InitFunctionCallInfoData(FunctionCallInfo fcinfo,
                              FmgrInfo* flinfo,
                              int nargs,
                              Oid collation,
                              void* context,
                              void* resultinfo);
```

### Direct Function Calls
Convenience functions for calling PostgreSQL functions.

#### DirectFunctionCall1()
```cpp
Datum DirectFunctionCall1(PGFunction func, Datum arg1);
```
Implementation:
```cpp
Datum DirectFunctionCall1(PGFunction func, Datum arg1) {
    LOCAL_FCINFO(fcinfo, 1);
    // fcinfo is already a pointer, matching the declaration above
    InitFunctionCallInfoData(fcinfo, nullptr, 1, 0, nullptr, nullptr);
    fcinfo->args[0].value = arg1;
    fcinfo->args[0].isnull = false;
    return func(fcinfo);
}
```
Example:
```cpp
// Call int4in("12345")
PGFunction int4in_func = (PGFunction)get_type_func("int4in");
Datum result = DirectFunctionCall1(int4in_func, CStringGetDatum("12345"));
int32_t value = DatumGetInt32(result);  // value = 12345
```

#### DirectFunctionCall2()
```cpp
Datum DirectFunctionCall2(PGFunction func, Datum arg1, Datum arg2);
```
Example:
```cpp
// Call int4add(42, 10)
Datum result = DirectFunctionCall2(int4add_func,
                                   Int32GetDatum(42),
                                   Int32GetDatum(10));
// result = 52
```

#### DirectFunctionCall3()
```cpp
Datum DirectFunctionCall3(PGFunction func, Datum arg1, Datum arg2, Datum arg3);
```

#### DirectFunctionCall*Coll()
For functions that require a collation (string operations):
```cpp
Datum DirectFunctionCall1Coll(PGFunction func, Oid collation, Datum arg1);
Datum DirectFunctionCall2Coll(PGFunction func, Oid collation, Datum arg1, Datum arg2);
Datum DirectFunctionCall3Coll(PGFunction func, Oid collation, Datum arg1, Datum arg2, Datum arg3);
Datum DirectFunctionCall7Coll(PGFunction func, Oid collation, ...);  // For GIN extractQuery
```
Example:
```cpp
// Text equality evaluated under the default collation
Datum result = DirectFunctionCall2Coll(texteq_func,
                                       DEFAULT_COLLATION_OID,
                                       text1_datum,
                                       text2_datum);
```

### Datum Conversion Macros
Macros for converting between C types and `Datum`:
```cpp
// Integer conversions
#define Int32GetDatum(x) ((Datum)(x))
#define DatumGetInt32(x) ((int32_t)(x))
#define Int64GetDatum(x) ((Datum)(x))
#define DatumGetInt64(x) ((int64_t)(x))
// Pointer conversions
#define PointerGetDatum(x) ((Datum)(x))
#define DatumGetPointer(x) ((void*)(x))
// Boolean conversions
#define BoolGetDatum(x) ((Datum)((x) ? 1 : 0))
#define DatumGetBool(x) ((bool)(x))
// Float conversions
#define Float4GetDatum(x) /* implementation */
#define DatumGetFloat4(x) /* implementation */
#define Float8GetDatum(x) /* implementation */
#define DatumGetFloat8(x) /* implementation */
// String conversions
#define CStringGetDatum(x) PointerGetDatum(x)
#define DatumGetCString(x) ((char*)DatumGetPointer(x))
// Object ID
#define ObjectIdGetDatum(x) ((Datum)(x))
#define DatumGetObjectId(x) ((Oid)(x))
```

## Memory Contexts (memory)
### Overview
PostgreSQL uses a memory context system for allocation tracking and bulk deallocation. pg_ext provides a simplified implementation.

**Files:** `src/pg_ext/memory.cc`, `include/pg_ext/memory.hh`
### MemoryContext
```cpp
class MemoryContext {
public:
    MemoryContext(MemoryContext* parent,
                  std::string_view name,
                  size_t init_size,
                  size_t max_size);
    void* alloc(size_t size);
    void free(void* ptr);
    void reset();  // Free all allocations
    void delete_context();
private:
    std::string _name;
    size_t _init_size;
    size_t _max_size;
    MemoryContext* _parent;
    std::map<size_t, std::unique_ptr<MemoryBlock>> _blocks;
    std::unordered_map<char*, std::unique_ptr<MemoryBlock>> _large_allocs;
};
```

### TopMemoryContext
```cpp
extern MemoryContext TopMemoryContext;
```
The global `TopMemoryContext` is the root of all memory contexts:
```cpp
// In memory.cc
MemoryContext TopMemoryContext(nullptr, "TopMemoryContext", 8192, 1048576);
```

### Allocation Functions
#### palloc() / palloc0()
Allocate memory from a memory context:
```cpp
void* palloc(size_t size);
void* palloc0(size_t size);  // Zero-initialized
```
Implementation:
```cpp
void* palloc(size_t size) {
    return TopMemoryContext.alloc(size);
}
void* palloc0(size_t size) {
    void* ptr = palloc(size);
    memset(ptr, 0, size);
    return ptr;
}
```
Example:
```cpp
// Allocate 1024 bytes
char* buffer = (char*)palloc(1024);
// Use buffer...
pfree(buffer);
```

#### pfree()
Free memory allocated by `palloc`:
```cpp
void pfree(void* ptr);
```

#### repalloc()
Reallocate memory:
```cpp
void* repalloc(void* ptr, size_t size);
```

### MemoryBlock
Internal structure for memory management:
```cpp
struct MemoryBlock {
    size_t size;   // Total size of block
    size_t pos;    // Current position
    char* memory;  // Allocated memory
    MemoryBlock(size_t size);
    size_t remaining() const { return size - pos; }
};
```

- Extension functions expect `palloc()` for allocations
- Memory contexts are simplified compared to PostgreSQL
- No support for memory context switching or hierarchies
- All allocations go to `TopMemoryContext`
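
The block-based design with a `reset()` operation can be illustrated with a small arena allocator. This is only a sketch of the general technique, not the pg_ext implementation: `ArenaSketch`, its block-size policy, and its alignment rule are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a memory context: a bump allocator whose
// allocations are all released together by reset().
class ArenaSketch {
public:
    explicit ArenaSketch(std::size_t block_size) : _block_size(block_size) {}

    void* alloc(std::size_t size) {
        // Round up so every returned pointer stays suitably aligned.
        constexpr std::size_t align = alignof(std::max_align_t);
        size = (size + align - 1) / align * align;
        // Start a new block when the current one cannot hold the request.
        if (_blocks.empty() || _pos + size > _blocks.back().size) {
            std::size_t n = size > _block_size ? size : _block_size;
            _blocks.push_back({std::make_unique<char[]>(n), n});
            _pos = 0;
        }
        void* p = _blocks.back().data.get() + _pos;
        _pos += size;
        return p;
    }

    // Bulk deallocation: drop every block at once.
    void reset() {
        _blocks.clear();
        _pos = 0;
    }

    std::size_t block_count() const { return _blocks.size(); }

private:
    struct Block {
        std::unique_ptr<char[]> data;
        std::size_t size;
    };
    std::size_t _block_size;
    std::size_t _pos = 0;  // bump position inside the newest block
    std::vector<Block> _blocks;
};
```

The trade-off is the usual one for arenas: individual frees are not supported in this sketch, but releasing everything is a single cheap call, which is what makes leaked per-call allocations tolerable.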

## Text and String Utilities (string)
### Overview
PostgreSQL's `text` type is a variable-length binary string with a 4-byte header.

**Files:** `src/pg_ext/string.cc`, `include/pg_ext/string.hh`
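
As a rough illustration of the header-plus-payload layout, the sketch below packs a string into a buffer whose first four bytes hold the total length (header included) and then reads it back. The helper names are hypothetical and the sketch uses `std::vector` rather than pg_ext's allocator.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Size of the length header, assumed to be 4 bytes.
constexpr std::size_t kHeaderSize = sizeof(std::int32_t);

// Pack a string into a header-plus-payload buffer (hypothetical helper).
std::vector<char> pack_text_sketch(const std::string& s) {
    std::vector<char> buf(kHeaderSize + s.size());
    auto total = static_cast<std::int32_t>(buf.size());  // length includes the header
    std::memcpy(buf.data(), &total, kHeaderSize);
    std::memcpy(buf.data() + kHeaderSize, s.data(), s.size());
    return buf;
}

// Recover the payload by subtracting the header size from the stored length.
std::string unpack_text_sketch(const std::vector<char>& buf) {
    std::int32_t total = 0;
    std::memcpy(&total, buf.data(), kHeaderSize);
    return std::string(buf.data() + kHeaderSize, total - kHeaderSize);
}
```

Note that the payload is not NUL-terminated; the length in the header is the only record of where the data ends, which is why converting back to a C string requires a copy.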
### varlena and text
```cpp
struct varlena {
    int32_t vl_len_;  // Length including header (bit 0 = compressed flag)
    char vl_dat[1];   // Flexible array member
};
typedef struct varlena text;
```

### Conversion Functions
#### cstring_to_text()
Convert a C string to PostgreSQL text:
```cpp
text* cstring_to_text(const char* str);
text* cstring_to_text_with_len(const char* str, int len);
```
Implementation:
```cpp
text* cstring_to_text(const char* str) {
    int len = strlen(str);
    return cstring_to_text_with_len(str, len);
}
text* cstring_to_text_with_len(const char* str, int len) {
    text* result = (text*)palloc(len + VARHDRSZ);
    SET_VARSIZE(result, len + VARHDRSZ);
    memcpy(VARDATA(result), str, len);
    return result;
}
```
Example:
```cpp
text* my_text = cstring_to_text("Hello, World!");
```

#### cstring_to_text_auto()
Wrapper that handles const correctness:
```cpp
text* cstring_to_text_auto(const char* str);
```

#### text_to_cstring()
Convert text to a C string:
```cpp
char* text_to_cstring(const text* t);
```
Implementation:
```cpp
char* text_to_cstring(const text* t) {
    int len = VARSIZE_ANY_EXHDR(t);
    char* result = (char*)palloc(len + 1);
    memcpy(result, VARDATA_ANY(t), len);
    result[len] = '\0';
    return result;
}
```
Example:
```cpp
char* cstr = text_to_cstring(my_text);
printf("%s\n", cstr);
pfree(cstr);
```

### Varlena Macros
```cpp
// Get size of varlena (excluding header)
#define VARSIZE_ANY_EXHDR(ptr) (VARSIZE_ANY(ptr) - VARHDRSZ)
// Get data pointer
#define VARDATA_ANY(ptr) ((char*)(ptr) + VARHDRSZ)
#define VARDATA(ptr) VARDATA_ANY(ptr)
// Set size (including header)
#define SET_VARSIZE(ptr, len) (((varlena*)(ptr))->vl_len_ = (len))
// Variable header size
#define VARHDRSZ sizeof(int32_t)
```

## Array Handling (array)
### Overview
Support for PostgreSQL's array type.

**Files:** `src/pg_ext/array.cc`, `include/pg_ext/array.hh`
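
PostgreSQL arrays are multi-dimensional, carry a lower bound per dimension, and store their elements in row-major order. The sketch below shows the arithmetic that follows from that: the element count is the product of the dimension sizes, and a subscript tuple maps to a flat offset. The function names are hypothetical and the sketch does not touch pg_ext's own array representation.

```cpp
#include <cstddef>
#include <vector>

// Total element count: the product of all dimension sizes.
std::size_t element_count_sketch(const std::vector<int>& dims) {
    if (dims.empty()) return 0;  // a zero-dimensional array holds no elements
    std::size_t n = 1;
    for (int d : dims) n *= static_cast<std::size_t>(d);
    return n;
}

// Row-major flat offset for subscripts expressed relative to each
// dimension's lower bound (PostgreSQL arrays default to a lower bound of 1).
std::size_t flat_offset_sketch(const std::vector<int>& dims,
                               const std::vector<int>& lbounds,
                               const std::vector<int>& subscripts) {
    std::size_t off = 0;
    for (std::size_t i = 0; i < dims.size(); ++i) {
        std::size_t zero_based = static_cast<std::size_t>(subscripts[i] - lbounds[i]);
        off = off * static_cast<std::size_t>(dims[i]) + zero_based;
    }
    return off;
}
```

For a 2x3 array with lower bounds of 1, the subscript `[2][3]` is the last of six elements, so its flat offset is 5.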
### ArrayType
```cpp
struct ArrayType {
    int32_t vl_len_;     // varlena header
    int ndim;            // Number of dimensions
    int32_t dataoffset;  // Offset to data
    Oid elemtype;        // Element type OID
    int* dim;            // Dimension sizes
    int* lbound;         // Lower bounds
    char* data;          // Element data
};
```

### Array Functions
```cpp
// Get array element
Datum array_get_element(ArrayType* array, int n);
// Get number of elements
int array_length(ArrayType* array);
// Iterate array
void array_foreach(ArrayType* array, void (*callback)(Datum, void*), void* context);
```

## Numeric Type (numeric)
### Overview
PostgreSQL's arbitrary-precision numeric type.

**Files:** `src/pg_ext/numeric.cc`, `include/pg_ext/numeric.hh`
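
PostgreSQL stores numeric values as a sequence of base-10000 digits, with a weight giving the power of 10000 that applies to the first digit. Assuming pg_ext follows the same convention, a conversion to `double` can be sketched as below; `NumericSketch` and `numeric_sketch_to_double` are hypothetical names, not the library's API.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: digits are base-10000, as in PostgreSQL's own numeric format.
constexpr int kNBase = 10000;

struct NumericSketch {
    int weight;                        // power of kNBase for the first digit
    bool negative;
    std::vector<std::int16_t> digits;  // most significant digit first
};

// Sum digit[i] * kNBase^(weight - i); precision is limited by double.
double numeric_sketch_to_double(const NumericSketch& n) {
    double result = 0.0;
    for (std::size_t i = 0; i < n.digits.size(); ++i) {
        int exponent = n.weight - static_cast<int>(i);
        result += n.digits[i] * std::pow(static_cast<double>(kNBase), exponent);
    }
    return n.negative ? -result : result;
}
```

For example, 12345.67 is held as the digits `{1, 2345, 6700}` with weight 1: one unit of 10000, then 2345 units of 1, then 6700 units of 1/10000.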
### NumericData
```cpp
struct NumericData {
    int ndigits;      // Number of digits
    int weight;       // Weight of first digit
    int sign;         // Sign (positive/negative/NaN)
    int dscale;       // Display scale
    int16_t* digits;  // Digit array
};
typedef NumericData* Numeric;
```

### Numeric Functions
```cpp
// Convert numeric to int64
int64_t numeric_to_int64(Numeric num);
// Convert numeric to double
double numeric_to_double(Numeric num);
// Convert int64 to numeric
Numeric int64_to_numeric(int64_t value);
```

## Date/Time Types (date)
### Overview
PostgreSQL date/time types and operations.

**Files:** `src/pg_ext/date.cc`, `include/pg_ext/date.hh`
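
PostgreSQL counts dates in days and timestamps in microseconds from 2000-01-01, which is 10957 days after the Unix epoch. The sketch below shows the arithmetic those conventions imply. The function names are hypothetical, and rounding toward negative infinity for pre-2000 timestamps is a choice made for this sketch.

```cpp
#include <cstdint>

constexpr std::int64_t kUsecsPerDay = 86400LL * 1000000LL;
// Days between the Unix epoch (1970-01-01) and the PostgreSQL epoch (2000-01-01).
constexpr std::int64_t kUnixToPgEpochDays = 10957;

// Truncate a timestamp to its day number, rounding toward negative
// infinity so that pre-2000 timestamps land on the correct day.
std::int32_t timestamp_to_date_sketch(std::int64_t ts) {
    std::int64_t days = ts / kUsecsPerDay;
    if (ts % kUsecsPerDay < 0) --days;
    return static_cast<std::int32_t>(days);
}

// Midnight at the start of the given day.
std::int64_t date_to_timestamp_sketch(std::int32_t date) {
    return static_cast<std::int64_t>(date) * kUsecsPerDay;
}

// Shift a Unix time in seconds onto the 2000-01-01 epoch, in microseconds.
std::int64_t unix_seconds_to_timestamp_sketch(std::int64_t unix_secs) {
    return (unix_secs - kUnixToPgEpochDays * 86400LL) * 1000000LL;
}
```

Unix time 946684800 (2000-01-01 00:00:00 UTC) therefore maps to timestamp 0, and a timestamp of -1 microsecond belongs to day -1 (1999-12-31).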
### Date/Time Types
```cpp
typedef int32_t DateADT;      // Days since 2000-01-01
typedef int64_t Timestamp;    // Microseconds since 2000-01-01 00:00:00
typedef int64_t TimestampTz;  // Timestamp with timezone
typedef int64_t TimeADT;      // Microseconds since midnight

struct Interval {
    int64_t time;   // Microseconds
    int32_t day;    // Days
    int32_t month;  // Months
};
typedef Interval IntervalADT;  // Time interval (declared after Interval so the name resolves)
```

### Date/Time Functions
```cpp
// Current timestamp
Timestamp GetCurrentTimestamp();
// Date arithmetic
Timestamp timestamp_add_interval(Timestamp ts, Interval* interval);
// Conversions
DateADT timestamp_to_date(Timestamp ts);
Timestamp date_to_timestamp(DateADT date);
```

## Error Reporting (error)
### Overview
PostgreSQL error reporting system.

**Files:** `src/pg_ext/error.cc`, `include/pg_ext/error.hh`
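
In PostgreSQL, a report at `ERROR` severity or above aborts the current operation, while lower severities are only logged. Without a backend to unwind to, one way to model that split is with a C++ exception. The sketch below is a standalone illustration of that dispatch; the names and the in-memory log are hypothetical.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Severity values chosen for this sketch.
constexpr int kWarningSketch = 19;
constexpr int kErrorSketch = 20;

// Collected non-fatal messages; stands in for a real logger.
std::vector<std::string> g_log_sketch;

// Levels at or above ERROR abort the call by throwing; lower levels are logged.
void report_sketch(int level, const std::string& message) {
    if (level >= kErrorSketch) {
        throw std::runtime_error(message);
    }
    g_log_sketch.push_back(message);
}
```

A caller that wants PostgreSQL-like behaviour wraps each extension call in a `try` block and converts the exception into its own error result.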
### Error Levels
```cpp
#define DEBUG5 10
#define DEBUG4 11
#define DEBUG3 12
#define DEBUG2 13
#define DEBUG1 14
#define LOG 15
#define NOTICE 18
#define WARNING 19
#define ERROR 20
#define FATAL 21
#define PANIC 22
```

### ereport()
```cpp
#define ereport(level, rest) \
    do { \
        if (level >= ERROR) { \
            springtail_error_handler rest; \
        } else { \
            springtail_log_handler rest; \
        } \
    } while(0)
```
Usage:
```cpp
if (invalid_input) {
    ereport(ERROR,
            (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
             errmsg("Invalid input value: %d", value)));
}
```

### elog()
```cpp
#define elog(level, ...) \
    do { \
        if (level >= ERROR) { \
            throw std::runtime_error(fmt::format(__VA_ARGS__)); \
        } else { \
            LOG_INFO(__VA_ARGS__); \
        } \
    } while(0)
```
Usage:
```cpp
elog(ERROR, "Failed to process request: %s", error_msg);
```
`fmt::format` interprets `{}` placeholders rather than printf-style specifiers such as `%s`, so callers relying on this macro should confirm how the format string is translated.

## StringInfo (pqformat)
### Overview
Dynamic string buffer for building messages.

**Files:** `src/pg_ext/pqformat.cc`, `include/pg_ext/pqformat.hh`
### StringInfoData
```cpp
struct StringInfoData {
    char* data;  // Buffer
    int len;     // Current length
    int maxlen;  // Allocated size
    int cursor;  // Read cursor
};
typedef StringInfoData* StringInfo;
```

### StringInfo Functions
```cpp
// Initialize
void initStringInfo(StringInfoData* str);
void resetStringInfo(StringInfoData* str);
// Append operations
void appendStringInfo(StringInfoData* str, const char* fmt, ...);
void appendStringInfoChar(StringInfoData* str, char ch);
void appendBinaryStringInfo(StringInfoData* str, const char* data, int datalen);
void appendBinaryStringInfoNT(StringInfoData* str, const char* data, int datalen);
// Enlarge buffer
void enlargeStringInfo(StringInfoData* str, int needed);
```
Example:
```cpp
StringInfoData buf;
initStringInfo(&buf);
appendStringInfo(&buf, "Value: %d, Name: %s", 42, "test");
appendStringInfoChar(&buf, '\n');
// Use buf.data
pfree(buf.data);
```

## Node Types (node)
### Overview
PostgreSQL's node type system for parse trees and plans.

**Files:** `src/pg_ext/node.cc`, `include/pg_ext/node.hh`
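### Node and NodeTag
```cpp
// NodeTag is declared first so that Node can refer to it
enum NodeTag {
    T_Invalid = 0,
    T_List,
    T_Integer,
    T_String,
    T_SelectStmt,
    // ... many more
};
struct Node {
    NodeTag type;  // Node type identifier
};
```

### Node Functions
```cpp
Node* copyObject(const Node* node);
bool equal(const Node* a, const Node* b);
char* nodeToString(const Node* node);
```

## List Structures (list)
### Overview
A singly-linked implementation of PostgreSQL's list API, with head and tail pointers.

**Files:** `src/pg_ext/list.cc`, `include/pg_ext/list.hh`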
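
The PostgreSQL list idiom treats a null pointer as the empty list and has every append return the (possibly newly allocated) list. A minimal sketch of that idiom with a tail pointer for constant-time appends is shown below; `ListSketch`, `CellSketch`, and the helper functions are hypothetical and use `new`/`delete` instead of pg_ext's allocator.

```cpp
#include <cstddef>

struct CellSketch {
    void* ptr_value;
    CellSketch* next;
};

struct ListSketch {
    int length = 0;
    CellSketch* head = nullptr;
    CellSketch* tail = nullptr;
};

// Append in O(1) via the tail pointer; a null list is treated as empty
// and allocated on first use.
ListSketch* lappend_sketch(ListSketch* list, void* datum) {
    if (list == nullptr) list = new ListSketch();
    auto* cell = new CellSketch{datum, nullptr};
    if (list->tail != nullptr) {
        list->tail->next = cell;
    } else {
        list->head = cell;
    }
    list->tail = cell;
    ++list->length;
    return list;
}

// Release every cell, then the list header itself.
void list_free_sketch(ListSketch* list) {
    if (list == nullptr) return;
    for (CellSketch* c = list->head; c != nullptr;) {
        CellSketch* next = c->next;
        delete c;
        c = next;
    }
    delete list;
}
```

Iteration is a plain walk from `head` following `next` until it reaches null, which is the loop a `foreach`-style macro would expand to.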
### List and ListCell
```cpp
struct ListCell {
    union {
        void* ptr_value;
        int int_value;
        Oid oid_value;
    } data;
    ListCell* next;
};
struct List {
    NodeTag type;  // T_List
    int length;
    ListCell* head;
    ListCell* tail;
};
```

### List Functions
```cpp
// Create
List* list_make1(void* datum1);
List* list_make2(void* datum1, void* datum2);
List* lappend(List* list, void* datum);
List* lcons(void* datum, List* list);
// Access
void* linitial(const List* list);
void* lsecond(const List* list);
void* llast(const List* list);
// Iteration
#define foreach(cell, list) \
    for ((cell) = list_head(list); (cell) != NULL; (cell) = lnext(list, cell))
```
Example:
```cpp
List* mylist = NIL;
mylist = lappend(mylist, some_pointer);
mylist = lappend(mylist, another_pointer);
ListCell* cell;
foreach(cell, mylist) {
    void* item = lfirst(cell);
    // Process item...
}
```

## Extension Types and Operators
### Extension Types
Extension types are stored in Springtail's type system through `create_usertype()`:
```cpp
// In pg_copy_table.cc
PgMsgUserType msg;
msg.oid = enum_type_oid;
msg.name = extn_type_name;
msg.type = constant::USER_TYPE_EXTENSION;
server->create_usertype(db_id, {xid, 0}, msg);
```

### Extension Operators
Extension operators are used in Springtail comparisons:
```cpp
// In field.cc
ExtensionContext ctx;
ctx.type_oid = field->type_oid;
ctx.op_str = operator_symbol;
return PgExtnRegistry::comparator_func(&ctx, lhs_binary, rhs_binary);
```

## Limitations
PostgreSQL features NOT implemented in pg_ext:
- **Transaction Management:** No MVCC or snapshot isolation
- **Catalog Access:** No pg_catalog queries
- **SPI (Server Programming Interface):** No SQL execution from C functions
- **Triggers:** No trigger mechanism
- **Full Memory Context Tree:** Simplified single-level context
- **Signal Handling:** No PostgreSQL signal handlers
- **Shared Memory:** No PostgreSQL shared memory segments
- **TOAST:** No out-of-line storage for large values
- **Vacuum:** No MVCC cleanup
- **Write-Ahead Log:** No WAL integration

Implications for extensions:
- Extension functions must not rely on PostgreSQL backend state
- SPI-using extensions will NOT work
- Trigger-based extensions will NOT work
- Extensions accessing pg_catalog may fail
- Extension functions assuming PostgreSQL memory semantics may leak memory

To add support for a new PostgreSQL feature:
- **Identify Required APIs:** Determine which PostgreSQL functions the extension uses
- **Create Stub Implementation:** Add stubs in the appropriate `src/pg_ext/*.cc` file
- **Add Type Definitions:** Add necessary structs/types in `include/pg_ext/*.hh`
- **Implement Core Logic:** Implement the minimal functionality needed by extensions
- **Test with Extension:** Verify that the extension functions work correctly

**Example: Adding Support for Range Types**

- Create header file `include/pg_ext/range.hh`:
```cpp
// include/pg_ext/range.hh
struct RangeType {
    Oid rangetypid;
    char flags;
    Datum lower;
    Datum upper;
};
Datum range_in(const char* str, Oid rangetypid);
char* range_out(RangeType* range);
bool range_contains(RangeType* range, Datum element);
```
- Create source file `src/pg_ext/range.cc`:
```cpp
// src/pg_ext/range.cc
#include <pg_ext/range.hh>
Datum range_in(const char* str, Oid rangetypid) {
    // Parse range string "[lower,upper)"
    // Create RangeType structure
    // Return as Datum
    return PointerGetDatum(nullptr);  // Placeholder until parsing is implemented
}
// ... implement other functions
```
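
To give a feel for the core logic such a stub might contain, here is a standalone sketch that parses an integer range written as `[lower,upper)` and tests containment. It is deliberately independent of `RangeType` and `Datum`: `IntRangeSketch` and both functions are hypothetical, handle only integer bounds, and ignore empty and unbounded ranges.

```cpp
#include <cstdio>
#include <optional>

// Hypothetical integer-only range; the flags byte is split into two
// booleans for clarity.
struct IntRangeSketch {
    long lower;
    long upper;
    bool lower_inclusive;
    bool upper_inclusive;
};

// Parse text such as "[1,10)"; returns nullopt on malformed input.
std::optional<IntRangeSketch> parse_int_range_sketch(const char* str) {
    char open = 0, close = 0;
    long lo = 0, hi = 0;
    if (std::sscanf(str, " %c%ld ,%ld %c", &open, &lo, &hi, &close) != 4) {
        return std::nullopt;
    }
    if ((open != '[' && open != '(') || (close != ']' && close != ')')) {
        return std::nullopt;
    }
    if (lo > hi) return std::nullopt;
    return IntRangeSketch{lo, hi, open == '[', close == ']'};
}

// Check each bound according to whether it is inclusive or exclusive.
bool range_contains_sketch(const IntRangeSketch& r, long v) {
    bool above = r.lower_inclusive ? v >= r.lower : v > r.lower;
    bool below = r.upper_inclusive ? v <= r.upper : v < r.upper;
    return above && below;
}
```

A real stub would additionally dispatch on `rangetypid` to parse bounds of the element type and would allocate the result with `palloc()` so that it can be returned as a `Datum`.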