Describe the bug, including details regarding any error messages, version, and platform.
I'm interacting with a Flight gRPC server written in Rust and attempting to call DoGet. The service is handing back a Ticket size of 32 bytes (ostensibly 2 UUIDs concatenated together - the server has good reason for that ticket size and it cannot be easily changed).
I can successfully read the Flight RecordBatch stream in Rust and in Golang, but cannot read back the RecordBatch stream in Python (and by extension probably C++).
Below is a working Golang example (presuming the ticket has been stored off in a ticket.bin file of 32 bytes).
(PLEASE NOTE - much of this code has been generated with LLM assistance, I have tried to check it for accuracy, but if there is a gross misuse of the Arrow Flight library somewhere in here that's an error-between-user-and-keyboard, my up-front apologies):
package main
import (
"context"
"fmt"
"os"
"github.com/apache/arrow/go/arrow/flight"
"google.golang.org/grpc"
"google.golang.org/grpc/credentials/insecure"
)
func main() {
// 1. Define the Flight server address
serverAddr := "localhost:9090" // Change if your Flight server is elsewhere
// 2. Read the ticket from the file created by the Python script
ticketBytes, err := os.ReadFile("ticket.bin")
if err != nil {
fmt.Printf("Failed to read ticket file 'ticket.bin': %v\n", err)
os.Exit(1)
}
fmt.Printf("Successfully read %d-byte ticket from file.\n", len(ticketBytes))
// 3. Connect to the Arrow Flight server
// We use insecure credentials for this example.
client, err := flight.NewClientWithMiddleware(serverAddr, nil, nil, grpc.WithTransportCredentials(insecure.NewCredentials()))
if err != nil {
fmt.Printf("Failed to create Flight client: %v\n", err)
os.Exit(1)
}
defer client.Close()
fmt.Printf("Connected to Flight server at %s\n", serverAddr)
// 4. Create the Flight Ticket object
ticket := &flight.Ticket{Ticket: ticketBytes}
// 5. Perform the DoGet call
ctx := context.Background()
stream, err := client.DoGet(ctx, ticket)
if err != nil {
fmt.Printf("DoGet call failed: %v\n", err)
os.Exit(1)
}
// 6. Read the entire RecordBatch stream
// The reader will handle consuming all messages and assembling the table.
reader, err := flight.NewRecordReader(stream)
if err != nil {
fmt.Printf("Failed to create record reader from stream: %v\n", err)
os.Exit(1)
}
defer reader.Release()
rowCount := int64(0)
// Iterate through the stream and count the rows
for reader.Next() {
rec := reader.Record()
rowCount += rec.NumRows()
}
if reader.Err() != nil {
fmt.Printf("Error reading stream: %v\n", reader.Err())
os.Exit(1)
}
fmt.Println("\n--- Go Client Success! ---")
fmt.Printf("Successfully read stream and found %d total rows.\n", rowCount)
// You could print the schema with `fmt.Println(reader.Schema())`
}
I can provide a Rust client sample too if helpful, but it's a little more work to cobble that together.
Here's the problematic Python code:
flight_service_stub = Flight_pb2_grpc.FlightServiceStub(channel)
# then later on after obtaining the ticket (or reading it from `ticket.bin`):
## 1. Make the raw gRPC call (same as before)
## This returns a raw, iterable gRPC stream object.
raw_stream = flight_service_stub.DoGet(flight_ticket_pb)
## 2. Define a simple generator function that yields each message from the raw stream.
## This acts as a bridge between the gRPC stream and the PyArrow reader.
def stream_generator(stream):
for item in stream:
yield item
# --- Explicitly define the schema ---
# NOTE: Field names have been omitted for data sensitivity reasons.
explicit_schema = pyarrow.schema([
pyarrow.field("<omitted>", pyarrow.uint64(), nullable=False),
pyarrow.field("<omitted>", pyarrow.uint64(), nullable=False),
pyarrow.field("<omitted>", pyarrow.uint64(), nullable=False),
pyarrow.field("<omitted>", pyarrow.uint64(), nullable=False),
pyarrow.field("<omitted>", pyarrow.float64(), nullable=False)
])
## 3. Wrap the generator in a GeneratorStream.
## This is a specific PyArrow class designed for this purpose.
generator_stream = pyarrow.flight.GeneratorStream(
# I've tried with and without defining a schema here, either way results in the same SEGFAULT
#schema=None,
schema=explicit_schema,
generator=stream_generator(raw_stream)
)
## 4. Use a FlightStreamReader on the *wrapped* stream.
reader = pyarrow.flight.FlightStreamReader(generator_stream)
## 5. Read the data
## ***************************************************************************************************
# SEGFAULT OCCURS HERE
## ***************************************************************************************************
data_table = reader.read_all()
If I attach gdb, here's the backtrace I get:
0x00007fffdd12ac87 in arrow::flight::FlightStreamReader::ToTable(arrow::StopToken const&) () from python-client/.venv/lib64/python3.12/site-packages/pyarrow/libarrow_flight.so.2300
Missing separate debuginfos, use: yum debuginfo-install glibc-2.28-251.el8_10.31.x86_64 libgcc-8.5.0-28.el8_10.x86_64 libstdc++-8.5.0-28.el8_10.x86_64 libuuid-2.32.1-48.el8_10.x86_64 mpdecimal-2.5.1-3.el8.x86_64 openssl-libs-1.1.1k-15.el8_6.x86_64 python3.12-libs-3.12.12-4.el8_10.x86_64 zlib-1.2.11-25.el8.x86_64
(gdb) bt
#0 0x00007fffdd12ac87 in arrow::flight::FlightStreamReader::ToTable(arrow::StopToken const&) () from python-client/.venv/lib64/python3.12/site-packages/pyarrow/libarrow_flight.so.2300
#1 0x00007fffde27b13f in __pyx_pw_7pyarrow_7_flight_18FlightStreamReader_3read_all(_object*, _object* const*, long, _object*) [clone .lto_priv.0] () from python-client/.venv/lib64/python3.12/site-packages/pyarrow/_flight.cpython-312-x86_64-linux-gnu.so
#2 0x00007ffff77669bf in PyObject_Vectorcall () from /lib64/libpython3.12.so.1.0
#3 0x00007ffff76fb0ba in _PyEval_EvalFrameDefault () from /lib64/libpython3.12.so.1.0
#4 0x00007ffff7771e7c in PyEval_EvalCode () from /lib64/libpython3.12.so.1.0
#5 0x00007ffff7845561 in run_eval_code_obj () from /lib64/libpython3.12.so.1.0
#6 0x00007ffff7883b5b in run_mod () from /lib64/libpython3.12.so.1.0
#7 0x00007ffff788f030 in pyrun_file () from /lib64/libpython3.12.so.1.0
#8 0x00007ffff788f212 in _PyRun_SimpleFileObject () from /lib64/libpython3.12.so.1.0
#9 0x00007ffff788f2e4 in _PyRun_AnyFileObject () from /lib64/libpython3.12.so.1.0
#10 0x00007ffff788f6b3 in Py_RunMain () from /lib64/libpython3.12.so.1.0
#11 0x00007ffff7890dc9 in Py_BytesMain () from /lib64/libpython3.12.so.1.0
#12 0x00007ffff679b865 in __libc_start_main () from /lib64/libc.so.6
(again, some sensitive file path prefixes removed from stack trace).
My best guess and working theory is that there's an issue with the 32 byte ticket size the Rust service is returning, but that's based on interactive LLM hypothesizing and not direct evidence in the C++ implementation.
Rust Flight version (server-side): 58.1.0
Golang flight version: github.com/apache/arrow/go/v16/arrow/flight
Python flight version: 23.0.1
Python version: 3.12
golang version: 1.22.3
Rust version: 1.93.1
Component(s)
C++, FlightRPC, Python
Describe the bug, including details regarding any error messages, version, and platform.
I'm interacting with a Flight gRPC server written in Rust and attempting to call
DoGet. The service is handing back a Ticket size of 32 bytes (ostensibly 2 UUIDs concatenated together - the server has good reason for that ticket size and it cannot be easily changed).I can successfully read the Flight RecordBatch stream in Rust and in Golang, but cannot read back the RecordBatch stream in Python (and by extension probably C++).
Below is a working Golang example (presuming the ticket has been stored off in a
ticket.binfile of 32 bytes).(PLEASE NOTE - much of this code has been generated with LLM assistance, I have tried to check it for accuracy, but if there is a gross misuse of the Arrow Flight library somewhere in here that's an error-between-user-and-keyboard, my up-front apologies):
I can provide a Rust client sample too if helpful, but it's a little more work to cobble that together.
Here's the problematic Python code:
If I attach gdb, here's the backtrace I get:
(again, some sensitive file path prefixes removed from stack trace).
My best guess and working theory is that there's an issue with the 32 byte ticket size the Rust service is returning, but that's based on interactive LLM hypothesizing and not direct evidence in the C++ implementation.
Rust Flight version (server-side): 58.1.0
Golang flight version: github.com/apache/arrow/go/v16/arrow/flight
Python flight version: 23.0.1
Python version: 3.12
golang version: 1.22.3
Rust version: 1.93.1
Component(s)
C++, FlightRPC, Python