WIP: Dummy PR to check maint-23.0.1 status #49130

Open
raulcd wants to merge 28 commits into main from maint-23.0.x

@raulcd raulcd commented Feb 3, 2026

Caution

Do not merge this PR.

This PR is being used to test the status of the 23.0.1 release branch on CI and should not be merged.

raulcd and others added 23 commits January 12, 2026 12:07
…rfile (#48828)

### Rationale for this change

The Emscripten job has been failing in the nightly jobs.

### What changes are included in this PR?

Install dependencies slightly earlier in the Dockerfile and add `xz`, which `install_emscripten.sh` now requires.

### Are these changes tested?

Yes via archery.

### Are there any user-facing changes?

No

* GitHub Issue: #48827

Authored-by: Raúl Cumplido <raulcumplido@gmail.com>
Signed-off-by: Raúl Cumplido <raulcumplido@gmail.com>
…-hosted runners (#48583)

### Rationale for this change

The CUDA jobs stopped working when the Voltron Data infrastructure went down. Together with ASF Infra, we have set up a [runs-on](https://runs-on.com/runners/gpu/) solution to provide CUDA runners.

### What changes are included in this PR?

Add the new workflow for `cuda_extra.yml` with CI jobs that use the runs-on CUDA runners.

Because the underlying instances have CUDA 12.9, the jobs to be run are:
- AMD64 Ubuntu 22 CUDA 11.7.1
- AMD64 Ubuntu 24 CUDA 12.9.0
- AMD64 Ubuntu 22 CUDA 11.7.1 Python
- AMD64 Ubuntu 24 CUDA 12.9.0 Python

A follow up issue has been created to add jobs for CUDA 13, see: #48783

A new label `CI: Extra: CUDA` has also been created.

### Are these changes tested?

Yes via CI

### Are there any user-facing changes?

No

* GitHub Issue: #48582

Authored-by: Raúl Cumplido <raulcumplido@gmail.com>
Signed-off-by: Raúl Cumplido <raulcumplido@gmail.com>
… and publish draft release before verification (#48839)

### Rationale for this change

The change we made for immutable releases requires draft releases so that we can keep uploading artifacts during the release process. As a result, the interim URL for downloading assets is not the one some of our scripts expect.

### What changes are included in this PR?

Update the `download_rc_archive` task to use the GitHub CLI tool instead of manually building the download URL for the source tar.gz from the release.
Reorder the release scripts to publish the release before running the verification tasks, so that the URL is the final one.
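
For illustration only, a minimal sketch of downloading a release asset through the GitHub CLI rather than a hand-built URL. It assumes `gh` is installed; the helper names, repository, tag and pattern are hypothetical and are not the actual release scripts.

```python
import os
import subprocess


def build_download_command(repo, tag, pattern, dest_dir):
    """Build a `gh release download` invocation for the assets matching `pattern`."""
    return [
        "gh", "release", "download", tag,
        "--repo", repo,
        "--pattern", pattern,
        "--dir", dest_dir,
    ]


def download_release_asset(repo, tag, pattern, dest_dir, token=None):
    """Run the download; `gh` reads its credentials from GH_TOKEN."""
    env = dict(os.environ)
    if token is not None:
        env["GH_TOKEN"] = token  # hypothetical: token passed in by the caller
    command = build_download_command(repo, tag, pattern, dest_dir)
    subprocess.run(command, check=True, env=env)
```

Letting `gh` resolve the asset avoids depending on the URL layout, which differs between a draft and a published release.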

### Are these changes tested?

I have manually tested both the `gh release download` script and that the final URL will be the expected one once we move from draft to published release. I've tested creating a new release on my own fork here:
https://github.com/raulcd/arrow/releases/tag/test-release-rc2

### Are there any user-facing changes?

No

* GitHub Issue: #48838

Authored-by: Raúl Cumplido <raulcumplido@gmail.com>
Signed-off-by: Raúl Cumplido <raulcumplido@gmail.com>
… Packaging jobs (#48842)

### Rationale for this change

With:
- #48839

We use `gh release download`. This requires `GH_TOKEN` to be available.

### What changes are included in this PR?

Add env with `GH_TOKEN`. I've validated that Rake's `sh` inherits the environment variables defined in your shell.

### Are these changes tested?

No

### Are there any user-facing changes?
No

* GitHub Issue: #48841

Authored-by: Raúl Cumplido <raulcumplido@gmail.com>
Signed-off-by: Raúl Cumplido <raulcumplido@gmail.com>
…#48845)

### Rationale for this change

The IPC file exposes [redundant information](https://github.com/apache/arrow/blob/d54a2051cf9020a0fdf50836420c38ad14787abb/format/File.fbs#L39-L50) about Message sizes so as to allow for random access from the file footer.

We tried adding [consistency checks](#19596) in the past but this hit a bug in the JavaScript IPC writer at the time, so the checks were left disabled.

The JavaScript implementation was fixed soon after (7 years ago), so this PR re-enables those checks so as to more easily detect potentially invalid IPC files.

### Are these changes tested?

By existing tests.

### Are there any user-facing changes?

No, unless they try reading invalid IPC files.

* GitHub Issue: #48844

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
### Rationale for this change

The copyright date is currently wrong.

### What changes are included in this PR?

Update Copyright Notice to cover year 2026

### Are these changes tested?

No

### Are there any user-facing changes?

No breaking change, but the copyright date is updated.

* GitHub Issue: #48856

Authored-by: Raúl Cumplido <raulcumplido@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
### Rationale for this change

Fixing: #48311

### What changes are included in this PR?

Applied fix from #48311 and added test

### Are these changes tested?

Yes, a test was added. Without my patch, the test fails with a debug check:
```cpp
Note: Google Test filter = TestBufferedInputStream.PeekAfterExhaustingBuffer
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from TestBufferedInputStream
[ RUN      ] TestBufferedInputStream.PeekAfterExhaustingBuffer
/Users/chegoryu/Junk/git/arrow/cpp/src/arrow/io/buffered.cc:337:  Check failed: buffer_->size() - buffer_pos_ >= nbytes
```

### Are there any user-facing changes?

No, this PR fixes a bug

* GitHub Issue: #48311

Lead-authored-by: Egor Chunaev <ch.egor.yu@gmail.com>
Co-authored-by: mwish <maplewish117@gmail.com>
Co-authored-by: chegoryu <ch.egor.yu@gmail.com>
Signed-off-by: mwish <maplewish117@gmail.com>
…8624)

### Rationale for this change

Our email reports miss the following headers:

* `MIME-Version: 1.0`
* `Content-Type: text/plain; charset="utf-8"`
* `Message-Id: ${AUTO_GENERATED_MESSAGE_ID}`
* `Date: ${DATE_IN_RFC_2822}`

### What changes are included in this PR?

Add these headers.
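
As a rough sketch (not the actual report code), the standard library can produce all four headers. The function name, addresses and the `example.org` domain below are placeholders.

```python
from email.message import EmailMessage
from email.utils import formatdate, make_msgid, parsedate_to_datetime


def build_report(sender, recipient, subject, body):
    """Build a plain-text report that carries the headers listed above."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    # Auto-generated unique ID; the domain is a placeholder.
    msg["Message-Id"] = make_msgid(domain="example.org")
    # RFC 2822 formatted date.
    msg["Date"] = formatdate(localtime=True)
    # set_content() adds MIME-Version and Content-Type: text/plain; charset="utf-8".
    msg.set_content(body)
    return msg


report = build_report("ci@example.org", "dev@example.org", "Nightly report", "All green")
```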

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* GitHub Issue: #48623

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
### Rationale for this change

https://docs.python.org/3/library/smtplib.html#smtplib.SMTP.sendmail uses the third argument as e-mail content but https://docs.python.org/3/library/smtplib.html#smtplib.SMTP.send_message uses the first argument as e-mail.

### What changes are included in this PR?

* Pass e-mail as the first argument
* Remove redundant from and to addresses
  * They are extracted from the given e-mail automatically
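
A small sketch of the difference, using a recording subclass so that no SMTP server is needed. `RecordingSMTP` and the addresses are hypothetical test doubles, not part of the report code.

```python
import smtplib
from email.message import EmailMessage


class RecordingSMTP(smtplib.SMTP):
    """SMTP client that records the envelope instead of talking to a server."""

    def __init__(self):
        # No host is given, so no connection is attempted.
        super().__init__(local_hostname="localhost")
        self.sent = []

    def ehlo_or_helo_if_needed(self):
        # Skip the handshake because there is no real connection.
        pass

    def sendmail(self, from_addr, to_addrs, msg, mail_options=(), rcpt_options=()):
        self.sent.append((from_addr, list(to_addrs), msg))
        return {}


msg = EmailMessage()
msg["From"] = "ci@example.org"
msg["To"] = "dev@example.org"
msg["Subject"] = "Nightly report"
msg.set_content("All green")

client = RecordingSMTP()
# The e-mail is the first argument; sender and recipients come from its headers.
client.send_message(msg)
```

`send_message()` ends up calling `sendmail(from_addr, to_addrs, flattened_message)`, which is why passing the addresses again is redundant.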
  
### Are these changes tested?

Yes. I sent a test e-mail manually.

### Are there any user-facing changes?

No.
* GitHub Issue: #48861

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Raúl Cumplido <raulcumplido@gmail.com>
…instead of final Azure::Storage::StorageException and set minimum nodejs on conda env to 16 for Azurite to work (#48895)

### Rationale for this change

nodejs 12 is currently being installed on conda. CI jobs are failing and/or segfaulting because Azurite fails on old Node.js versions.

```
2026-01-13T18:32:39.6961900Z #15 [ 9/11] RUN /arrow/ci/scripts/install_azurite.sh
2026-01-13T18:32:39.9624124Z #15 0.417 Node.js version = v12.4.0
2026-01-13T18:32:42.2087322Z #15 2.663 npm WARN deprecated rimraf@ 3.0.2: Rimraf versions prior to v4 are no longer supported
2026-01-13T18:32:42.3917601Z #15 2.846 npm WARN deprecated uuid@ 3.4.0: Please upgrade  to version 7 or higher.  Older versions may use Math.random() in certain circumstances, which is known to be problematic.  See https://v8.dev/blog/math-random for details.
2026-01-13T18:32:51.4870197Z #15 11.94 npm WARN deprecated glob@ 7.2.3: Glob versions prior to v9 are no longer supported
2026-01-13T18:32:51.7035681Z #15 12.01 npm WARN deprecated inflight@ 1.0.6: This module is not supported, and leaks memory. Do not use it. Check out lru-cache if you want a good and tested way to coalesce async requests by a key value, which is much more comprehensive and powerful.
2026-01-13T18:33:02.1406491Z #15 22.59 /opt/conda/envs/arrow/bin/azurite -> /opt/conda/envs/arrow/lib/node_modules/azurite/dist/src/azurite.js
2026-01-13T18:33:02.3841290Z #15 22.60 /opt/conda/envs/arrow/bin/azurite-queue -> /opt/conda/envs/arrow/lib/node_modules/azurite/dist/src/queue/main.js
2026-01-13T18:33:02.3842792Z #15 22.60 /opt/conda/envs/arrow/bin/azurite-blob -> /opt/conda/envs/arrow/lib/node_modules/azurite/dist/src/blob/main.js
2026-01-13T18:33:02.3844216Z #15 22.60 /opt/conda/envs/arrow/bin/azurite-table -> /opt/conda/envs/arrow/lib/node_modules/azurite/dist/src/table/main.js
2026-01-13T18:33:02.3846002Z #15 22.66 npm WARN applicationinsights@ 2.9.8 requires a peer of applicationinsights-native-metrics@* but none is installed. You must install peer dependencies yourself.
2026-01-13T18:33:02.3847278Z #15 22.66 
2026-01-13T18:33:02.3847564Z #15 22.66 + azurite@ 3.35.0
2026-01-13T18:33:02.3848038Z #15 22.66 added 376 packages from 296 contributors in 20.644s
2026-01-13T18:33:02.3848830Z #15 22.69 /opt/conda/envs/arrow/bin/azurite
2026-01-13T18:33:02.8929329Z #15 23.35 /opt/conda/envs/arrow/lib/node_modules/azurite/node_modules/fs-extra/lib/util/async.js:14
2026-01-13T18:33:02.8930231Z #15 23.35         (err) => err ?? new Error('unknown error')
2026-01-13T18:33:02.8930740Z #15 23.35                       ^
```

The job on PyArrow was segfaulting because an exception was thrown but not caught. In general we were catching `Azure::Storage::StorageException`, but `Azure::Core::Http::TransportException` can also be thrown in some cases.
Both are final but inherit from `Azure::Core::RequestFailedException`.

### What changes are included in this PR?

- Pin minimum nodejs version to 16 so the failure doesn't happen again.
- Catch `Azure::Core::RequestFailedException` instead of `Azure::Storage::StorageException` so that `Azure::Core::Http::TransportException` is also caught.

### Are these changes tested?

Yes on CI.

### Are there any user-facing changes?

No

* GitHub Issue: #48894

Authored-by: Raúl Cumplido <raulcumplido@gmail.com>
Signed-off-by: Raúl Cumplido <raulcumplido@gmail.com>
…rification (#48859)

### Rationale for this change

When reading an encrypted Parquet file with a plaintext footer, the Parquet reader is able to verify footer integrity by comparing the signature in the file with the one computed by encrypting the footer.

However, the way it does this is to first re-serialize the deserialized footer using Thrift. This has several issues:

1. it's inefficient
2. it's not obvious that it will always produce the same Thrift encoding as the original, leading to spurious signature verification failures
3. if the original footer deserializes to invalid enum values, attempting to serialize it again will lead to undefined behavior

Reason 3 is what allowed this to be uncovered by OSS-Fuzz (see https://oss-fuzz.com/testcase-detail/4740205688193024).

This PR switches to reusing the original serialized metadata.
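
To illustrate issue 2 with a toy model: the sketch below uses JSON in place of Thrift and HMAC-SHA256 in place of the Parquet footer signature. Both are stand-ins chosen for brevity, and the key and footer contents are made up.

```python
import hashlib
import hmac
import json

KEY = b"hypothetical-footer-key"


def sign(data: bytes) -> bytes:
    """Stand-in for the footer signature computation."""
    return hmac.new(KEY, data, hashlib.sha256).digest()


def verify_reserialized(original: bytes, signature: bytes) -> bool:
    # Old approach: decode, re-encode, then sign the re-encoded bytes.
    reencoded = json.dumps(json.loads(original)).encode()
    return hmac.compare_digest(sign(reencoded), signature)


def verify_original(original: bytes, signature: bytes) -> bool:
    # New approach: sign exactly the bytes that were read from the file.
    return hmac.compare_digest(sign(original), signature)


# A valid footer written by a producer that uses a more compact encoding.
footer = b'{"version":2,"num_rows":10}'
signature = sign(footer)

spurious_failure = not verify_reserialized(footer, signature)
reuse_succeeds = verify_original(footer, signature)
```

The re-encoded bytes differ from the original (the default `json.dumps` separators add spaces), so the first check rejects a valid footer while the second accepts it.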

### Are these changes tested?

Yes, by existing tests and new fuzz regression file.

### Are there any user-facing changes?

No.

* GitHub Issue: #48858

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
…unt in IPC (#48901)

### Rationale for this change

An incorrect variadic buffer count could easily blow up memory when reserving a vector of Buffers, even though the RecordBatch has far fewer buffers available.

Reported by OSS-Fuzz at https://issues.oss-fuzz.com/issues/476180608, and separately by Silas Boch.

### What changes are included in this PR?

Pre-validate the variadic buffer count read from the IPC RecordBatch table. Initial patch by Silas Boch.
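
The idea can be sketched as follows. This is a simplified Python model, not the C++ reader; the function and parameter names are hypothetical.

```python
def checked_variadic_count(declared_count: int, buffers_available: int) -> int:
    """Validate a count read from untrusted metadata before allocating for it."""
    if declared_count < 0:
        raise ValueError(f"negative variadic buffer count: {declared_count}")
    if declared_count > buffers_available:
        raise ValueError(
            f"variadic buffer count {declared_count} exceeds "
            f"the {buffers_available} buffers available in the record batch"
        )
    return declared_count


def reserve_buffers(declared_count: int, buffers_available: int) -> list:
    # Only allocate once the count is known to be plausible.
    count = checked_variadic_count(declared_count, buffers_available)
    return [None] * count
```

Checking against the number of buffers actually present bounds the allocation by the size of the input instead of by an attacker-controlled integer.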

### Are these changes tested?

Yes, by additional fuzz regression file.

### Are there any user-facing changes?

No.

**This PR contains a "Critical Fix".** (If the changes fix either (a) a security vulnerability, (b) a bug that caused incorrect or invalid data to be produced, or (c) a bug that causes a crash (even when the API contract is upheld), please provide explanation. If not, you can remove this.)

* GitHub Issue: #48900

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
…CMake target (#48891)

### Rationale for this change

We changed `macro(build_google_cloud_cpp_storage)` to `function(...)` in GH-48333, so `find_curl()` no longer changes `ARROW_SYSTEM_DEPENDENCIES` in the parent scope. (`function()` creates a new scope.)

### What changes are included in this PR?

Move `find_curl()` out of `function(build_google_cloud_cpp_storage)` to the top level.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.
* GitHub Issue: #48885

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
### Rationale for this change

The CPU attributes are not passed to the LLVM layer, which means potential optimizations could be missed, leading to inefficient code. This feature was lost as part of the refactoring in 83cba25.

I also discovered a bug with decimal alignments that was exposed by this change and was only reproducible in our test environment.

### What changes are included in this PR?

Pass the CPU attributes to the LLVM code generation, and a unit test.
Fix the 16 bit vs 8 bit decimal alignment problem. This was causing a crash sometimes on certain architectures with certain queries. Added a unit test.

### Are these changes tested?
Yes.

### Are there any user-facing changes?
No.

* GitHub Issue: #48160

Lead-authored-by: Logan Riggs <logan.riggs@dremio.com>
Co-authored-by: lriggs <logan.riggs@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
### Rationale for this change
Fix building RE2 with C++20.

### What changes are included in this PR?
The fix and a test.

### Are these changes tested?
Yes

### Are there any user-facing changes?
No

* GitHub Issue: #48973

Authored-by: Jonathan Keane <jkeane@gmail.com>
Signed-off-by: Jonathan Keane <jkeane@gmail.com>
…#48919)

### Rationale for this change

We must mark all nodes in `Arrow::ExecutePlan` but only the first node is marked. 

### What changes are included in this PR?

Fix typos in variable name.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.
* GitHub Issue: #48880

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…48747)

### Rationale for this change
#48637

Arrow Flight SQL ODBC gets a `potential deadlock` error during tests; the error originates in Arrow Flight SQL (see #48714). Arrow doesn't use `absl::Mutex` directly. It is used by the upstream projects gRPC and Protobuf, so Arrow itself likely did not cause the potential deadlock. We can disable the deadlock detection for now.

### What changes are included in this PR?
- Disable `absl` deadlock detection inside ODBC, so potential deadlocks reported from upstream projects are not picked up in the tests.

### Are these changes tested?
- Tested locally on MSVC Windows
### Are there any user-facing changes?
N/A
* GitHub Issue: #48637

Authored-by: Alina (Xi) Li <alina.li@improving.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
…htInfo to nullptr instead of NULL (#48968)

### Rationale for this change

Cython-built code is currently failing to compile on free-threaded wheels due to:
```
/arrow/python/build/temp.linux-x86_64-cpython-313t/_flight.cpp: In function ‘PyObject* __pyx_gb_7pyarrow_7_flight_12FlightClient_9do_action_2generator2(__pyx_CoroutineObject*, PyThreadState*, PyObject*)’:
/arrow/python/build/temp.linux-x86_64-cpython-313t/_flight.cpp:43068:110: error: call of overloaded ‘unique_ptr(NULL)’ is ambiguous
43068 |           __pyx_t_3 = (__pyx_cur_scope->__pyx_v_result->result == ((std::unique_ptr< arrow::flight::Result> )NULL));
      |                            
```

### What changes are included in this PR?

Compare `unique_ptr[CFlightResult]` and `unique_ptr[CFlightInfo]` against `nullptr` instead of `NULL`.

### Are these changes tested?

Yes via archery.

### Are there any user-facing changes?

No

* GitHub Issue: #48965

Authored-by: Raúl Cumplido <raulcumplido@gmail.com>
Signed-off-by: Raúl Cumplido <raulcumplido@gmail.com>
### What changes are included in this PR?

Bug fixes and robustness improvements in the IPC file reader:
* Fix bug reading variadic buffers with pre-buffering enabled
* Fix bug reading dictionaries with pre-buffering enabled
* Validate IPC buffer offsets and lengths

Testing improvements:
* Exercise pre-buffering in IPC tests
* Actually exercise variadic buffers in IPC tests, by ensuring non-inline binary views are generated
* Run fuzz targets on golden IPC integration files in ASAN/UBSAN CI job
* Exercise pre-buffering in the IPC file fuzz target

Miscellaneous:
* Add convenience functions for integer overflow checking
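
A rough Python model of what validating buffer offsets and lengths with an overflow-checked addition can look like. Python integers do not overflow, so the 64-bit limit is emulated explicitly; the function names are hypothetical and not the C++ helpers added by this PR.

```python
INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)


def add_with_overflow_check(a: int, b: int) -> int:
    """Add two values whose sum must fit in a signed 64-bit integer."""
    result = a + b
    if result > INT64_MAX or result < INT64_MIN:
        raise OverflowError(f"{a} + {b} overflows int64")
    return result


def validate_buffer_range(offset: int, length: int, file_size: int) -> None:
    """Reject buffers that are negative-sized or extend past the file."""
    if offset < 0 or length < 0:
        raise ValueError("negative buffer offset or length")
    if add_with_overflow_check(offset, length) > file_size:
        raise ValueError("buffer extends past the end of the file")
```

In C++ a plain `offset + length` on signed 64-bit values could wrap around and pass the bounds check, which is why the addition itself has to be checked.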

### Are these changes tested?

Yes, by existing and improved tests.

### Are there any user-facing changes?

Bug fixes.

**This PR contains a "Critical Fix".** Fixes a potential crash reading variadic buffers with pre-buffering enabled.

* GitHub Issue: #48924

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
…and the Flight Client (#48967)

### Rationale for this change

The bug breaks a Flight SQL server that refreshes the auth token when cookie authentication is enabled.

### What changes are included in this PR?

1. In the ODBC layer, removed the code that adds a second `ClientCookieMiddlewareFactory` to the client options (the first one is registered in `BuildFlightClientOptions`). This fixes the issue of duplicate cookie header fields.
2. In the Flight client layer, use the case-insensitive equality comparator instead of the case-insensitive less-than comparator for the cookie cache, which is an unordered map. This fixes the issue of duplicate cookie keys.
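
The intended behavior of the cookie cache can be modeled in a few lines of Python. This is only a sketch of the semantics (one entry per cookie name, regardless of case); `CookieCache` is a hypothetical class, not the C++ implementation.

```python
class CookieCache:
    """Cookie store whose keys compare equal regardless of letter case."""

    def __init__(self):
        self._cookies = {}

    def set(self, name: str, value: str) -> None:
        # Normalizing the key makes "Token" and "token" the same entry,
        # so a refreshed cookie replaces the stale one.
        self._cookies[name.lower()] = (name, value)

    def header_value(self) -> str:
        return "; ".join(f"{name}={value}" for name, value in self._cookies.values())

    def __len__(self) -> int:
        return len(self._cookies)


cache = CookieCache()
cache.set("Token", "old")
cache.set("token", "refreshed")
```

With a comparator that does not implement equality, both entries would survive and the stale token would still be sent alongside the refreshed one.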

### Are these changes tested?
Manually on Windows, and CI

### Are there any user-facing changes?

No
* GitHub Issue: #48966

Authored-by: jianfengmao <jianfengmao@deephaven.io>
Signed-off-by: David Li <li.davidm96@gmail.com>

@raulcd raulcd added the CI: Extra: Package: Linux Run extra Linux Packages CI label Feb 3, 2026

raulcd commented Feb 3, 2026

@github-actions crossbow submit -g verify-rc-source

github-actions bot commented Feb 3, 2026

Revision: 6b12dac

Submitted crossbow builds: ursacomputing/crossbow @ actions-ce3dc66a0a

Task Status
verify-rc-source-cpp-linux-almalinux-10-amd64 GitHub Actions
verify-rc-source-cpp-linux-conda-latest-amd64 GitHub Actions
verify-rc-source-cpp-linux-ubuntu-22.04-amd64 GitHub Actions
verify-rc-source-cpp-linux-ubuntu-24.04-amd64 GitHub Actions
verify-rc-source-cpp-macos-amd64 GitHub Actions
verify-rc-source-cpp-macos-arm64 GitHub Actions
verify-rc-source-cpp-macos-conda-amd64 GitHub Actions
verify-rc-source-integration-linux-almalinux-10-amd64 GitHub Actions
verify-rc-source-integration-linux-conda-latest-amd64 GitHub Actions
verify-rc-source-integration-linux-ubuntu-22.04-amd64 GitHub Actions
verify-rc-source-integration-linux-ubuntu-24.04-amd64 GitHub Actions
verify-rc-source-integration-macos-amd64 GitHub Actions
verify-rc-source-integration-macos-arm64 GitHub Actions
verify-rc-source-integration-macos-conda-amd64 GitHub Actions
verify-rc-source-python-linux-almalinux-10-amd64 GitHub Actions
verify-rc-source-python-linux-conda-latest-amd64 GitHub Actions
verify-rc-source-python-linux-ubuntu-22.04-amd64 GitHub Actions
verify-rc-source-python-linux-ubuntu-24.04-amd64 GitHub Actions
verify-rc-source-python-macos-amd64 GitHub Actions
verify-rc-source-python-macos-arm64 GitHub Actions
verify-rc-source-python-macos-conda-amd64 GitHub Actions
verify-rc-source-ruby-linux-almalinux-10-amd64 GitHub Actions
verify-rc-source-ruby-linux-conda-latest-amd64 GitHub Actions
verify-rc-source-ruby-linux-ubuntu-22.04-amd64 GitHub Actions
verify-rc-source-ruby-linux-ubuntu-24.04-amd64 GitHub Actions
verify-rc-source-ruby-macos-amd64 GitHub Actions
verify-rc-source-ruby-macos-arm64 GitHub Actions
verify-rc-source-windows GitHub Actions

raulcd commented Feb 3, 2026

Discussing Ruby failures here:
#48985 (comment)

raulcd commented Feb 3, 2026

@github-actions crossbow submit verify-rc-source-windows

github-actions bot commented Feb 3, 2026

Revision: f7d709e

Submitted crossbow builds: ursacomputing/crossbow @ actions-75f4b1b443

Task Status
verify-rc-source-windows GitHub Actions

raulcd and others added 5 commits February 4, 2026 09:44
…add check to validate LICENSE.txt and NOTICE.txt are part of the wheel contents (#48988)

### Rationale for this change

Currently the `LICENSE.txt` and `NOTICE.txt` files are missing from the published wheels.

### What changes are included in this PR?

- Ensure the license and notice files are part of the wheels
- Use build frontend to build wheels
- Build wheel from sdist

### Are these changes tested?

Yes, via archery.
I've validated that all wheels fail the new check if LICENSE.txt or NOTICE.txt is missing:
```
 AssertionError: LICENSE.txt is missing from the wheel.
```
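
A check of this kind could look roughly like the sketch below. It assumes the files may live anywhere inside the archive (for example under a `.dist-info` directory) and matches on the base name only; the function and the in-memory demo wheel are hypothetical, not the check added by this PR.

```python
import io
import posixpath
import zipfile

REQUIRED_FILES = ("LICENSE.txt", "NOTICE.txt")


def check_wheel_licenses(wheel) -> None:
    """Fail if a required license file is absent from the wheel archive."""
    with zipfile.ZipFile(wheel) as archive:
        basenames = {posixpath.basename(name) for name in archive.namelist()}
    for required in REQUIRED_FILES:
        assert required in basenames, f"{required} is missing from the wheel."


def make_demo_wheel(names) -> io.BytesIO:
    """Build an in-memory zip that stands in for a wheel (demo only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in names:
            archive.writestr(name, "placeholder")
    buffer.seek(0)
    return buffer
```

A wheel is a zip archive, so `zipfile` is enough to inspect its contents without installing it.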

### Are there any user-facing changes?

No

* GitHub Issue: #48983

Lead-authored-by: Raúl Cumplido <raulcumplido@gmail.com>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Co-authored-by: Rok Mihevc <rok@mihevc.org>
Signed-off-by: Raúl Cumplido <raulcumplido@gmail.com>
### Rationale for this change

Fix two issues found by OSS-Fuzz in the IPC reader:

* a controlled abort on invalid IPC metadata: https://oss-fuzz.com/testcase-detail/5301064831401984
* a nullptr dereference on invalid IPC metadata: https://oss-fuzz.com/testcase-detail/5091511766417408

Neither of these two issues is a security issue.

### Are these changes tested?

Yes, by new unit tests and new fuzz regression files.

### Are there any user-facing changes?

No.

**This PR contains a "Critical Fix".** (If the changes fix either (a) a security vulnerability, (b) a bug that caused incorrect or invalid data to be produced, or (c) a bug that causes a crash (even when the API contract is upheld), please provide explanation. If not, you can remove this.)

* GitHub Issue: #49059

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
)

### Rationale for this change

I noticed a reference to a `release_candidate.sh` in the `paths` field in `release_candidate.yml` which is a file that doesn't exist. I think this was just a typo made during refactoring.

### What changes are included in this PR?

Corrected `paths` list entry.

### Are these changes tested?

No.

### Are there any user-facing changes?

No.

Authored-by: Bryce Mecum <petridish@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…required user-agent on urllib request (#49052)

### Rationale for this change

See: #49044

### What changes are included in this PR?

`urllib` now sends requests with `"user-agent": "pyarrow"`.
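
For reference, a minimal sketch of setting that header with `urllib.request`. The URL and helper name are placeholders; the actual test code may build the request differently.

```python
import urllib.request


def build_request(url: str) -> urllib.request.Request:
    """Create a request that identifies itself as pyarrow."""
    return urllib.request.Request(url, headers={"user-agent": "pyarrow"})


request = build_request("https://example.org/data.csv")
# The request would be sent with: urllib.request.urlopen(request)
```

Without an explicit header, `urllib` identifies itself as `Python-urllib/<version>`, which some servers reject.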

### Are these changes tested?

It's a CI fix.

### Are there any user-facing changes?

No, just a CI test fix.
* GitHub Issue: #49044

Authored-by: Rok Mihevc <rok@mihevc.org>
Signed-off-by: Raúl Cumplido <raulcumplido@gmail.com>
…ng (#49095)

### Rationale for this change
This PR restores the behavior previous to version 23 for floating-point parsing on overflow and subnormal.

`fast_float` didn't assign an error code on overflow in version `3.10.1` and assigned `±Inf` on overflow and `0.0` on subnormal. With the update to version `8.1`, it started to assign `std::errc::result_out_of_range` in such cases. 

### What changes are included in this PR?
Ignore `std::errc::result_out_of_range` and produce `±Inf` / `0.0` as appropriate instead of failing the conversion.

### Are these changes tested?
Yes. Created tests for overflow with positive and negative signed mantissa, and also created tests for subnormal, all of them for binary{16,32,64}.

### Are there any user-facing changes?
Yes. With `libarrow==23`, the CSV reader inferred such values as strings, whereas earlier versions parsed them as `0` or `±inf`.

With this patch, the CSV reader in PyArrow outputs:

```python
>>> import pyarrow
>>> import pyarrow.csv
>>> import io
>>> table = pyarrow.csv.read_csv(io.BytesIO(f"data\n10E-617\n10E617\n-10E617".encode()))
>>> print(table)
pyarrow.Table
data: double
----
data: [[0,inf,-inf]]
```

Closes #49003 

* GitHub Issue: #49003

Authored-by: Alvaro-Kothe <kothe65@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>

raulcd commented Feb 4, 2026

@github-actions crossbow submit verify-rc-source-ruby-*

github-actions bot commented Feb 4, 2026

Revision: f9376e4

Submitted crossbow builds: ursacomputing/crossbow @ actions-f487752277

Task Status
verify-rc-source-ruby-linux-almalinux-10-amd64 GitHub Actions
verify-rc-source-ruby-linux-conda-latest-amd64 GitHub Actions
verify-rc-source-ruby-linux-ubuntu-22.04-amd64 GitHub Actions
verify-rc-source-ruby-linux-ubuntu-24.04-amd64 GitHub Actions
verify-rc-source-ruby-macos-amd64 GitHub Actions
verify-rc-source-ruby-macos-arm64 GitHub Actions

raulcd commented Feb 4, 2026

@github-actions crossbow submit --group packaging

github-actions bot commented Feb 4, 2026

Revision: f9376e4

Submitted crossbow builds: ursacomputing/crossbow @ actions-aaeef843f2

Task Status
conan-maximum GitHub Actions
conan-minimum GitHub Actions
matlab GitHub Actions
python-sdist GitHub Actions
r-binary-packages GitHub Actions
test-debian-12-docs GitHub Actions
wheel-macos-monterey-cp310-cp310-amd64 GitHub Actions
wheel-macos-monterey-cp310-cp310-arm64 GitHub Actions
wheel-macos-monterey-cp311-cp311-amd64 GitHub Actions
wheel-macos-monterey-cp311-cp311-arm64 GitHub Actions
wheel-macos-monterey-cp312-cp312-amd64 GitHub Actions
wheel-macos-monterey-cp312-cp312-arm64 GitHub Actions
wheel-macos-monterey-cp313-cp313-amd64 GitHub Actions
wheel-macos-monterey-cp313-cp313-arm64 GitHub Actions
wheel-macos-monterey-cp313-cp313t-amd64 GitHub Actions
wheel-macos-monterey-cp313-cp313t-arm64 GitHub Actions
wheel-macos-monterey-cp314-cp314-amd64 GitHub Actions
wheel-macos-monterey-cp314-cp314-arm64 GitHub Actions
wheel-macos-monterey-cp314-cp314t-amd64 GitHub Actions
wheel-macos-monterey-cp314-cp314t-arm64 GitHub Actions
wheel-manylinux-2-28-cp310-cp310-amd64 GitHub Actions
wheel-manylinux-2-28-cp310-cp310-arm64 GitHub Actions
wheel-manylinux-2-28-cp311-cp311-amd64 GitHub Actions
wheel-manylinux-2-28-cp311-cp311-arm64 GitHub Actions
wheel-manylinux-2-28-cp312-cp312-amd64 GitHub Actions
wheel-manylinux-2-28-cp312-cp312-arm64 GitHub Actions
wheel-manylinux-2-28-cp313-cp313-amd64 GitHub Actions
wheel-manylinux-2-28-cp313-cp313-arm64 GitHub Actions
wheel-manylinux-2-28-cp313-cp313t-amd64 GitHub Actions
wheel-manylinux-2-28-cp313-cp313t-arm64 GitHub Actions
wheel-manylinux-2-28-cp314-cp314-amd64 GitHub Actions
wheel-manylinux-2-28-cp314-cp314-arm64 GitHub Actions
wheel-manylinux-2-28-cp314-cp314t-amd64 GitHub Actions
wheel-manylinux-2-28-cp314-cp314t-arm64 GitHub Actions
wheel-musllinux-1-2-cp310-cp310-amd64 GitHub Actions
wheel-musllinux-1-2-cp310-cp310-arm64 GitHub Actions
wheel-musllinux-1-2-cp311-cp311-amd64 GitHub Actions
wheel-musllinux-1-2-cp311-cp311-arm64 GitHub Actions
wheel-musllinux-1-2-cp312-cp312-amd64 GitHub Actions
wheel-musllinux-1-2-cp312-cp312-arm64 GitHub Actions
wheel-musllinux-1-2-cp313-cp313-amd64 GitHub Actions
wheel-musllinux-1-2-cp313-cp313-arm64 GitHub Actions
wheel-musllinux-1-2-cp313-cp313t-amd64 GitHub Actions
wheel-musllinux-1-2-cp313-cp313t-arm64 GitHub Actions
wheel-musllinux-1-2-cp314-cp314-amd64 GitHub Actions
wheel-musllinux-1-2-cp314-cp314-arm64 GitHub Actions
wheel-musllinux-1-2-cp314-cp314t-amd64 GitHub Actions
wheel-musllinux-1-2-cp314-cp314t-arm64 GitHub Actions
wheel-windows-cp310-cp310-amd64 GitHub Actions
wheel-windows-cp311-cp311-amd64 GitHub Actions
wheel-windows-cp312-cp312-amd64 GitHub Actions
wheel-windows-cp313-cp313-amd64 GitHub Actions
wheel-windows-cp313-cp313t-amd64 GitHub Actions
wheel-windows-cp314-cp314-amd64 GitHub Actions
wheel-windows-cp314-cp314t-amd64 GitHub Actions
