31 changes: 10 additions & 21 deletions AGENTS.md
@@ -2,33 +2,22 @@

## Developer Documentation

- [Quick Start Setup](docs/source/contributor-guide/development_environment.md#quick-start)
- [Testing Quick Start](docs/source/contributor-guide/testing.md#testing-quick-start)
- [Before Submitting a PR](docs/source/contributor-guide/index.md#before-submitting-a-pr)
- [Contributor Guide](docs/source/contributor-guide/index.md)
- [Architecture Guide](docs/source/contributor-guide/architecture.md)

## Before Committing

Before committing any changes, you **must** run the following checks and fix any issues:
Before committing any changes, you MUST follow the instructions in
[Before Submitting a PR](docs/source/contributor-guide/index.md#before-submitting-a-pr)
and ensure the required checks listed there pass. Do not commit code that
fails any of those checks.

```bash
cargo fmt --all
cargo clippy --all-targets --all-features -- -D warnings
```

- `cargo fmt` ensures consistent code formatting across the project.
- `cargo clippy` catches common mistakes and enforces idiomatic Rust patterns. All warnings must be resolved (treated as errors via `-D warnings`).

Do not commit code that fails either of these checks.
When creating a PR, you MUST follow the [PR template](.github/pull_request_template.md).

## Testing

Run relevant tests before submitting changes:

```bash
cargo test --all-features
```

For SQL logic tests:

```bash
cargo test -p datafusion-sqllogictest
```
See the [Testing Quick Start](docs/source/contributor-guide/testing.md#testing-quick-start)
for the recommended pre-PR test commands.
41 changes: 36 additions & 5 deletions docs/source/contributor-guide/development_environment.md
@@ -21,7 +21,38 @@

This section describes how you can get started at developing DataFusion.

## Windows setup
## Quick Start

Review comment (Contributor Author): I pulled most of this out of agents.md and left a link there instead

For the fastest path to a working local environment, follow these steps
from the repository root:

```shell
# 1. Install Rust (https://rust-lang.org/tools/install/) and verify the active toolchain with
rustup show

# 2. Install protoc 3.15+ (see details below)
protoc --version

# 3. Download test data used by examples and many tests
git submodule update --init --recursive

# 4. Build the workspace
cargo build

# 5. Verify that Rust integration tests can be run
cargo test -p datafusion --test parquet_integration

# 6. Verify that sqllogictests can run
cargo test --profile=ci --test sqllogictests
```

Notes:

- The pinned Rust version is defined in `rust-toolchain.toml`.
- `protoc` is required to compile DataFusion from source.
- Some tests and examples rely on git submodule data being present locally.
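
The `protoc 3.15+` requirement in step 2 can also be checked mechanically. The following is a minimal sketch, not part of the repository: the function names `version_at_least` and `protoc_version_from` are hypothetical, only the major and minor components are compared, and it assumes `protoc --version` prints a line of the form `libprotoc 3.15.0`.

```shell
# Hypothetical helper (assumption, not a repository script):
# succeeds when version "$1" is at least version "$2".
# Only the major and minor components are compared.
# The body runs in a subshell so the working variables stay local.
version_at_least() (
  have_major=${1%%.*}
  want_major=${2%%.*}
  # Everything after the first dot, or 0 when there is no minor component
  case "$1" in *.*) have_rest=${1#*.} ;; *) have_rest=0 ;; esac
  case "$2" in *.*) want_rest=${2#*.} ;; *) want_rest=0 ;; esac
  have_minor=${have_rest%%.*}
  want_minor=${want_rest%%.*}
  if [ "$have_major" -ne "$want_major" ]; then
    [ "$have_major" -gt "$want_major" ]
  else
    [ "$have_minor" -ge "$want_minor" ]
  fi
)

# Hypothetical helper: keep only the last space-separated word,
# e.g. "libprotoc 3.15.0" becomes "3.15.0".
protoc_version_from() {
  printf '%s\n' "${1##* }"
}
```

One possible use is `version_at_least "$(protoc_version_from "$(protoc --version)")" 3.15 || echo "protoc is older than 3.15"`, run before step 4 so that an outdated compiler is reported before the build starts.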

## Windows Setup

```shell
wget https://az792536.vo.msecnd.net/vms/VMBuild_20190311/VirtualBox/MSEdge/MSEdge.Win10.VirtualBox.zip
@@ -34,19 +65,19 @@ cargo build

DataFusion has support for [dev containers](https://containers.dev/) which may be used for
developing DataFusion in an isolated environment either locally or remote if desired. Using dev containers for developing
DataFusion is not a requirement by any means but is available for those where doing local development could be tricky
DataFusion is not a requirement but is available where doing local development could be tricky
such as with Windows and WSL2, those with older hardware, etc.

Review comment (Contributor Author): drive by cleanup to make this more concise

For specific details on IDE support for dev containers see the documentation for [Visual Studio Code](https://code.visualstudio.com/docs/devcontainers/containers),
[IntelliJ IDEA](https://www.jetbrains.com/help/idea/connect-to-devcontainer.html),
[Rust Rover](https://www.jetbrains.com/help/rust/connect-to-devcontainer.html), and
[GitHub Codespaces](https://docs.github.com/en/codespaces/setting-up-your-project-for-codespaces/adding-a-dev-container-configuration/introduction-to-dev-containers).

## Protoc Installation
## `protoc` Installation

Compiling DataFusion from sources requires an installed version of the protobuf compiler, `protoc`.

On most platforms this can be installed from your system's package manager
On most platforms this can be installed from your system's package manager. For example:

```
# Ubuntu
@@ -71,7 +102,7 @@ libprotoc 3.15.0

Alternatively a binary release can be downloaded from the [Release Page](https://github.com/protocolbuffers/protobuf/releases) or [built from source](https://github.com/protocolbuffers/protobuf/blob/main/src/README.md).

## Bootstrap environment
## Bootstrap Environment

Review comment (Contributor Author): Made heading consistent.

DataFusion is written in Rust and it uses a standard rust toolkit:

19 changes: 17 additions & 2 deletions docs/source/contributor-guide/index.md
@@ -32,8 +32,10 @@ community as well as get more familiar with Rust and the relevant codebases.

## Development Environment

Setup your development environment [here](development_environment.md), and learn
how to test the code [here](testing.md).
Start with the [Development Environment Quick Start](development_environment.md#quick-start).

For more detail, see the full [development environment guide](development_environment.md)
and the [testing guide](testing.md).

## Finding and Creating Issues to Work On

@@ -99,6 +101,19 @@ If you are concerned that a larger design will be lost in a string of small PRs,

Note all commits in a PR are squashed when merged to the `main` branch so there is one commit per PR after merge.

## Before Submitting a PR

Before submitting a PR, run the standard formatting and lint checks and fix any
issues they report:

```bash
./ci/scripts/rust_fmt.sh
./ci/scripts/rust_clippy.sh
```

Review comment (Contributor), on lines +110 to +111: This script is the entry point for all non-functional tests. It includes the previous two scripts as well as several others. Suggested change, replacing the two commands above:

```bash
./dev/rust_lint.sh
# use the `--write` flag to automatically fix some formatting and lint errors
# ./dev/rust_lint.sh --write --allow-dirty
```

These scripts are the same checks run in CI for Rust formatting and clippy.
You should also run any relevant commands from the [testing quick start](testing.md#testing-quick-start).

## Conventional Commits & Labeling PRs

We generate change logs for each release using an automated process that will categorize PRs based on the title
32 changes: 32 additions & 0 deletions docs/source/contributor-guide/testing.md
@@ -23,6 +23,38 @@ Tests are critical to ensure that DataFusion is working properly and
is not accidentally broken during refactorings. All new features
should have test coverage and the entire test suite is run as part of CI.

## Testing Quick Start

Review comment (Contributor Author): This is based on what started in AGENTS.md but I made it slightly easier to understand

While developing a feature or bug fix, best practice is to run the smallest set
of tests that gives confidence for your change, then expand as needed.

Initially, run the tests in the crates you changed. For example, if you made changes
to files in `datafusion-optimizer/src`, run the corresponding crate tests:

```shell
cargo test -p datafusion-optimizer
```

Then, run the `sqllogictest` suite, which is the main regression suite for SQL
behavior and covers most DataFusion features.

Review comment (Contributor), on lines +38 to +39, suggesting this replacement: "Then, run the `sqllogictest` suite, which provides a strong speed–coverage tradeoff for development: it runs quickly while offering broad regression coverage across most SQL behavior in DataFusion."

```shell
cargo test --profile=ci --test sqllogictests
```

Finally, before submitting a PR, run the tests for the core `datafusion` and
`datafusion-cli` crates:

```shell
cargo test -p datafusion
cargo test -p datafusion-cli
```

Some integration tests require optional external services such as Docker-backed
containers and may skip when unavailable.
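
The three stages above can be chained so that a failure in an early, cheap stage stops the run before the slower ones start. The following is a minimal sketch under stated assumptions, not a script from the repository: `run_step`, `pre_pr_tests`, and the `DRY_RUN` variable are hypothetical names introduced here, and only the `cargo test` commands themselves come from this section.

```shell
# Hypothetical wrapper (assumption, not a repository script).
# With DRY_RUN=1 each command is printed instead of executed.
run_step() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: pre_pr_tests <changed-crate>...
# Stops at the first failing stage.
pre_pr_tests() {
  # 1. Tests for the crates that were changed
  for crate in "$@"; do
    run_step cargo test -p "$crate" || return 1
  done
  # 2. The sqllogictest suite
  run_step cargo test --profile=ci --test sqllogictests || return 1
  # 3. Core crates, before submitting the PR
  run_step cargo test -p datafusion || return 1
  run_step cargo test -p datafusion-cli
}
```

For a change in the optimizer crate, this would be invoked as `pre_pr_tests datafusion-optimizer`. Setting `DRY_RUN=1` first prints the four commands in order without running them, which is a cheap way to confirm the sequence.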

## Testing Overview

DataFusion has several levels of tests in its [Test Pyramid] and tries to follow
the Rust standard [Testing Organization] described in [The Book].
