Description
Apache Iceberg version
0.6.1 (latest release)
Please describe the bug 🐞
Hi,
I am trying to use the REST catalog and write data into MinIO. The script I am using can communicate with MinIO (it creates the metadata.json file under the metadata directory); however, when it tries to write the data with `table.append(df)`, it raises: OSError: When initiating multiple part upload for key 'poc_new/coordinates/data/00000-0-f27b7921-a6d7-4c7e-b034-2d12221e5054.parquet' in bucket 'warehouse': AWS Error NETWORK_CONNECTION during CreateMultipartUpload operation: Encountered network error when sending http request
This is the docker-compose file that I use:
```yaml
version: '3'
services:
  rest:
    image: tabulario/iceberg-rest:1.5.0
    container_name: iceberg-rest
    ports:
      - 8181:8181
    environment:
      - AWS_ACCESS_KEY_ID=admin
      - AWS_SECRET_ACCESS_KEY=password
      - AWS_REGION=us-east-1
      - CATALOG_WAREHOUSE=s3://warehouse/
      - CATALOG_IO__IMPL=org.apache.iceberg.aws.s3.S3FileIO
      - CATALOG_S3_ENDPOINT=http://minio:9000
    networks:
      iceberg-rest:
  minio:
    image: minio/minio:RELEASE.2024-05-10T01-41-38Z
    container_name: minio
    environment:
      - MINIO_ROOT_USER=admin
      - MINIO_ROOT_PASSWORD=password
      - MINIO_DOMAIN=minio
    ports:
      - 9001:9001
      - 9000:9000
    command: [ "server", "/data", "--console-address", ":9001" ]
    networks:
      iceberg-rest:
        aliases:
          - warehouse.minio
  mc:
    depends_on:
      - minio
    image: minio/mc:RELEASE.2024-05-09T17-04-24Z
    container_name: mc
    entrypoint: |
      /bin/sh -c "
      until (/usr/bin/mc config host add minio http://minio:9000 admin password)
      do
        echo '...waiting...' && sleep 1;
      done;
      /usr/bin/mc rm -r --force minio/warehouse;
      /usr/bin/mc mb minio/warehouse;
      /usr/bin/mc policy set public minio/warehouse;
      tail -f /dev/null
      "
    environment:
      - AWS_ACCESS_KEY_ID=admin
      - AWS_SECRET_ACCESS_KEY=password
      - AWS_REGION=us-east-1
    networks:
      iceberg-rest:
networks:
  iceberg-rest:
```
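Since the script runs on the Windows host while the hostnames `minio` and `warehouse.minio` only exist inside the compose network, a quick reachability check like the sketch below (my own diagnostic, not part of the stack) can show which endpoints the client can actually open a TCP connection to:

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Expectation: only localhost is reachable from the host; the compose
    # aliases resolve solely inside the iceberg-rest network.
    for host in ("localhost", "minio", "warehouse.minio"):
        print(f"{host}:9000 reachable -> {can_connect(host, 9000)}")
```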
And this is the script file:
```python
import pyarrow as pa
from pyiceberg.catalog import load_rest
from pyiceberg.exceptions import NamespaceAlreadyExistsError, TableAlreadyExistsError

catalog = load_rest(
    name="rest",
    conf={
        "uri": "http://localhost:8181/",
    },
)

namespace = "poc_new"
try:
    catalog.create_namespace(namespace)
except NamespaceAlreadyExistsError as e:
    pass

df = pa.Table.from_pylist(
    [
        {"lat": 52.371807, "long": 4.896029},
        {"lat": 52.387386, "long": 4.646219},
        {"lat": 52.078663, "long": 4.288788},
    ],
)
schema = df.schema

table_name = "coordinates"
table_identifier = f"{namespace}.{table_name}"
try:
    table = catalog.create_table(
        identifier=table_identifier,
        schema=schema,
    )
except TableAlreadyExistsError as e:
    pass

table = catalog.load_table(table_identifier)
table.append(df)
```

The traceback:
```
Traceback (most recent call last):
  File "d:\flink_iceberg\poc_01_iceberg_rest.py", line 40, in <module>
    table.append(df)
  File "D:\flink_iceberg\.venv2\Lib\site-packages\pyiceberg\table\__init__.py", line 1068, in append
    for data_file in data_files:
  File "D:\flink_iceberg\.venv2\Lib\site-packages\pyiceberg\table\__init__.py", line 2423, in _dataframe_to_data_files
    yield from write_file(table, iter([WriteTask(write_uuid, next(counter), df)]))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\flink_iceberg\.venv2\Lib\site-packages\pyiceberg\io\pyarrow.py", line 1726, in write_file
    with fo.create(overwrite=True) as fos:
         ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\flink_iceberg\.venv2\Lib\site-packages\pyiceberg\io\pyarrow.py", line 299, in create
    output_file = self._filesystem.open_output_stream(self._path, buffer_size=self._buffer_size)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow\_fs.pyx", line 868, in pyarrow._fs.FileSystem.open_output_stream
  File "pyarrow\error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow\error.pxi", line 115, in pyarrow.lib.check_status
OSError: When initiating multiple part upload for key 'poc_new/coordinates/data/00000-0-efc0be57-453d-442d-af13-2e0b2382a53d.parquet' in bucket 'warehouse': AWS Error NETWORK_CONNECTION during CreateMultipartUpload operation: Encountered network error when sending http request
```
In MinIO, the metadata directory is created and stores the metadata.json file, but there is no data directory.
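Given that metadata.json is written by the REST catalog server (which runs inside the compose network) while the Parquet data files are written by the client on the Windows host, my working hypothesis is that the catalog hands the client the in-network endpoint http://minio:9000, which the host cannot reach. A sketch of a possible workaround, overriding the S3 FileIO properties client-side (property names from the PyIceberg configuration docs; untested against this exact setup):

```python
# Possible workaround sketch (assumption: the host cannot resolve the
# in-network endpoint http://minio:9000 returned by the REST catalog).
# Overriding the S3 properties client-side should make the writer use the
# published localhost port instead.
client_conf = {
    "uri": "http://localhost:8181/",
    "s3.endpoint": "http://localhost:9000",
    "s3.access-key-id": "admin",
    "s3.secret-access-key": "password",
    "s3.region": "us-east-1",
}

# Flip to True only when the compose stack above is running.
RUN_AGAINST_STACK = False

if RUN_AGAINST_STACK:
    from pyiceberg.catalog import load_rest

    catalog = load_rest(name="rest", conf=client_conf)
    table = catalog.load_table("poc_new.coordinates")
    # table.append(df) should now reach MinIO via localhost:9000
```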

Also, this is the requirements.txt file:

```
annotated-types==0.7.0
apache-beam==2.48.0
apache-flink==1.19.1
apache-flink-libraries==1.19.1
avro-python3==1.10.2
certifi==2024.7.4
charset-normalizer==3.3.2
click==8.1.7
cloudpickle==2.2.1
colorama==0.4.6
confluent-kafka==2.5.0
crcmod==1.7
dill==0.3.1.1
dnspython==2.6.1
docopt==0.6.2
duckdb==0.9.2
duckdb_engine==0.13.0
Faker==26.0.0
fastavro==1.9.5
fasteners==0.19
fsspec==2023.12.2
greenlet==3.0.3
grpcio==1.65.1
hdfs==2.7.3
httplib2==0.22.0
idna==3.7
kafka-python==2.0.2
markdown-it-py==3.0.0
mdurl==0.1.2
mmhash3==3.0.1
numpy==1.24.4
objsize==0.6.1
orjson==3.10.6
packaging==24.1
pandas==2.2.2
polars==1.2.1
proto-plus==1.24.0
protobuf==4.23.4
py4j==0.10.9.7
pyarrow==11.0.0
pydantic==2.8.2
pydantic-settings==2.3.4
pydantic_core==2.20.1
pydot==1.4.2
Pygments==2.18.0
pyiceberg==0.6.1
pymongo==4.8.0
pyparsing==3.1.2
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
pytz==2024.1
regex==2024.7.24
requests==2.32.3
rich==13.7.1
ruamel.yaml==0.18.6
ruamel.yaml.clib==0.2.8
six==1.16.0
sortedcontainers==2.4.0
SQLAlchemy==2.0.31
strictyaml==1.7.3
typing_extensions==4.12.2
tzdata==2024.1
urllib3==2.2.2
zstandard==0.23.0
```
I checked this Slack thread for the same issue, but it doesn't contain any fix for my case.
OS: Windows 10
The AWS-related environment variables in the three containers:
iceberg-rest container:

```
iceberg@ce79d3f11b5f:/usr/lib/iceberg-rest$ env | grep -i aws
AWS_REGION=us-east-1
CATALOG_IO__IMPL=org.apache.iceberg.aws.s3.S3FileIO
AWS_SECRET_ACCESS_KEY=password
AWS_ACCESS_KEY_ID=admin
```
The minio container doesn't have any AWS-related environment variables.
mc container:

```
AWS_REGION=us-east-1
AWS_SECRET_ACCESS_KEY=password
AWS_ACCESS_KEY_ID=admin
```
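To rule PyIceberg itself out, the failure can also be reproduced (or not) one layer down with pyarrow's S3FileSystem, which is the filesystem layer shown in the traceback. Endpoint and credentials below mirror the compose file; the object key is a throwaway name I made up for this check:

```python
# Minimal write through pyarrow's S3FileSystem, the layer shown in the
# traceback. If this fails with the same NETWORK_CONNECTION error, the
# problem is endpoint reachability, not PyIceberg.
S3_OPTIONS = {
    "endpoint_override": "http://localhost:9000",  # published MinIO port
    "access_key": "admin",
    "secret_key": "password",
    "region": "us-east-1",
}

# Flip to True only when the compose stack is running.
RUN_AGAINST_STACK = False

if RUN_AGAINST_STACK:
    from pyarrow import fs

    s3 = fs.S3FileSystem(**S3_OPTIONS)
    # '_connectivity_check.txt' is a throwaway key for this test only.
    with s3.open_output_stream("warehouse/_connectivity_check.txt") as out:
        out.write(b"ok")
    print("direct write through pyarrow succeeded")
```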