Open
Labels
bug
Description
What happened?
I have a Vortex array that is the result of a take.
The take is a reshuffle of the array: all elements are used, just at different positions.
When I try to write this array to a file, memory usage explodes (more than 200 GB, even though the array takes just 500 MB).
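To make "reshuffle" concrete, here is a minimal sketch in plain Rust of what a take over a full permutation of indices means. It does not use the Vortex API; the helper names `take` and `is_permutation` are hypothetical and exist only for illustration. The point is that the output has exactly the same length and elements as the input, so the logical size of the data does not grow.

```rust
/// Gather `values` by position: out[i] = values[indices[i]].
/// Hypothetical helper for illustration, not the Vortex `take`.
fn take<T: Clone>(values: &[T], indices: &[usize]) -> Vec<T> {
    indices.iter().map(|&i| values[i].clone()).collect()
}

/// Returns true if `indices` contains every position in 0..len exactly once.
fn is_permutation(indices: &[usize], len: usize) -> bool {
    if indices.len() != len {
        return false;
    }
    let mut seen = vec![false; len];
    for &i in indices {
        if i >= len || seen[i] {
            return false;
        }
        seen[i] = true;
    }
    true
}

fn main() {
    let values = vec!["a", "b", "c", "d"];
    let indices = vec![2, 0, 3, 1];

    // Every element is used exactly once, just at a different position.
    assert!(is_permutation(&indices, values.len()));

    let shuffled = take(&values, &indices);
    assert_eq!(shuffled.len(), values.len());
    println!("{:?}", shuffled); // prints ["c", "a", "d", "b"]
}
```

In the reproduction the indices come from shuffling `0..len`, so they satisfy the same permutation property checked here.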
Steps to reproduce
use std::fs::File;

use arrow_ipc::reader::FileReader;
use rand::rng;
use rand::seq::SliceRandom;
use vortex::VortexSessionDefault;
use vortex::array::Array;
use vortex::array::ArrayRef;
use vortex::array::arrays::PrimitiveArray;
use vortex::array::arrow::FromArrowArray;
use vortex::array::iter::{ArrayIteratorAdapter, ArrayIteratorExt};
use vortex::array::stream::ArrayStreamExt;
use vortex::array::validity::Validity;
use vortex::buffer::Buffer;
use vortex::dtype::DType;
use vortex::dtype::arrow::FromArrowType;
use vortex::file::WriteOptionsSessionExt;
use vortex::io::session::RuntimeSessionExt;
use vortex::session::VortexSession;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let session = VortexSession::default().with_tokio();

    // Read the Arrow IPC file and convert each record batch to a Vortex array.
    let file = File::open("gharchive_2023010100.arrow")?;
    let reader = FileReader::try_new(file, None)?;
    let dtype = DType::from_arrow(reader.schema());
    let iter = ArrayIteratorAdapter::new(
        dtype,
        reader.map(|batch| Ok(ArrayRef::from_arrow(&batch?, false)?)),
    );
    let array = iter.into_array_stream().read_all().await?;

    // Build a random permutation of all row indices.
    let len = array.len();
    let mut indices: Vec<u64> = (0..len as u64).collect();
    indices.shuffle(&mut rng());
    let indices_array = PrimitiveArray::new(Buffer::from(indices), Validity::NonNullable);

    // Reshuffle the array: every row is taken exactly once.
    let array = array.take(indices_array.to_array())?;

    // Writing the taken array is where memory usage blows up.
    let stream = array.to_array_stream();
    session
        .write_options()
        .write(
            &mut async_fs::File::create(
                "/Users/blaginin/Documents/playground/gharchive_2023010100.vortex",
            )
            .await?,
            stream,
        )
        .await?;
    Ok(())
}
Environment
Latest develop
Additional context
No response