IndexedParquetReader
- class IndexedParquetReader
Reads an index file containing paths to Parquet files, which are then read and batched.
- chunksize
Maximum number of rows to process at once. Large files are processed in chunks, and small files are concatenated. Also passed to pyarrow.dataset.Dataset.to_batches as batch_size.
- Type:
int
- batch_readahead
Number of batches to read ahead. Passed to pyarrow.dataset.Dataset.to_batches.
- Type:
int
- fragment_readahead
Number of fragments to read ahead. Passed to pyarrow.dataset.Dataset.to_batches.
- Type:
int
- use_threads
Whether to use multiple threads for reading. Passed to pyarrow.dataset.Dataset.to_batches.
- Type:
bool
- column_names
Names of columns to use from the input dataset. If None, use all columns.
- Type:
list[str] or None
- kwargs
Additional arguments to pass along to InputReader.read_index_file.
Methods
__init__([chunksize, batch_readahead, ...])
read(input_file[, read_columns]): Read the input file, or a chunk of the input file.
read_index_file(input_file[, upath_kwargs]): Read an "indexed" file.
regular_file_exists(input_file, **_kwargs): Check that the input_file points to a single regular file.
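This page does not describe the layout of an "indexed" file. As a rough illustration, the sketch below assumes a plain-text file with one Parquet path per line. The helper read_index_lines and the paths are hypothetical, and the sketch uses only the standard library, not read_index_file itself.

```python
import tempfile
from pathlib import Path


def read_index_lines(index_path):
    """Hypothetical helper: return the stripped, non-empty lines of a text index file."""
    with open(index_path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


# Build a small index file with a blank line and stray whitespace.
tmp_dir = Path(tempfile.mkdtemp())
index_path = tmp_dir / "index.txt"
index_path.write_text(
    "data/part_0.parquet\n\n  data/part_1.parquet  \n", encoding="utf-8"
)

parquet_paths = read_index_lines(index_path)
print(parquet_paths)
```

Under that assumption, the resulting list of paths is what would then be opened and batched.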
- __init__(chunksize=500000, batch_readahead=16, fragment_readahead=4, use_threads=True, column_names=None, upath_kwargs=None, **kwargs)
- classmethod __new__(*args, **kwargs)
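The chunksize description (large files are split, small files are concatenated) can be pictured with a pure-Python sketch. This shows one plausible reading of that behavior, not the reader's code. The function iter_chunks and the toy row lists are hypothetical, and the real chunk boundaries may differ.

```python
def iter_chunks(files, chunksize):
    """Yield lists of at most `chunksize` rows, spanning file boundaries.

    `files` is an iterable of row lists, one list per (hypothetical) input file.
    """
    buffer = []
    for rows in files:
        for row in rows:
            buffer.append(row)
            if len(buffer) == chunksize:
                # Buffer is full: emit it and start a new chunk.
                yield buffer
                buffer = []
    if buffer:
        # Emit whatever is left after the last file.
        yield buffer


# Two small "files" (3 and 2 rows) followed by one large "file" (12 rows).
files = [list(range(0, 3)), list(range(3, 5)), list(range(5, 17))]

chunks = list(iter_chunks(files, chunksize=5))
chunk_sizes = [len(chunk) for chunk in chunks]
print(chunk_sizes)
```

Here the two small files are merged into the first chunk, and the large file is spread over the remaining chunks.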