ParquetPandasReader
- class ParquetPandasReader
Parquet reader that supports the most common Parquet reading arguments.
Reads the input file as a pandas.DataFrame.
- chunksize
Number of rows of the file to process at a time. For large files, reading in chunks avoids loading the entire file into memory.
- Type:
int
- column_names
Names of columns to use from the input dataset. If None, use all columns.
- Type:
list[str] or None
- iterate_by_row_groups
Whether to read the file one row group at a time.
- Type:
bool
- kwargs
Additional keyword arguments passed along to pyarrow.parquet.ParquetFile. See https://arrow.apache.org/docs/python/generated/pyarrow.parquet.ParquetFile.html
Methods
- __init__([chunksize, column_names, ...])
- read(input_file[, read_columns]): Read the input file, or a chunk of the input file.
- read_index_file(input_file[, upath_kwargs]): Read an "indexed" file.
- regular_file_exists(input_file, **_kwargs): Check that the input_file points to a single regular file.
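The two file-handling helpers can be approximated with the standard library. This is a hypothetical sketch, not this class's code: it uses `pathlib` instead of whatever path abstraction the library uses, raising `FileNotFoundError` is an assumption, and the index-file format (plain text, one input path per line, blank lines ignored) is likewise assumed. The helper names `check_regular_file` and `read_index_lines` are invented for illustration.

```python
import tempfile
from pathlib import Path


def check_regular_file(input_file):
    """Raise FileNotFoundError unless input_file is a single regular file (hypothetical)."""
    path = Path(input_file)
    if not path.exists():
        raise FileNotFoundError(f"File not found at path: {path}")
    if not path.is_file():
        # A directory (e.g. a partitioned dataset) is rejected: one file is expected.
        raise FileNotFoundError(f"Expected a regular file, found a directory: {path}")


def read_index_lines(index_file):
    """Return the non-blank lines of a text index file (assumed format: one path per line)."""
    text = Path(index_file).read_text()
    return [line.strip() for line in text.splitlines() if line.strip()]


tmpdir = Path(tempfile.mkdtemp())
index_file = tmpdir / "files.txt"
index_file.write_text("part_0.parquet\n\npart_1.parquet\n")

check_regular_file(index_file)       # passes silently
print(read_index_lines(index_file))  # ['part_0.parquet', 'part_1.parquet']

try:
    check_regular_file(tmpdir)       # a directory, not a regular file
except FileNotFoundError as err:
    print("rejected:", err)
```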
- __init__(chunksize=500000, column_names=None, iterate_by_row_groups=False, **kwargs)
- classmethod __new__(*args, **kwargs)