ParquetPyarrowReader

class ParquetPyarrowReader

Parquet reader that uses the pyarrow library for reading.

Reads the file as a pyarrow.Table.

chunksize

Number of rows of the file to process at once. For large files, this avoids loading the entire file into memory.

Type:

int

column_names

Names of columns to use from the input dataset. If None, use all columns.

Type:

list[str] or None

iterate_by_row_groups

Whether to read the file one row group at a time.

Type:

bool

kwargs

Additional keyword arguments to pass along to pyarrow.parquet.ParquetFile. See https://arrow.apache.org/docs/python/generated/pyarrow.parquet.ParquetFile.html

Methods

__init__([chunksize, column_names, ...])

read(input_file[, read_columns])

Read the input file, or chunk of the input file.

read_index_file(input_file[, upath_kwargs])

Read an "indexed" file.

regular_file_exists(input_file, **_kwargs)

Check that the input_file points to a single regular file.

__init__(chunksize=500000, column_names=None, iterate_by_row_groups=False, **kwargs)
classmethod __new__(*args, **kwargs)