ParquetPyarrowReader#

class ParquetPyarrowReader#

Parquet reader that uses the pyarrow library for reading.

Reads file as a pyarrow.Table.

chunksize#

Number of rows of the file to process at once. For large files, this can prevent loading the entire file into memory at once.

column_names#

Names of columns to use from the input dataset. If None, use all columns.

iterate_by_row_groups#

Whether to read the file by row groups.

kwargs#

Arguments to pass along to pyarrow.parquet.ParquetFile. See https://arrow.apache.org/docs/python/generated/pyarrow.parquet.ParquetFile.html

Methods

`__init__`([chunksize, column_names, ...])
`read`(input_file[, read_columns])	Read the input file, or chunk of the input file.
`read_index_file`(input_file[, upath_kwargs])	Read an "indexed" file.
`regular_file_exists`(input_file, **_kwargs)	Check that the input_file points to a single regular file.

__init__(chunksize=500000, column_names=None, iterate_by_row_groups=False, **kwargs)#
