CsvReader

class CsvReader

Reader for CSV files that exposes the most common CSV reading arguments.

Files are read with pandas.read_csv. Additional arguments are described in the pandas documentation: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html

chunksize

Number of rows to read in a single iteration.
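As a rough illustration of what a chunked read looks like, the sketch below calls pandas.read_csv directly with a small chunksize. It does not use CsvReader itself; the in-memory CSV text and its column names are made up for the example, and the assumption is that CsvReader passes chunksize through to pandas in a similar way.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV with five data rows (illustrative values only).
csv_text = (
    "id,ra,dec\n"
    "1,10.0,-5.0\n"
    "2,11.5,-4.2\n"
    "3,12.1,-3.9\n"
    "4,13.7,-2.5\n"
    "5,14.2,-1.1\n"
)

chunk_lengths = []
# With chunksize set, read_csv returns an iterator of DataFrames
# instead of a single DataFrame.
with pd.read_csv(io.StringIO(csv_text), chunksize=2) as reader:
    for chunk in reader:
        chunk_lengths.append(len(chunk))

print(chunk_lengths)
```

Each chunk holds at most chunksize rows, so the final chunk can be smaller than the others.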

header

Row or rows to use as the header containing the column names.

schema_file

Path to a parquet schema file. If provided, header names and column types will be pulled from the parquet schema metadata.

column_names

Names of the columns, used when no header is available.

type_map

Data types to use for the columns.
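The following sketch shows how explicit column names and a type map behave for a headerless file, again using pandas.read_csv directly rather than CsvReader. The mapping of column_names to the pandas `names` argument and of type_map to `dtype` is an assumption made for illustration, as are the sample columns and values.

```python
import io

import pandas as pd

# Hypothetical headerless CSV: the first line is data, not column names.
csv_text = "1,10.0,-5.0\n2,11.5,-4.2\n"

# Illustrative stand-ins for the column_names and type_map arguments.
column_names = ["id", "ra", "dec"]
type_map = {"id": "int64", "ra": "float32", "dec": "float32"}

frame = pd.read_csv(
    io.StringIO(csv_text),
    header=None,         # no header row in the file
    names=column_names,  # supply the column names explicitly
    dtype=type_map,      # enforce a type for each column
)

print(frame.dtypes)
```

Supplying types explicitly avoids pandas inferring a different type for the same column in different chunks of a large file.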

parquet_kwargs

Additional keyword arguments to use when reading the parquet schema metadata, passed to pandas.read_parquet. See https://pandas.pydata.org/docs/reference/api/pandas.read_parquet.html

kwargs

Additional keyword arguments to use when reading the CSV files, passed to pandas.read_csv. See https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html

Methods

`__init__`([chunksize, header, schema_file, ...])
`read`(input_file[, read_columns])	Read the input file, or chunk of the input file.
`read_index_file`(input_file[, upath_kwargs])	Read an "indexed" file.
`regular_file_exists`(input_file, **_kwargs)	Check that the input_file points to a single regular file.

__init__(chunksize=500000, header='infer', schema_file=None, column_names=None, type_map=None, parquet_kwargs=None, upath_kwargs=None, **kwargs)
