CsvReader#

class CsvReader#

CSV reader that exposes the most commonly used CSV reading arguments.

This reader uses pandas.read_csv. For more information on additional arguments, see the pandas documentation: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html

chunksize#

Number of rows to read in a single iteration. Defaults to 500000.

Type:

int
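To illustrate what reading a fixed number of rows per iteration means, here is a minimal sketch that calls pandas.read_csv directly with a chunksize. The in-memory CSV text and the variable names are made up for the example, and the sketch does not claim to reproduce how CsvReader wires this argument internally.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV with a header row and five data rows.
csv_text = "id,ra,dec\n" + "".join(
    f"{i},{i * 1.5},{-i * 0.5}\n" for i in range(5)
)

# With chunksize set, read_csv returns an iterator of DataFrames
# instead of a single DataFrame.
chunk_lengths = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    chunk_lengths.append(len(chunk))

print(chunk_lengths)  # [2, 2, 1]
```

Each chunk holds at most chunksize rows, so the last chunk is shorter when the row count is not an exact multiple.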

header#

Row number(s) to use as the header containing the column names.

Type:

int, list of int, None, default ‘infer’

schema_file#

Path to a parquet schema file. If provided, header names and column types will be pulled from the parquet schema metadata.

Type:

str

column_names#

Names of the columns, used when no header is available.

Type:

list[str]

type_map#

Data types to use for the columns.

Type:

dict
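The sketch below shows how explicit column names and a type map behave when passed straight to pandas.read_csv for a file without a header row. The sample data, the column names, and the mapping to the names and dtype arguments are assumptions made for the example. Check the CsvReader source for how it actually forwards these attributes.

```python
import io

import pandas as pd

# Hypothetical headerless CSV content.
csv_text = "1,10.5,star\n2,20.25,galaxy\n"

column_names = ["id", "ra", "kind"]
# Columns left out of the map fall back to pandas' type inference.
type_map = {"id": "int64", "ra": "float32"}

frame = pd.read_csv(
    io.StringIO(csv_text),
    header=None,         # the file has no header row
    names=column_names,  # supply the column names explicitly
    dtype=type_map,      # force specific column types
)
print(frame.dtypes)
```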

parquet_kwargs#

Additional keyword arguments to use when reading the parquet schema metadata, passed to pandas.read_parquet. See https://pandas.pydata.org/docs/reference/api/pandas.read_parquet.html

Type:

dict

kwargs#

Additional keyword arguments to use when reading the CSV files with pandas.read_csv. See https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html

Type:

dict

Methods

__init__([chunksize, header, schema_file, ...])

read(input_file[, read_columns])

Read the input file, or chunk of the input file.

read_index_file(input_file[, upath_kwargs])

Read an "indexed" file.

regular_file_exists(input_file, **_kwargs)

Check that the input_file points to a single regular file.

__init__(chunksize=500000, header='infer', schema_file=None, column_names=None, type_map=None, parquet_kwargs=None, upath_kwargs=None, **kwargs)#
classmethod __new__(*args, **kwargs)#