get_file_reader#
- get_file_reader(file_format, chunksize=500000, schema_file=None, column_names=None, skip_column_names=None, type_map=None, **kwargs)#
Get a generator file reader for common file types
Currently supported formats include:
"csv", comma separated values. may also be tab- or pipe-delimited includes .csv.gz and other compressed csv files"fits", flexible image transport system. often used for astropy tables."parquet", compressed columnar data format"ecsv", astropy’s enhanced CSV"indexed_csv", “index” style reader, that accepts a file with a list of csv files that are appended in-memory"indexed_parquet", “index” style reader, that accepts a file with a list of parquet files that are appended in-memory
- Parameters:
file_format (str) – specifier for the file type and extension. If using an
input_pathargument, we will look for files with this string as the extension.chunksize (int) – number of rows to read in a single iteration. for single-file readers, large files are split into batches based on this value. for index-style readers, we read files until we reach this chunksize and create a single batch in-memory.
schema_file (str) – path to a parquet schema file. if provided, header names and column types will be pulled from the parquet schema metadata.
column_names (list[str]) – for CSV files, the names of columns if no header is available. for fits files, a list of columns to keep.
skip_column_names (list[str]) – for fits files, a list of columns to remove.
type_map (dict) – for CSV files, the data types to use for columns
kwargs – additional keyword arguments to pass to the underlying file reader.