hipscat_import.index.map_reduce

Create a columnar index of a hipscat table, using dask for parallelization.

Module Contents

Functions

read_leaf_file(input_file, include_columns, ...)

Mapping function called once per input file.

create_index(args, client)

Read primary column, indexing column, and other payload data, and write to catalog directory.

read_leaf_file(input_file, include_columns, include_hipscat_index, drop_duplicates, storage_options)

Mapping function called once per input file.

Reads the leaf parquet file and returns its data with the appropriate columns selected and duplicates dropped.
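
The column selection and de-duplication described above can be sketched in plain Python. This is only an illustration, not the hipscat-import implementation: rows are modeled as dictionaries in place of a parquet file, and the function name `select_and_dedupe`, the sample column names, and the use of an `_hipscat_index` key are assumptions made for this example.

```python
def select_and_dedupe(rows, include_columns, drop_duplicates=True):
    """Keep only the requested columns of each row, optionally dropping repeats.

    Hypothetical stand-in for the per-file mapping step; `rows` is a list of
    dicts rather than the contents of a parquet file.
    """
    # Project every row onto the requested columns.
    selected = [{col: row[col] for col in include_columns} for row in rows]
    if not drop_duplicates:
        return selected

    # Drop repeated rows, keeping the first occurrence and the original order.
    seen = set()
    unique = []
    for row in selected:
        key = tuple(row[col] for col in include_columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Sample data (made up for illustration): the first two rows are identical.
leaf_rows = [
    {"id": 7, "ra": 10.5, "_hipscat_index": 100},
    {"id": 7, "ra": 10.5, "_hipscat_index": 100},
    {"id": 9, "ra": 11.0, "_hipscat_index": 101},
]
result = select_and_dedupe(leaf_rows, ["id", "_hipscat_index"])
```

Here `result` holds two rows, each with only the `id` and `_hipscat_index` keys. Passing `drop_duplicates=False` would keep all three rows.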

create_index(args, client)

Read primary column, indexing column, and other payload data, and write to catalog directory.
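
The overall map-then-combine shape of such an index build might look like the sketch below. It is not the library's code: a standard-library thread pool stands in for dask, rows are dictionaries rather than parquet data, sorting by the indexing column is an assumption about what makes the result useful as a lookup index, and the step that writes to the catalog directory is left out. The names `map_leaf`, `build_index`, `id`, and `pix` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def map_leaf(rows, columns):
    """Map step: project one leaf partition onto the requested columns."""
    return [{col: row[col] for col in columns} for row in rows]


def build_index(leaf_partitions, indexing_column, primary_column):
    """Combine step: merge the mapped partitions, ordered by the indexing column."""
    columns = [indexing_column, primary_column]
    # Run the map step once per partition; pool.map preserves input order.
    with ThreadPoolExecutor(max_workers=2) as pool:
        mapped = list(pool.map(lambda rows: map_leaf(rows, columns), leaf_partitions))
    merged = [row for part in mapped for row in part]
    # Assumption: an index is most useful when sorted by the column being indexed.
    return sorted(merged, key=lambda row: row[indexing_column])


# Two made-up leaf partitions; "mag" is payload that the index does not keep.
partitions = [
    [{"id": 30, "pix": 1, "mag": 2.0}, {"id": 10, "pix": 1, "mag": 3.0}],
    [{"id": 20, "pix": 2, "mag": 1.0}],
]
index_rows = build_index(partitions, indexing_column="id", primary_column="pix")
```

In this example `index_rows` lists the `id` values in ascending order (10, 20, 30), each paired with the `pix` value of the partition it came from.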