Create columnar index of hipscat table using dask for parallelization
Module Contents
Functions
read_leaf_file (input_file, include_columns, ...)
|
Mapping function called once per input file. |
create_index (args, client)
|
Read primary column, indexing column, and other payload data, |
-
read_leaf_file(input_file, include_columns, include_hipscat_index, drop_duplicates, storage_options)[source]
Mapping function called once per input file.
Reads the leaf parquet file, and returns with appropriate columns and duplicates dropped.
-
create_index(args, client)[source]
Read primary column, indexing column, and other payload data,
and write to catalog directory.