hipscat_import.soap.map_reduce
Inner methods for SOAP
Module Contents
Functions
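_write_count_results(cache_path, source_healpix, results) | Build a nice dataframe with pretty columns and rows
count_joins(soap_args, source_pixel, object_pixels) | Count the number of equijoined sources in the object pixels.
combine_partial_results(input_path, output_path, output_storage_options) | Combine many partial CSVs into single partition join info.
reduce_joins(soap_args, object_pixel, object_key, delete_input_files) | Reduce join tables into one parquet file per object-pixel, with one row-group inside per source pixel.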
- _write_count_results(cache_path, source_healpix, results)
Build a nice dataframe with pretty columns and rows
- count_joins(soap_args: hipscat_import.soap.arguments.SoapArguments, source_pixel: hipscat.pixel_math.healpix_pixel.HealpixPixel, object_pixels: List[hipscat.pixel_math.healpix_pixel.HealpixPixel])
Count the number of equijoined sources in the object pixels. If any un-joined source pixels remain, stretch out to neighboring object pixels.
- Parameters:
soap_args (hipscat_import.soap.SoapArguments) – set of arguments for pipeline execution
source_pixel (HealpixPixel) – order and pixel for the source catalog single pixel.
object_pixels (List[HealpixPixel]) – set of tuples of order and pixel for the partitions of the object catalog to be joined.
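The counting step described for count_joins can be illustrated with a minimal, standard-library sketch. This is not the hipscat_import implementation: the helper count_equijoin_matches, the plain (order, pixel) tuples, and the in-memory sets of object ids are all hypothetical stand-ins for the real catalog partitions. It assumes each object id lives in exactly one object partition.

```python
from collections import Counter


def count_equijoin_matches(source_join_ids, object_ids_by_pixel):
    """Count source rows whose join id appears in each object partition.

    Hypothetical helper, not part of hipscat_import. Returns (counts,
    unmatched): counts maps a pixel to its number of matching source rows,
    and unmatched lists the source join ids found in no partition.
    """
    counts = Counter()
    unmatched = []
    for join_id in source_join_ids:
        for pixel, object_ids in object_ids_by_pixel.items():
            if join_id in object_ids:
                counts[pixel] += 1
                break
        else:
            # No candidate partition holds this id.
            unmatched.append(join_id)
    return dict(counts), unmatched


# Join ids carried by the rows of one source pixel (made-up values).
source_join_ids = [10, 10, 11, 12, 99]
# First pass: the candidate object pixels, as (order, pixel) tuples.
candidates = {(1, 44): {10, 11}, (1, 45): {12}}
counts, unmatched = count_equijoin_matches(source_join_ids, candidates)

# Second pass: only the leftover ids are checked against neighboring pixels.
neighbors = {(1, 46): {99}}
neighbor_counts, still_unmatched = count_equijoin_matches(unmatched, neighbors)
```

Here the first pass leaves one source row unmatched, and the second pass finds it in the assumed neighboring pixel, which mirrors the "stretch out to neighboring object pixels" behavior in outline only.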
- combine_partial_results(input_path, output_path, output_storage_options) -> int
Combine many partial CSVs into single partition join info. Also write out a debug file with counts of unmatched sources, if any.
- Parameters:
input_path (str) – intermediate directory with partial result CSVs. This is likely the directory used as cache_path in the previous count_joins call.
output_path (str) – directory to write the combined results CSVs.
- Returns:
integer that is the sum of all matched num_rows.
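As a rough illustration of the combine step, the sketch below concatenates partial CSVs and sums their num_rows column using only the standard library. The helper name combine_partial_csvs, the file names, and the object_pixel and source_pixel columns are assumptions made for the example; only num_rows is named in the documentation above. The debug file of unmatched sources is omitted.

```python
import csv
import tempfile
from pathlib import Path


def combine_partial_csvs(input_dir, output_file):
    """Concatenate every *.csv in input_dir into output_file.

    Hypothetical helper, not the hipscat_import function. Returns the sum
    of the num_rows column across all partial files.
    """
    # List the partial files before creating the output file.
    partials = sorted(Path(input_dir).glob("*.csv"))
    fieldnames = ["object_pixel", "source_pixel", "num_rows"]  # assumed layout
    total = 0
    with open(output_file, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=fieldnames)
        writer.writeheader()
        for partial in partials:
            with open(partial, newline="") as handle:
                for row in csv.DictReader(handle):
                    writer.writerow(row)
                    total += int(row["num_rows"])
    return total


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "cache"
    cache.mkdir()
    header = "object_pixel,source_pixel,num_rows\n"
    (cache / "part_0.csv").write_text(header + "1_44,2_176,3\n")
    (cache / "part_1.csv").write_text(header + "1_45,2_180,1\n1_44,2_177,2\n")

    combined = Path(tmp) / "combined.csv"
    total_rows = combine_partial_csvs(cache, combined)
    combined_lines = combined.read_text().splitlines()
```

With these made-up inputs, total_rows is 3 + 1 + 2 = 6, and the combined file holds one header line plus three data rows.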
- reduce_joins(soap_args: hipscat_import.soap.arguments.SoapArguments, object_pixel: hipscat.pixel_math.healpix_pixel.HealpixPixel, object_key: str, delete_input_files: bool = True)
Reduce join tables into one parquet file per object-pixel, with one row-group inside per source pixel.