hipscat_import.soap.map_reduce

Inner methods for SOAP

Module Contents

Functions

_count_joins_for_object(source_data, source_pixel, ...)

_write_count_results(cache_path, source_healpix, results)

Build a dataframe of the join count results, with readable column names and rows, and write it to the cache path.

count_joins(soap_args, source_pixel, object_pixels)

Count the number of equijoined sources in the object pixels.

combine_partial_results(input_path, output_path, ...) → int

Combine many partial CSVs into a single partition join info.

reduce_joins(soap_args, object_pixel, object_key[, ...])

Reduce join tables into one parquet file per object-pixel, with one row-group inside per source pixel.

_count_joins_for_object(source_data, source_pixel, object_pixel, soap_args)

_write_count_results(cache_path, source_healpix, results)

Build a dataframe of the join count results, with readable column names and rows, and write it to the cache path.
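As a rough illustration of what writing count results could involve, here is a stdlib-only sketch. It is not the hipscat_import implementation. It assumes `results` is a list of `(join_order, join_pixel, num_rows)` tuples and that the source pixel is passed as a separate order and pixel rather than a HealpixPixel. The helper name `write_count_results_sketch`, the `<order>_<pixel>.csv` file-name pattern, and the column names are all hypothetical.

```python
import csv
import os
from typing import List, Tuple


def write_count_results_sketch(
    cache_path: str,
    source_order: int,
    source_pixel: int,
    results: List[Tuple[int, int, int]],
) -> str:
    """Write one CSV of join counts for a single source pixel (hypothetical layout)."""
    os.makedirs(cache_path, exist_ok=True)
    # Hypothetical naming: one partial result file per source pixel.
    file_name = os.path.join(cache_path, f"{source_order}_{source_pixel}.csv")
    with open(file_name, "w", newline="") as handle:
        writer = csv.writer(handle)
        # Readable header row, then one row per object pixel that received joins.
        writer.writerow(["Norder", "Npix", "join_Norder", "join_Npix", "num_rows"])
        for join_order, join_pixel, num_rows in results:
            writer.writerow(
                [source_order, source_pixel, join_order, join_pixel, num_rows]
            )
    return file_name
```

Writing one small file per source pixel keeps each map task independent, so no coordination between tasks is needed until a later combine step.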

count_joins(soap_args: hipscat_import.soap.arguments.SoapArguments, source_pixel: hipscat.pixel_math.healpix_pixel.HealpixPixel, object_pixels: List[hipscat.pixel_math.healpix_pixel.HealpixPixel])

Count the number of equijoined sources in the object pixels. If any sources remain un-joined, stretch out to neighboring object pixels.

Parameters:
  • soap_args (hipscat_import.soap.SoapArguments) – set of arguments for pipeline execution

  • source_pixel (HealpixPixel) – order and pixel of the single source catalog partition to process.

  • object_pixels (List[HealpixPixel]) – list of order and pixel pairs for the object catalog partitions to be joined.
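The counting logic can be pictured with a small stdlib sketch. It is not the hipscat_import implementation. It assumes the source rows have been reduced to a list of object ids, that each object pixel is an `(order, pixel)` tuple standing in for HealpixPixel and maps to the set of object ids it contains, and that the caller lists the primary object pixel first and any neighbors after it. The function name `count_joins_sketch` is hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Pixel = Tuple[int, int]  # (order, pixel), a stand-in for HealpixPixel


def count_joins_sketch(
    source_object_ids: List[int],
    object_ids_by_pixel: Dict[Pixel, Set[int]],
    object_pixels: Iterable[Pixel],
) -> Tuple[List[Tuple[int, int, int]], int]:
    """Count equijoin matches of source rows against each object pixel in turn.

    Returns (order, pixel, num_matched) per object pixel with matches,
    plus the number of source rows that matched nothing.
    """
    remaining = list(source_object_ids)
    results = []
    for order, pixel in object_pixels:
        if not remaining:
            # Every source row has been joined; stop looking at further pixels.
            break
        ids_in_pixel = object_ids_by_pixel.get((order, pixel), set())
        matched = [oid for oid in remaining if oid in ids_in_pixel]
        if matched:
            results.append((order, pixel, len(matched)))
        # Only still-unmatched rows are carried on to the next object pixel.
        remaining = [oid for oid in remaining if oid not in ids_in_pixel]
    return results, len(remaining)
```

Carrying only the unmatched rows forward ensures no source row is counted twice, and the leftover count is exactly what a report of unmatched sources would need.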

combine_partial_results(input_path, output_path, output_storage_options) → int

Combine many partial CSVs into a single partition join info. Also write out a debug file with counts of unmatched sources, if any.

Parameters:
  • input_path (str) – intermediate directory containing the partial result CSVs. This is likely the directory used as cache_path in the previous count_joins call.

  • output_path (str) – directory in which to write the combined result CSVs.

Returns:

An integer: the sum of num_rows over all matched entries.
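A stdlib sketch of the combine step follows. It is not the hipscat_import implementation. It assumes each partial CSV has a header that includes `join_Norder` and `num_rows` columns, and it treats rows with a negative `join_Norder` as unmatched sources. That marker, the output file names, and the function name `combine_partial_results_sketch` are hypothetical. The real function also takes `output_storage_options`, which this sketch ignores.

```python
import csv
import glob
import os
from typing import Dict, List


def combine_partial_results_sketch(input_path: str, output_path: str) -> int:
    """Merge partial count CSVs and return the sum of matched num_rows."""
    matched: List[Dict[str, str]] = []
    unmatched: List[Dict[str, str]] = []
    header = None
    for file_name in sorted(glob.glob(os.path.join(input_path, "*.csv"))):
        with open(file_name, newline="") as handle:
            reader = csv.DictReader(handle)
            # Reuse the first file's header for the combined output.
            header = header or reader.fieldnames
            for row in reader:
                # Hypothetical marker: a negative join_Norder means "no object match".
                if int(row["join_Norder"]) < 0:
                    unmatched.append(row)
                else:
                    matched.append(row)
    if header is None:
        return 0  # No partial files were found, so there is nothing to combine.

    os.makedirs(output_path, exist_ok=True)

    def write_rows(name: str, rows: List[Dict[str, str]]) -> None:
        with open(os.path.join(output_path, name), "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=header)
            writer.writeheader()
            writer.writerows(rows)

    write_rows("partition_join_info.csv", matched)
    if unmatched:
        # Debug file, written only when some sources were not joined.
        write_rows("unmatched_sources.csv", unmatched)
    return sum(int(row["num_rows"]) for row in matched)
```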

reduce_joins(soap_args: hipscat_import.soap.arguments.SoapArguments, object_pixel: hipscat.pixel_math.healpix_pixel.HealpixPixel, object_key: str, delete_input_files: bool = True)

Reduce join tables into one parquet file per object-pixel, with one row-group inside per source pixel.