ImportArguments

class ImportArguments

Container class for holding arguments for partitioning input data into a HATS catalog.

Attributes

`add_healpix_29`	add the healpix-based hats spatial index field alongside the data
`addl_hats_properties`	Any additional keyword arguments you would like to provide when writing the hats.properties file for the final HATS table.
`allowed_catalog_types`	possible types of catalog to import with ImportArguments
`byte_pixel_threshold`	when determining bins for the final partitioning, the maximum size of a single resulting pixel, expressed in bytes (rather than as a row count).
`catalog_path`	constructed output path for the catalog that will be something like <output_path>/<output_artifact_name>
`catalog_type`	level of catalog data, object (things in the sky) or source (detections)
`completion_email_address`	if provided, send an email to the indicated email address once the import pipeline has completed.
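`constant_healpix_order`	healpix order to use when mapping. if this is a positive number, this will be the order of all final pixels and we will not combine pixels according to the threshold.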
`create_metadata`	Create /dataset/_metadata parquet from all data partitions.
`create_per_partition_stats`	Create per_partition_statistics.parquet, based on footers from all data partitions.
`create_thumbnail`	Create /dataset/data_thumbnail.parquet from one row of each data partition.
`dask_n_workers`	number of workers for the dask client
`dask_threads_per_worker`	number of threads per dask worker
`dask_tmp`	directory for dask worker space.
`debug_stats_only`	do not perform a map reduce and don't create a new catalog.
`dec_column`	column for declination
`delete_intermediate_parquet_files`	should we delete the smaller intermediate parquet files generated in the splitting stage, once the relevant reducing stage is complete?
`delete_resume_log_files`	should we delete task-level done files once each stage is complete? if False, we will keep all done marker files at the end of the pipeline.
`drop_empty_siblings`	when determining bins for the final partitioning, should we keep result pixels at a higher order (smaller area) if the 3 sibling pixels are empty?
`existing_pixels`	the list of HEALPix pixels to include in the alignment
`expected_total_rows`	number of expected rows found in the dataset.
`file_reader`	instance of input reader that specifies arguments necessary for reading from your input files
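`highest_healpix_order`	healpix order to use when mapping. this will not necessarily be the order used in the final catalog, as we may combine pixels that don't meet the threshold.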
`input_path`	path to search for the input data
`lowest_healpix_order`	when determining bins for the final partitioning, the lowest possible healpix order for resulting pixels.
`mapping_healpix_order`	healpix order to use when mapping. will be highest_healpix_order unless a positive value is provided for constant_healpix_order.
`npix_parquet_name`	Name of the pixel parquet file to be used when npix_suffix=/.
`npix_suffix`	Suffix for pixel data.
`output_artifact_name`	short, convenient name for the catalog
`output_path`	base path where new catalog should be output
`pixel_threshold`	when determining bins for the final partitioning, the maximum number of rows for a single resulting pixel.
`progress_bar`	if true, a progress bar will be displayed for user feedback of map reduce progress
`ra_column`	column for right ascension
`resume`	If True, we try to read any existing intermediate files and continue to run the pipeline where we left off.
`resume_tmp`	directory for intermediate resume files, when needed.
`row_group_kwargs`	additional keyword arguments to use in creation of rowgroups when writing files to parquet.
`should_write_skymap`	main catalogs should contain skymap fits files
`simple_progress_bar`	if displaying a progress bar, use a text-only simple progress bar instead of widget.
`skymap_alt_orders`	Additional alternative healpix orders to write a HEALPix skymap.
`sort_columns`	column for survey identifier, or other sortable column.
`tmp_base_path`	either tmp_dir or dask_tmp, if those were provided by the user
`tmp_dir`	path for storing intermediate files
`tmp_path`	constructed temp path - defaults to tmp_dir, then dask_tmp, but will create a new temp directory under catalog_path if no other options are provided
`tqdm_kwargs`	Additional arguments to pass to the tqdm progress bar.
`use_healpix_29`	use an existing healpix-based hats spatial index as the position, instead of ra/dec
`use_schema_file`	path to a parquet file with schema metadata.
`write_table_kwargs`	additional keyword arguments to use when writing files to parquet (e.g. compression schemes).
`input_file_list`	can be used instead of input_path to import only specified files
`input_paths`	resolved list of all files that will be used in the importer
`run_stages`	list of parallel stages to run.
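
Several of the attributes above (`lowest_healpix_order`, `highest_healpix_order`, `pixel_threshold`, `drop_empty_siblings`) refer to HEALPix orders and sibling pixels. The following is a minimal sketch of the underlying arithmetic, assuming NESTED pixel numbering, where the four children of pixel `q` are `4q` through `4q + 3`. The helper names (`pixels_at_order`, `siblings`, `merge_one_level`) are hypothetical and are not part of hats-import; the library's own alignment routine is more involved than this single-level merge.

```python
def pixels_at_order(order: int) -> int:
    """Number of HEALPix pixels covering the sphere at the given order."""
    return 12 * 4**order


def siblings(pixel: int) -> list[int]:
    """The other three NESTED pixels that share this pixel's parent."""
    first_child = (pixel >> 2) << 2
    return [p for p in range(first_child, first_child + 4) if p != pixel]


def merge_one_level(counts: dict[int, int], threshold: int):
    """Try to combine pixels at one order into parents at the next lower order.

    Returns (merged, kept). `merged` maps parent pixel -> total rows for
    parents whose children fit within `threshold`. `kept` holds the child
    pixels that stay at the current order because their total is too large.
    This is an illustrative rule, not the library's implementation.
    """
    by_parent: dict[int, dict[int, int]] = {}
    for pixel, rows in counts.items():
        by_parent.setdefault(pixel >> 2, {})[pixel] = rows

    merged: dict[int, int] = {}
    kept: dict[int, int] = {}
    for parent, children in by_parent.items():
        total = sum(children.values())
        if total <= threshold:
            merged[parent] = total
        else:
            kept.update(children)
    return merged, kept


# Hypothetical row counts per pixel at some order.
row_counts = {0: 10, 1: 20, 2: 5, 3: 5, 4: 900, 5: 200}
merged, kept = merge_one_level(row_counts, threshold=100)
# merged == {0: 40}; kept == {4: 900, 5: 200}
```

Pixels 0 to 3 share parent 0 and together hold 40 rows, so they fit in one lower-order pixel; pixels 4 and 5 exceed the threshold together and stay at the finer order.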

Methods

`__init__`(output_path, output_artifact_name, ...)
`extra_property_dict`()	Generate additional HATS properties for this import run as a dictionary.
`reimport_from_hats`(path, output_dir, **kwargs)	Generate the import arguments to reimport a HATS catalog with different parameters
`resume_kwargs_dict`()	Convenience method to convert fields for resume functionality.
`to_table_properties`(total_rows, ...[, ...])	Catalog-type-specific dataset info.

__init__(output_path: str | Path | UPath | None = None, output_artifact_name: str = '', addl_hats_properties: dict | None = None, npix_suffix: str = '.parquet', npix_parquet_name: str | None = None, write_table_kwargs: dict | None = None, row_group_kwargs: dict | None = None, should_write_skymap: bool = True, skymap_alt_orders: list[int] | None = None, create_thumbnail: bool = False, create_metadata: bool = True, create_per_partition_stats: bool = False, tmp_dir: str | Path | UPath | None = None, resume: bool = True, progress_bar: bool = True, simple_progress_bar: bool = False, tqdm_kwargs: dict | None = None, dask_tmp: str | Path | UPath | None = None, dask_n_workers: int = 1, dask_threads_per_worker: int = 1, resume_tmp: str | Path | UPath | None = None, delete_intermediate_parquet_files: bool = True, delete_resume_log_files: bool = True, completion_email_address: str = '', catalog_path: UPath | None = None, tmp_path: UPath | None = None, tmp_base_path: UPath | None = None, catalog_type: str = 'object', allowed_catalog_types: tuple[str] = ('source', 'object', 'map'), input_path: str | Path | UPath | None = None, input_file_list: list[str | Path | UPath] = <factory>, input_paths: list[str | Path | UPath] = <factory>, ra_column: str = 'ra', dec_column: str = 'dec', use_healpix_29: bool = False, sort_columns: str | None = None, add_healpix_29: bool = True, use_schema_file: str | Path | UPath | None = None, expected_total_rows: int = 0, constant_healpix_order: int = -1, lowest_healpix_order: int = 0, highest_healpix_order: int = 10, pixel_threshold: int = 1000000, byte_pixel_threshold: int | None = None, drop_empty_siblings: bool = True, mapping_healpix_order: int = -1, run_stages: list[str] = <factory>, debug_stats_only: bool = False, file_reader: InputReader | str | None = None, existing_pixels: Sequence[tuple[int, int]] | None = None) → None
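
The `<factory>` defaults in this signature are how Python dataclasses display fields declared with `default_factory`, and `catalog_path` and `tmp_path` are described in the attribute list as constructed values. The sketch below illustrates that pattern with a small stand-in class, assuming the derived paths are filled in after initialization. `ArgumentsSketch` and the `"intermediate"` directory name are hypothetical, and the real class performs more validation than shown here.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class ArgumentsSketch:
    """Hypothetical stand-in for a few ImportArguments fields; not the real class."""

    output_path: Optional[Path] = None
    output_artifact_name: str = ""
    tmp_dir: Optional[Path] = None
    dask_tmp: Optional[Path] = None
    # Mutable defaults must come from a factory; the generated signature
    # shows such a field as `= <factory>`.
    input_file_list: list = field(default_factory=list)
    # Derived values, computed in __post_init__ below.
    catalog_path: Optional[Path] = None
    tmp_path: Optional[Path] = None

    def __post_init__(self):
        if self.output_path is None or not self.output_artifact_name:
            raise ValueError("output_path and output_artifact_name are required")
        # <output_path>/<output_artifact_name>
        self.catalog_path = Path(self.output_path) / self.output_artifact_name
        # Prefer tmp_dir, then dask_tmp, else a directory under catalog_path.
        # The "intermediate" name is an assumption made for this sketch,
        # and no directory is actually created on disk.
        base = self.tmp_dir or self.dask_tmp
        if base is not None:
            self.tmp_path = Path(base)
        else:
            self.tmp_path = self.catalog_path / "intermediate"


args = ArgumentsSketch(output_path=Path("/data"), output_artifact_name="my_catalog")
scratch = ArgumentsSketch(
    output_path=Path("/data"),
    output_artifact_name="my_catalog",
    dask_tmp=Path("/scratch"),
)
```

Here `args.tmp_path` falls back to a location under `args.catalog_path`, while `scratch.tmp_path` uses the supplied `dask_tmp`. Each instance also gets its own empty `input_file_list`, which is the reason a factory is used instead of a shared list literal.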

classmethod __new__(*args, **kwargs)
