CollectionArguments

class CollectionArguments

Container class for building the arguments used to import a catalog as a collection.

Attributes

addl_hats_properties

Any additional keyword arguments you would like to provide when writing the hats.properties file for the final HATS table.

catalog_args

Constructed arguments for catalog import.

catalog_path

Constructed output path for the catalog, of the form <output_path>/<output_artifact_name>.

completion_email_address

If provided, send an email to this address once the import pipeline has completed.

create_metadata

Create /dataset/_metadata parquet from all data partitions.

create_per_partition_stats

Create per_partition_statistics.parquet, based on footers from all data partitions.

create_thumbnail

Create /dataset/data_thumbnail.parquet from one row of each data partition.

dask_n_workers

Number of workers for the dask client.

dask_threads_per_worker

Number of threads per dask worker.

dask_tmp

Directory for dask worker space.

default_margin_name

delete_intermediate_parquet_files

Whether to delete the smaller intermediate parquet files generated in the splitting stage once the relevant reducing stage is complete.

delete_resume_log_files

Whether to delete task-level done files once each stage is complete. If False, all done marker files are kept at the end of the pipeline.

new_catalog_name

Name for the new catalog that will be created.

new_catalog_path

Constructed path for the new catalog, relative to the collection.

npix_parquet_name

Name of the pixel parquet file to be used when npix_suffix=/.

npix_suffix

Suffix for pixel data.

output_artifact_name

Short, convenient name for the catalog.

output_path

Base path where the new catalog should be written.

progress_bar

If True, display a progress bar to give the user feedback on map-reduce progress.

resume

If True, we try to read any existing intermediate files and continue to run the pipeline where we left off.

resume_tmp

Directory for intermediate resume files, when needed.

row_group_kwargs

Additional keyword arguments to use when creating row groups while writing files to parquet.

should_write_skymap

Whether main catalogs should contain skymap FITS files.

simple_progress_bar

If displaying a progress bar, use a simple text-only progress bar instead of a widget.

skymap_alt_orders

Additional alternative HEALPix orders at which to write a HEALPix skymap.

tmp_base_path

Either tmp_dir or dask_tmp, if either was provided by the user.

tmp_dir

Path for storing intermediate files.

tmp_path

Constructed temp path. Defaults to tmp_dir, then dask_tmp; if neither is provided, a new temp directory is created under catalog_path.

tqdm_kwargs

Additional arguments to pass to the tqdm progress bar.

write_table_kwargs

Additional keyword arguments to use when writing files to parquet (e.g. compression schemes).

margin_kwargs

List of all argument dictionaries passed to this builder, for creating margins.

index_kwargs

List of all argument dictionaries passed to this builder, for creating indexes.

margin_args

Constructed arguments for creating margins.

margin_paths

Paths to margins that may be created by these arguments.

index_args

Constructed arguments for creating indexes.

index_paths

Paths to indexes that may be created by these arguments.
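
The constructed path attributes above (catalog_path and tmp_path) follow a simple precedence that can be sketched in a few lines. This is an illustration of the documented behaviour only, not the library's code: the function name resolve_paths and the fallback_subdir parameter (with its "intermediate" default) are assumptions made for the example, and the real implementation may append further subdirectories.

```python
from __future__ import annotations

from pathlib import Path


def resolve_paths(
    output_path: str | Path,
    output_artifact_name: str,
    tmp_dir: str | Path | None = None,
    dask_tmp: str | Path | None = None,
    fallback_subdir: str = "intermediate",  # assumed name, not confirmed by the docs
) -> tuple[Path, Path]:
    """Return (catalog_path, tmp_path) using the documented precedence."""
    # catalog_path is <output_path>/<output_artifact_name>
    catalog_path = Path(output_path) / output_artifact_name

    if tmp_dir is not None:
        # 1. An explicit tmp_dir wins.
        tmp_path = Path(tmp_dir)
    elif dask_tmp is not None:
        # 2. Otherwise reuse the dask worker space.
        tmp_path = Path(dask_tmp)
    else:
        # 3. Otherwise fall back to a directory under catalog_path.
        tmp_path = catalog_path / fallback_subdir
    return catalog_path, tmp_path


catalog_path, tmp_path = resolve_paths(
    "/data/hats", "my_collection", dask_tmp="/scratch/dask"
)
print(catalog_path)
print(tmp_path)
```

Here tmp_path resolves to the dask_tmp location because no tmp_dir was given; dropping both arguments would place it under catalog_path instead.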

Methods

__init__(output_path, output_artifact_name, ...)

add_index(**kwargs)

Add arguments for an index catalog.

add_margin([is_default])

Add arguments for a margin catalog.

catalog(**kwargs)

Set the primary catalog for the collection.

extra_property_dict()

Generate additional HATS properties for this import run as a dictionary.

get_catalog_args()

Retrieve the catalog arguments, if a catalog must be created or resumed.

get_index_args()

Construct and return the index argument objects, validating the inputs.

get_margin_args()

Construct and return the margin argument objects, validating the inputs.

resume_kwargs_dict()

Convenience method to convert fields for resume functionality.

to_collection_properties()

Collection-specific dataset info.
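
The catalog, add_margin and add_index methods accumulate keyword arguments on the builder (see margin_kwargs and index_kwargs above). The toy class below sketches that accumulation pattern under stated assumptions: ToyCollectionBuilder is a hypothetical stand-in rather than the hats-import implementation, the methods are assumed to return the builder so that calls can be chained, and the keyword names passed in the usage lines (input_path, output_artifact_name, indexing_column) are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyCollectionBuilder:
    """Hypothetical stand-in showing how builder calls could accumulate kwargs."""

    catalog_kwargs: dict | None = None
    margin_kwargs: list[dict] = field(default_factory=list)
    index_kwargs: list[dict] = field(default_factory=list)
    default_margin_name: str | None = None

    def catalog(self, **kwargs) -> ToyCollectionBuilder:
        # Record the arguments for the single primary catalog.
        self.catalog_kwargs = kwargs
        return self

    def add_margin(self, is_default: bool = False, **kwargs) -> ToyCollectionBuilder:
        # Each call appends one argument dictionary for one margin.
        self.margin_kwargs.append(kwargs)
        if is_default:
            # Assumption: the default margin is identified by its artifact name.
            self.default_margin_name = kwargs.get("output_artifact_name")
        return self

    def add_index(self, **kwargs) -> ToyCollectionBuilder:
        # Each call appends one argument dictionary for one index.
        self.index_kwargs.append(kwargs)
        return self


builder = (
    ToyCollectionBuilder()
    .catalog(input_path="raw/")
    .add_margin(is_default=True, output_artifact_name="margin_5arcs")
    .add_index(indexing_column="object_id")
)
print(builder.margin_kwargs)
print(builder.index_kwargs)
```

Storing the raw dictionaries first and building the validated argument objects later would match the split between the *_kwargs attributes and the get_margin_args() / get_index_args() methods, which are described as validating their inputs.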

__init__(
    output_path: str | Path | UPath | None = None,
    output_artifact_name: str = '',
    addl_hats_properties: dict | None = None,
    npix_suffix: str = '.parquet',
    npix_parquet_name: str | None = None,
    write_table_kwargs: dict | None = None,
    row_group_kwargs: dict | None = None,
    should_write_skymap: bool = True,
    skymap_alt_orders: list[int] | None = None,
    create_thumbnail: bool = False,
    create_metadata: bool = True,
    create_per_partition_stats: bool = False,
    tmp_dir: str | Path | UPath | None = None,
    resume: bool = True,
    progress_bar: bool = True,
    simple_progress_bar: bool = False,
    tqdm_kwargs: dict | None = None,
    dask_tmp: str | Path | UPath | None = None,
    dask_n_workers: int = 1,
    dask_threads_per_worker: int = 1,
    resume_tmp: str | Path | UPath | None = None,
    delete_intermediate_parquet_files: bool = True,
    delete_resume_log_files: bool = True,
    completion_email_address: str = '',
    catalog_path: UPath | None = None,
    tmp_path: UPath | None = None,
    tmp_base_path: UPath | None = None,
    new_catalog_path: UPath | None = None,
    new_catalog_name: str | None = None,
    catalog_args: ImportArguments | None = None,
    margin_kwargs: list[dict] = <factory>,
    default_margin_name: str | None = None,
    index_kwargs: list[dict] = <factory>,
    margin_args: list[dict] = <factory>,
    margin_paths: list[str] = <factory>,
    index_args: list[dict] = <factory>,
    index_paths: dict[str, str] = <factory>,
) -> None

classmethod __new__(*args, **kwargs)
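
The <factory> defaults in the __init__ signature are how Python renders dataclass fields declared with default_factory, which suggests that CollectionArguments is a dataclass and that each instance receives its own fresh list or dict. The sketch below demonstrates that behaviour with a hypothetical Example class; it only illustrates the standard-library mechanism and says nothing further about how the library itself is written.

```python
import inspect
from dataclasses import dataclass, field


@dataclass
class Example:
    # Hypothetical field, named after one of the attributes documented above.
    margin_kwargs: list = field(default_factory=list)


a = Example()
b = Example()
a.margin_kwargs.append({"margin_threshold": 5.0})  # illustrative value only

# The generated signature shows the same <factory> marker as the docs.
print(inspect.signature(Example))
# The two instances do not share the list.
print(a.margin_kwargs, b.margin_kwargs)
```

A plain mutable default such as `margin_kwargs: list = []` is rejected by dataclasses with a ValueError, which is why default_factory is used for list and dict fields.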