CollectionArguments
- class CollectionArguments
Container class for building arguments for importing a catalog as a collection.
Attributes
addl_hats_properties: Any additional keyword arguments you would like to provide when writing the hats.properties file for the final HATS table.
catalog_args: Constructed arguments for catalog import.
catalog_path: Constructed output path for the catalog, of the form <output_path>/<output_artifact_name>.
completion_email_address: If provided, an email is sent to this address once the import pipeline has completed.
create_metadata: Create the /dataset/_metadata parquet file from all data partitions.
create_per_partition_stats: Create per_partition_statistics.parquet, based on the footers of all data partitions.
create_thumbnail: Create /dataset/data_thumbnail.parquet from one row of each data partition.
dask_n_workers: Number of workers for the dask client.
dask_threads_per_worker: Number of threads per dask worker.
dask_tmp: Directory for dask worker space.
default_margin_name
delete_intermediate_parquet_files: Whether to delete the smaller intermediate parquet files generated in the splitting stage once the relevant reducing stage is complete.
delete_resume_log_files: Whether to delete task-level done files once each stage is complete. If False, all done marker files are kept at the end of the pipeline.
new_catalog_name: Name for the new catalog that will be created.
new_catalog_path: Constructed path for the new catalog, relative to the collection.
npix_parquet_name: Name of the pixel parquet file to be used when npix_suffix is "/".
npix_suffix: Suffix for pixel data.
output_artifact_name: Short, convenient name for the catalog.
output_path: Base path where the new catalog should be written.
progress_bar: If True, a progress bar is displayed for user feedback on map-reduce progress.
resume: If True, we try to read any existing intermediate files and continue to run the pipeline where we left off.
resume_tmp: Directory for intermediate resume files, when needed.
row_group_kwargs: Additional keyword arguments to use when creating row groups while writing files to parquet.
should_write_skymap: Whether main catalogs should contain skymap FITS files.
simple_progress_bar: If displaying a progress bar, use a text-only simple progress bar instead of a widget.
skymap_alt_orders: Additional alternative HEALPix orders at which to write a skymap.
tmp_base_path: Either tmp_dir or dask_tmp, if one of those was provided by the user.
tmp_dir: Path for storing intermediate files.
tmp_path: Constructed temp path. Defaults to tmp_dir, then dask_tmp; if neither is provided, a new temp directory is created under catalog_path.
tqdm_kwargs: Additional arguments to pass to the tqdm progress bar.
write_table_kwargs: Additional keyword arguments to use when writing files to parquet (e.g. compression schemes).
margin_kwargs: List of all argument dictionaries passed to this builder for creating margins.
index_kwargs: List of all argument dictionaries passed to this builder for creating indexes.
margin_args: Constructed arguments for creating margins.
margin_paths: Paths to margins that may be created by these arguments.
index_args: Constructed arguments for creating indexes.
index_paths: Paths to indexes that may be created by these arguments.
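The catalog_path, tmp_base_path and tmp_path attributes are derived from other arguments. The function below is a minimal sketch of those documented rules, not the hats-import implementation: resolve_paths is a hypothetical helper, it uses pathlib.Path in place of UPath, it only computes paths without creating directories, and the fallback subdirectory name "intermediate" is an assumption made for illustration.

```python
from pathlib import Path
from typing import Optional


def resolve_paths(
    output_path: str,
    output_artifact_name: str,
    tmp_dir: Optional[str] = None,
    dask_tmp: Optional[str] = None,
) -> tuple[Path, Optional[Path], Path]:
    """Hypothetical sketch of the documented path rules (not hats-import code).

    Uses pathlib.Path instead of UPath and only computes paths; it does not
    create any directories.
    """
    # catalog_path has the form <output_path>/<output_artifact_name>.
    catalog_path = Path(output_path) / output_artifact_name

    # tmp_base_path is tmp_dir if given, otherwise dask_tmp, otherwise unset.
    if tmp_dir is not None:
        tmp_base_path = Path(tmp_dir)
    elif dask_tmp is not None:
        tmp_base_path = Path(dask_tmp)
    else:
        tmp_base_path = None

    # tmp_path uses the user-provided base if there is one; otherwise it falls
    # back to a new directory under catalog_path. "intermediate" is a
    # hypothetical directory name chosen for this sketch.
    if tmp_base_path is not None:
        tmp_path = tmp_base_path
    else:
        tmp_path = catalog_path / "intermediate"
    return catalog_path, tmp_base_path, tmp_path


# With no temp options, everything lives under the catalog's output path.
print(resolve_paths("/data/out", "my_collection"))
```

The ordering tmp_dir, then dask_tmp, then a directory under catalog_path follows the tmp_path description; passing both tmp_dir and dask_tmp resolves to tmp_dir.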
Methods
__init__(output_path, output_artifact_name, ...)
add_index(**kwargs): Add arguments for an index catalog.
add_margin([is_default]): Add arguments for a margin catalog.
catalog(**kwargs): Set the primary catalog for the collection.
extra_property_dict(): Generate additional HATS properties for this import run as a dictionary.
get_catalog_args(): Retrieve the catalog arguments, if a catalog must be created or resumed.
get_index_args(): Construct and return the index argument objects, validating the inputs.
get_margin_args(): Construct and return the margin argument objects, validating the inputs.
resume_kwargs_dict(): Convenience method to convert fields for resume functionality.
to_collection_properties(): Collection-specific dataset info.
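The attribute descriptions suggest that catalog, add_margin and add_index record keyword-argument dictionaries (margin_kwargs, index_kwargs) that get_margin_args() and get_index_args() later turn into validated argument objects. The toy class below is a minimal sketch of that accumulation only. ToyCollectionBuilder and the argument values are hypothetical; it assumes each method returns the builder so calls can be chained, and it assumes that a margin flagged with is_default has its name stored in default_margin_name.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyCollectionBuilder:
    """Hypothetical stand-in that only records builder arguments.

    Unlike CollectionArguments, it performs no validation and constructs no
    ImportArguments or margin/index argument objects.
    """

    catalog_kwargs: Optional[dict] = None
    margin_kwargs: list[dict] = field(default_factory=list)
    index_kwargs: list[dict] = field(default_factory=list)
    default_margin_name: Optional[str] = None

    def catalog(self, **kwargs) -> "ToyCollectionBuilder":
        # Set the primary catalog; returning self allows method chaining.
        self.catalog_kwargs = kwargs
        return self

    def add_margin(self, is_default: bool = False, **kwargs) -> "ToyCollectionBuilder":
        # Append one margin's arguments. Assumption: a margin flagged as the
        # default has its output_artifact_name stored in default_margin_name.
        self.margin_kwargs.append(kwargs)
        if is_default:
            self.default_margin_name = kwargs.get("output_artifact_name")
        return self

    def add_index(self, **kwargs) -> "ToyCollectionBuilder":
        # Append one index's arguments; several indexes may be added.
        self.index_kwargs.append(kwargs)
        return self


builder = (
    ToyCollectionBuilder()
    .catalog(output_artifact_name="main")
    .add_margin(output_artifact_name="main_margin", is_default=True)
    .add_index(output_artifact_name="main_id_index")
)
print(builder.default_margin_name)  # → main_margin
```

Keeping the raw dictionaries separate from the constructed argument objects lets validation be deferred until all margins and indexes have been declared.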
- __init__(output_path: str | Path | UPath | None = None, output_artifact_name: str = '', addl_hats_properties: dict | None = None, npix_suffix: str = '.parquet', npix_parquet_name: str | None = None, write_table_kwargs: dict | None = None, row_group_kwargs: dict | None = None, should_write_skymap: bool = True, skymap_alt_orders: list[int] | None = None, create_thumbnail: bool = False, create_metadata: bool = True, create_per_partition_stats: bool = False, tmp_dir: str | Path | UPath | None = None, resume: bool = True, progress_bar: bool = True, simple_progress_bar: bool = False, tqdm_kwargs: dict | None = None, dask_tmp: str | Path | UPath | None = None, dask_n_workers: int = 1, dask_threads_per_worker: int = 1, resume_tmp: str | Path | UPath | None = None, delete_intermediate_parquet_files: bool = True, delete_resume_log_files: bool = True, completion_email_address: str = '', catalog_path: UPath | None = None, tmp_path: UPath | None = None, tmp_base_path: UPath | None = None, new_catalog_path: UPath | None = None, new_catalog_name: str | None = None, catalog_args: ImportArguments | None = None, margin_kwargs: list[dict] = <factory>, default_margin_name: str | None = None, index_kwargs: list[dict] = <factory>, margin_args: list[dict] = <factory>, margin_paths: list[str] = <factory>, index_args: list[dict] = <factory>, index_paths: dict[str, str] = <factory>) -> None
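The <factory> defaults in this signature are how Python's dataclasses display a field declared with field(default_factory=...), which suggests that CollectionArguments is a dataclass whose list and dict fields are created fresh for each instance rather than shared. A minimal standalone illustration, where ExampleArgs is a hypothetical class mirroring two of the fields above:

```python
import inspect
from dataclasses import dataclass, field


@dataclass
class ExampleArgs:
    # Hypothetical class; mirrors two fields from the signature above.
    output_artifact_name: str = ""
    margin_kwargs: list = field(default_factory=list)


# The generated __init__ displays the factory default as <factory>.
print(inspect.signature(ExampleArgs))

a = ExampleArgs()
b = ExampleArgs()
a.margin_kwargs.append({"margin_threshold": 5.0})
# Each instance received its own list, so b is unaffected.
print(b.margin_kwargs)  # → []
```

Using a factory avoids the shared-mutable-default pitfall: margins or indexes added to one builder never leak into another.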
- classmethod __new__(*args, **kwargs)