ImportArguments

class ImportArguments

Container class for holding arguments for partitioning input data into a HATS catalog.

Attributes

add_healpix_29

add the healpix-based hats spatial index field alongside the data

addl_hats_properties

Any additional keyword arguments you would like to provide when writing the hats.properties file for the final HATS table.

allowed_catalog_types

possible types of catalog to import with ImportArguments

byte_pixel_threshold

when determining bins for the final partitioning, the maximum size of a single resulting pixel, expressed in bytes.

catalog_path

constructed output path for the catalog that will be something like <output_path>/<output_artifact_name>

catalog_type

level of catalog data, object (things in the sky) or source (detections)

completion_email_address

if provided, send an email to the indicated email address once the import pipeline has completed.

constant_healpix_order

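healpix order to use when mapping. if this is a positive number, it is used as the order of all final pixels, and pixels are not combined according to the threshold.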

create_metadata

Create /dataset/_metadata parquet from all data partitions.

create_per_partition_stats

Create per_partition_statistics.parquet, based on footers from all data partitions.

create_thumbnail

Create /dataset/data_thumbnail.parquet from one row of each data partition.

dask_n_workers

number of workers for the dask client

dask_threads_per_worker

number of threads per dask worker

dask_tmp

directory for dask worker space.

debug_stats_only

do not perform a map reduce and don't create a new catalog.

dec_column

column for declination

delete_intermediate_parquet_files

should we delete the smaller intermediate parquet files generated in the splitting stage, once the relevant reducing stage is complete?

delete_resume_log_files

should we delete task-level done files once each stage is complete? if False, we will keep all done marker files at the end of the pipeline.

drop_empty_siblings

when determining bins for the final partitioning, should we keep a result pixel at a higher order (smaller area) if its 3 sibling pixels are empty?

existing_pixels

the list of HEALPix pixels to include in the alignment

expected_total_rows

number of expected rows found in the dataset.

file_reader

instance of input reader that specifies arguments necessary for reading from your input files

highest_healpix_order

healpix order to use when mapping.

input_path

path to search for the input data

lowest_healpix_order

when determining bins for the final partitioning, the lowest possible healpix order for resulting pixels.

mapping_healpix_order

healpix order to use when mapping. will be highest_healpix_order unless a positive value is provided for constant_healpix_order.

npix_parquet_name

Name of the pixel parquet file to be used when npix_suffix=/.

npix_suffix

Suffix for pixel data.

output_artifact_name

short, convenient name for the catalog

output_path

base path where new catalog should be output

pixel_threshold

when determining bins for the final partitioning, the maximum number of rows for a single resulting pixel.

progress_bar

if true, a progress bar will be displayed for user feedback of map reduce progress

ra_column

column for right ascension

resume

If True, we try to read any existing intermediate files and continue to run the pipeline where we left off.

resume_tmp

directory for intermediate resume files, when needed.

row_group_kwargs

additional keyword arguments to use in creation of rowgroups when writing files to parquet.

should_write_skymap

main catalogs should contain skymap fits files

simple_progress_bar

if displaying a progress bar, use a text-only simple progress bar instead of widget.

skymap_alt_orders

Additional alternative healpix orders to write a HEALPix skymap.

sort_columns

column for survey identifier, or other sortable column.

tmp_base_path

either tmp_dir or dask_tmp, if those were provided by the user

tmp_dir

path for storing intermediate files

tmp_path

constructed temp path - defaults to tmp_dir, then dask_tmp, but will create a new temp directory under catalog_path if no other options are provided

tqdm_kwargs

Additional arguments to pass to the tqdm progress bar.

use_healpix_29

use an existing healpix-based hats spatial index as the position, instead of ra/dec

use_schema_file

path to a parquet file with schema metadata.

write_table_kwargs

additional keyword arguments to use when writing files to parquet (e.g. compression schemes).

input_file_list

can be used instead of input_path to import only specified files

input_paths

resolved list of all files that will be used in the importer

run_stages

list of parallel stages to run.
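
The order-related attributes above (lowest_healpix_order, highest_healpix_order, drop_empty_siblings) are easier to reason about with the HEALPix arithmetic in hand. The following is a minimal, standard-library-only sketch, not part of this class: the helper names are hypothetical, and it assumes the NESTED pixel numbering scheme, in which the four children of a pixel p are 4p through 4p+3.

```python
def npix_at_order(order: int) -> int:
    """Number of HEALPix pixels covering the full sky at a given order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return 12 * 4**order


def parent_pixel(order: int, pixel: int) -> tuple[int, int]:
    """(order, pixel) of the parent, one order lower (larger area). NESTED scheme assumed."""
    if order <= 0:
        raise ValueError("order-0 pixels have no parent")
    return order - 1, pixel // 4


def sibling_pixels(pixel: int) -> list[int]:
    """The 3 other pixels that share a parent with `pixel`. NESTED scheme assumed."""
    first_child = (pixel // 4) * 4
    return [p for p in range(first_child, first_child + 4) if p != pixel]


# Defaults from the signature: lowest_healpix_order=0, highest_healpix_order=10.
for order in (0, 10):
    print(f"order {order}: {npix_at_order(order)} pixels")

print(sibling_pixels(6))    # the three siblings of pixel 6
print(parent_pixel(3, 45))  # parent of pixel 45 at order 3
```

Each step up in order divides a pixel into four, so the default highest_healpix_order of 10 corresponds to a much finer sky grid than the 12 base pixels at order 0. The "3 sibling pixels" mentioned under drop_empty_siblings are what `sibling_pixels` returns in this sketch.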

Methods

__init__(output_path, output_artifact_name, ...)

extra_property_dict()

Generate additional HATS properties for this import run as a dictionary.

reimport_from_hats(path, output_dir, **kwargs)

Generate the import arguments to reimport a HATS catalog with different parameters.

resume_kwargs_dict()

Convenience method to convert fields for resume functionality.

to_table_properties(total_rows, ...[, ...])

Catalog-type-specific dataset info.

__init__(output_path: str | Path | UPath | None = None, output_artifact_name: str = '', addl_hats_properties: dict | None = None, npix_suffix: str = '.parquet', npix_parquet_name: str | None = None, write_table_kwargs: dict | None = None, row_group_kwargs: dict | None = None, should_write_skymap: bool = True, skymap_alt_orders: list[int] | None = None, create_thumbnail: bool = False, create_metadata: bool = True, create_per_partition_stats: bool = False, tmp_dir: str | Path | UPath | None = None, resume: bool = True, progress_bar: bool = True, simple_progress_bar: bool = False, tqdm_kwargs: dict | None = None, dask_tmp: str | Path | UPath | None = None, dask_n_workers: int = 1, dask_threads_per_worker: int = 1, resume_tmp: str | Path | UPath | None = None, delete_intermediate_parquet_files: bool = True, delete_resume_log_files: bool = True, completion_email_address: str = '', catalog_path: UPath | None = None, tmp_path: UPath | None = None, tmp_base_path: UPath | None = None, catalog_type: str = 'object', allowed_catalog_types: tuple[str] = ('source', 'object', 'map'), input_path: str | Path | UPath | None = None, input_file_list: list[str | Path | UPath] = <factory>, input_paths: list[str | Path | UPath] = <factory>, ra_column: str = 'ra', dec_column: str = 'dec', use_healpix_29: bool = False, sort_columns: str | None = None, add_healpix_29: bool = True, use_schema_file: str | Path | UPath | None = None, expected_total_rows: int = 0, constant_healpix_order: int = -1, lowest_healpix_order: int = 0, highest_healpix_order: int = 10, pixel_threshold: int = 1000000, byte_pixel_threshold: int | None = None, drop_empty_siblings: bool = True, mapping_healpix_order: int = -1, run_stages: list[str] = <factory>, debug_stats_only: bool = False, file_reader: InputReader | str | None = None, existing_pixels: Sequence[tuple[int, int]] | None = None) -> None
classmethod __new__(*args, **kwargs)
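
Because the constructor takes many keyword arguments, it can help to collect the ones you override in a dictionary first. The sketch below is illustrative only: the directory names and catalog name are hypothetical, the import location `hats_import.catalog.arguments` is an assumption that may differ in your installed version, and passing the string "csv" as file_reader assumes that value is accepted where the signature allows a str. Argument names and defaults are taken from the signature above.

```python
from pathlib import Path

# Hypothetical locations and names, chosen only for illustration.
import_kwargs = {
    "output_path": "./catalogs",
    "output_artifact_name": "my_catalog",
    "input_path": "./raw_data",
    "file_reader": "csv",        # assumed to be an accepted string value
    "ra_column": "ra",
    "dec_column": "dec",
    "catalog_type": "object",    # one of 'source', 'object', 'map'
    "highest_healpix_order": 10,
    "pixel_threshold": 1_000_000,
}

# Per the attribute list, catalog_path is constructed as
# <output_path>/<output_artifact_name>.
expected_catalog_path = (
    Path(import_kwargs["output_path"]) / import_kwargs["output_artifact_name"]
)
print(expected_catalog_path)


def build_arguments(kwargs: dict):
    """Construct ImportArguments from a kwargs dictionary.

    The import is done lazily so this sketch can be read and run
    without the package installed; the module path is an assumption.
    """
    from hats_import.catalog.arguments import ImportArguments

    return ImportArguments(**kwargs)
```

Calling `build_arguments(import_kwargs)` only makes sense once the package is installed and the input location actually contains data, since the class may validate its inputs on construction.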