ImportArguments
- class ImportArguments
Container class for holding arguments for partitioning input data into a HATS catalog.
Attributes
add_healpix_29: add the healpix-based hats spatial index field alongside the data
addl_hats_properties: Any additional keyword arguments you would like to provide when writing the hats.properties file for the final HATS table.
allowed_catalog_types: possible types of catalog to import with ImportArguments
byte_pixel_threshold: when determining bins for the final partitioning, the maximum size of a single resulting pixel, expressed in bytes.
catalog_path: constructed output path for the catalog that will be something like <output_path>/<output_artifact_name>
catalog_type: level of catalog data, object (things in the sky) or source (detections)
completion_email_address: if provided, send an email to the indicated email address once the import pipeline has completed.
constant_healpix_order: healpix order to use when mapping.
create_metadata: Create /dataset/_metadata parquet from all data partitions.
create_per_partition_stats: Create per_partition_statistics.parquet, based on footers from all data partitions.
create_thumbnail: Create /dataset/data_thumbnail.parquet from one row of each data partition.
dask_n_workers: number of workers for the dask client
dask_threads_per_worker: number of threads per dask worker
dask_tmp: directory for dask worker space.
debug_stats_only: do not perform a map reduce and don't create a new catalog.
dec_column: column for declination
delete_intermediate_parquet_files: should we delete the smaller intermediate parquet files generated in the splitting stage, once the relevant reducing stage is complete?
delete_resume_log_files: should we delete task-level done files once each stage is complete? if False, we will keep all done marker files at the end of the pipeline.
drop_empty_siblings: when determining bins for the final partitioning, should we keep result pixels at a higher order (smaller area) if the 3 sibling pixels are empty?
existing_pixels: the list of HEALPix pixels to include in the alignment
expected_total_rows: number of expected rows found in the dataset.
file_reader: instance of input reader that specifies arguments necessary for reading from your input files
highest_healpix_order: healpix order to use when mapping.
input_path: path to search for the input data
lowest_healpix_order: when determining bins for the final partitioning, the lowest possible healpix order for resulting pixels.
mapping_healpix_order: healpix order to use when mapping.
npix_parquet_name: Name of the pixel parquet file to be used when npix_suffix=/.
npix_suffix: Suffix for pixel data.
output_artifact_name: short, convenient name for the catalog
output_path: base path where new catalog should be output
pixel_threshold: when determining bins for the final partitioning, the maximum number of rows for a single resulting pixel.
progress_bar: if true, a progress bar will be displayed for user feedback of map reduce progress
ra_column: column for right ascension
resume: If True, we try to read any existing intermediate files and continue to run the pipeline where we left off.
resume_tmp: directory for intermediate resume files, when needed.
row_group_kwargs: additional keyword arguments to use in creation of rowgroups when writing files to parquet.
should_write_skymap: main catalogs should contain skymap fits files
simple_progress_bar: if displaying a progress bar, use a text-only simple progress bar instead of widget.
skymap_alt_orders: Additional alternative healpix orders to write a HEALPix skymap.
sort_columns: column for survey identifier, or other sortable column.
tmp_base_path: either tmp_dir or dask_tmp, if those were provided by the user
tmp_dir: path for storing intermediate files
tmp_path: constructed temp path; defaults to tmp_dir, then dask_tmp, but will create a new temp directory under catalog_path if no other options are provided
tqdm_kwargs: Additional arguments to pass to the tqdm progress bar.
use_healpix_29: use an existing healpix-based hats spatial index as the position, instead of ra/dec
use_schema_file: path to a parquet file with schema metadata.
write_table_kwargs: additional keyword arguments to use when writing files to parquet (e.g. compression schemes).
input_file_list: can be used instead of input_path to import only specified files
input_paths: resolved list of all files that will be used in the importer
run_stages: list of parallel stages to run.
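The descriptions of catalog_path and tmp_path above imply a simple resolution order, which can be sketched in plain Python. This is only an illustration of the documented fallback, not the library's implementation: the helper names resolve_catalog_path and resolve_tmp_path and the fallback directory name "intermediate" are hypothetical, and the real class may append further path components.

```python
from pathlib import PurePosixPath
from typing import Optional


def resolve_catalog_path(output_path: str, output_artifact_name: str) -> PurePosixPath:
    """Build <output_path>/<output_artifact_name>, as the catalog_path entry describes."""
    return PurePosixPath(output_path) / output_artifact_name


def resolve_tmp_path(
    catalog_path: PurePosixPath,
    tmp_dir: Optional[str] = None,
    dask_tmp: Optional[str] = None,
    fallback_name: str = "intermediate",  # hypothetical name, chosen for this sketch
) -> PurePosixPath:
    """Prefer tmp_dir, then dask_tmp, else a directory under catalog_path."""
    if tmp_dir:
        return PurePosixPath(tmp_dir)
    if dask_tmp:
        return PurePosixPath(dask_tmp)
    return catalog_path / fallback_name


catalog_path = resolve_catalog_path("/data/catalogs", "my_survey")

# No temp options given: fall back to a directory under the catalog.
default_tmp = resolve_tmp_path(catalog_path)

# Only dask_tmp given: it is used as the temp location.
dask_only_tmp = resolve_tmp_path(catalog_path, dask_tmp="/scratch/dask")

# Both given: tmp_dir takes precedence over dask_tmp.
both_tmp = resolve_tmp_path(catalog_path, tmp_dir="/scratch/tmp", dask_tmp="/scratch/dask")

print(catalog_path, default_tmp, dask_only_tmp, both_tmp, sep="\n")
```

PurePosixPath is used so that the sketch behaves the same on every platform; the class itself works with UPath objects according to its signature.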
Methods
__init__(output_path, output_artifact_name, ...)
extra_property_dict(): Generate additional HATS properties for this import run as a dictionary.
reimport_from_hats(path, output_dir, **kwargs): Generate the import arguments to reimport a HATS catalog with different parameters
resume_kwargs_dict(): Convenience method to convert fields for resume functionality.
to_table_properties(total_rows, ...[, ...]): Catalog-type-specific dataset info.
- __init__(output_path: str | Path | UPath | None = None, output_artifact_name: str = '', addl_hats_properties: dict | None = None, npix_suffix: str = '.parquet', npix_parquet_name: str | None = None, write_table_kwargs: dict | None = None, row_group_kwargs: dict | None = None, should_write_skymap: bool = True, skymap_alt_orders: list[int] | None = None, create_thumbnail: bool = False, create_metadata: bool = True, create_per_partition_stats: bool = False, tmp_dir: str | Path | UPath | None = None, resume: bool = True, progress_bar: bool = True, simple_progress_bar: bool = False, tqdm_kwargs: dict | None = None, dask_tmp: str | Path | UPath | None = None, dask_n_workers: int = 1, dask_threads_per_worker: int = 1, resume_tmp: str | Path | UPath | None = None, delete_intermediate_parquet_files: bool = True, delete_resume_log_files: bool = True, completion_email_address: str = '', catalog_path: UPath | None = None, tmp_path: UPath | None = None, tmp_base_path: UPath | None = None, catalog_type: str = 'object', allowed_catalog_types: tuple[str] = ('source', 'object', 'map'), input_path: str | Path | UPath | None = None, input_file_list: list[str | Path | UPath] = <factory>, input_paths: list[str | Path | UPath] = <factory>, ra_column: str = 'ra', dec_column: str = 'dec', use_healpix_29: bool = False, sort_columns: str | None = None, add_healpix_29: bool = True, use_schema_file: str | Path | UPath | None = None, expected_total_rows: int = 0, constant_healpix_order: int = -1, lowest_healpix_order: int = 0, highest_healpix_order: int = 10, pixel_threshold: int = 1000000, byte_pixel_threshold: int | None = None, drop_empty_siblings: bool = True, mapping_healpix_order: int = -1, run_stages: list[str] = <factory>, debug_stats_only: bool = False, file_reader: InputReader | str | None = None, existing_pixels: Sequence[tuple[int, int]] | None = None) -> None
- classmethod __new__(*args, **kwargs)
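Because __init__ accepts only keyword-style configuration, it can help to collect the settings in a dictionary and sanity-check them before building the arguments object. The sketch below uses keyword names and defaults taken from the signature above; the paths, the "csv" value for file_reader, and the sanity_check helper are assumptions made for illustration, and the checks are this example's own rather than the library's validation.

```python
# Allowed values copied from the default of allowed_catalog_types in the signature.
ALLOWED_CATALOG_TYPES = ("source", "object", "map")

import_kwargs = {
    "output_path": "./catalogs",          # hypothetical base directory
    "output_artifact_name": "my_survey",  # hypothetical catalog name
    "input_path": "./raw_data",           # hypothetical input directory
    "file_reader": "csv",                 # assumed value; the signature only says InputReader | str | None
    "ra_column": "ra",
    "dec_column": "dec",
    "catalog_type": "object",
    "lowest_healpix_order": 0,
    "highest_healpix_order": 10,
    "pixel_threshold": 1_000_000,
    "dask_n_workers": 4,
}


def sanity_check(kwargs: dict) -> list[str]:
    """Return a list of human-readable problems found in the keyword arguments."""
    problems = []
    if kwargs.get("catalog_type", "object") not in ALLOWED_CATALOG_TYPES:
        problems.append(f"catalog_type must be one of {ALLOWED_CATALOG_TYPES}")
    lowest = kwargs.get("lowest_healpix_order", 0)
    highest = kwargs.get("highest_healpix_order", 10)
    if lowest > highest:
        problems.append("lowest_healpix_order exceeds highest_healpix_order")
    if kwargs.get("pixel_threshold", 1_000_000) <= 0:
        problems.append("pixel_threshold must be positive")
    return problems


issues = sanity_check(import_kwargs)
print(issues if issues else "no problems found")

# With the library installed, the dictionary would presumably be unpacked as
# ImportArguments(**import_kwargs).
```

Keeping the settings in a dictionary also makes it easy to store them next to the catalog for later reference.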