hipscat_import.runtime_arguments

Data class to hold common runtime arguments for dataset creation.

Module Contents

Classes

RuntimeArguments

Data class for holding runtime arguments

Functions

find_input_paths([input_path, file_matcher, ...])

Helper method to find input paths, given either a prefix and format, or an explicit list of paths.

class RuntimeArguments

Data class for holding runtime arguments

output_path: str = '': base path where new catalog should be output

output_artifact_name: str = '': short, convenient name for the catalog

output_storage_options: Dict[Any, Any] | None: optional dictionary of abstract filesystem credentials for the OUTPUT.

tmp_dir: str = '': path for storing intermediate files

resume: bool = True: If True, we try to read any existing intermediate files and continue to run the pipeline where we left off. If False, we start the import from scratch, overwriting any content of the output directory.

progress_bar: bool = True: if true, a tqdm progress bar will be displayed for user feedback of map reduce progress

dask_tmp: str = '': directory for dask worker space. This should be local to the execution of the pipeline, for fast reads and writes

dask_n_workers: int = 1: number of workers for the dask client

dask_threads_per_worker: int = 1: number of threads per dask worker

resume_tmp: str = '': directory for intermediate resume files, when needed. See the Read the Docs (RTD) documentation for more info.

completion_email_address: str = '': if provided, send an email to the indicated email address once the import pipeline has completed.

catalog_path: hipscat.io.FilePointer | None: constructed output path for the catalog that will be something like <output_path>/<output_artifact_name>

tmp_path: hipscat.io.FilePointer | None: constructed temp path - defaults to tmp_dir, then dask_tmp, but will create a new temp directory under catalog_path if no other options are provided
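
The two constructed paths above follow a simple precedence rule, which can be sketched as below. This is an illustration of the documented behavior only, not the library's implementation: `resolve_paths` is a hypothetical helper, plain `pathlib.Path` objects stand in for `hipscat.io.FilePointer`, and the `intermediate` directory name is an assumption made for the example.

```python
from pathlib import Path


def resolve_paths(output_path, output_artifact_name, tmp_dir="", dask_tmp=""):
    """Hypothetical helper mirroring the documented path rules."""
    # catalog_path: <output_path>/<output_artifact_name>
    catalog_path = Path(output_path) / output_artifact_name

    # tmp_path: prefer tmp_dir, then dask_tmp, else a directory under catalog_path.
    if tmp_dir:
        tmp_path = Path(tmp_dir)
    elif dask_tmp:
        tmp_path = Path(dask_tmp)
    else:
        # "intermediate" is an assumed directory name, used only for illustration.
        tmp_path = catalog_path / "intermediate"
    return catalog_path, tmp_path


# Neither tmp_dir nor dask_tmp is given, so tmp_path falls under catalog_path.
catalog_path, tmp_path = resolve_paths("/data/catalogs", "my_catalog")
```

The sketch only computes paths; it does not create any directories, whereas the description above says the real class creates the fallback temp directory.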

__post_init__()

_check_arguments()

provenance_info() → dict

Fill all known information in a dictionary for provenance tracking.

Returns: dictionary with all argument_name -> argument_value as key -> value pairs.

additional_runtime_provenance_info(): Any additional runtime args to be included in provenance info from subclasses
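
As a rough illustration of how the two provenance methods might fit together, the sketch below uses stand-in dataclasses rather than the real `RuntimeArguments`. The class names, the reduced set of fields, the `pipeline` key, and the assumption that `provenance_info()` merges the result of the subclass hook are all hypothetical; the actual dictionary layout may differ.

```python
from dataclasses import dataclass, fields


@dataclass
class SketchArguments:
    """Stand-in for RuntimeArguments with only a few of its documented fields."""

    output_path: str = ""
    output_artifact_name: str = ""
    resume: bool = True
    dask_n_workers: int = 1

    def provenance_info(self) -> dict:
        # Collect every declared field as argument_name -> argument_value.
        info = {f.name: getattr(self, f.name) for f in fields(self)}
        # Assumption: values from the subclass hook are merged into the result.
        info.update(self.additional_runtime_provenance_info())
        return info

    def additional_runtime_provenance_info(self) -> dict:
        # The base class contributes nothing extra; subclasses override this.
        return {}


@dataclass
class SketchImportArguments(SketchArguments):
    """Hypothetical subclass that adds one field and one extra provenance entry."""

    input_path: str = ""

    def additional_runtime_provenance_info(self) -> dict:
        return {"pipeline": "import"}


args = SketchImportArguments(output_path="/data/catalogs", output_artifact_name="my_catalog")
info = args.provenance_info()
```

Keeping the subclass contribution in a separate hook means the base method does not need to know which pipelines extend it.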

find_input_paths(input_path='', file_matcher='', input_file_list=None, storage_options: Dict[Any, Any] | None = None)

Helper method to find input paths, given either a prefix and format, or an explicit list of paths.

Parameters:

input_path (str) – prefix to search for
file_matcher (str) – matcher to use when searching for files
input_file_list (List[str]) – list of input paths
storage_options (Dict[Any, Any] | None) – optional dictionary of abstract filesystem credentials for the input location

Returns:

the matching files if input_path is provided; otherwise, input_file_list

Raises:

FileNotFoundError – if no files are found at the input_path and the provided list is empty.
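
The behavior described above can be approximated on a local filesystem as follows. This is a minimal sketch under stated assumptions, not the library's code: `find_input_paths_sketch` is a hypothetical name, `file_matcher` is assumed to be a glob pattern such as `*.csv`, results are sorted for determinism, and `storage_options` is left out because only local paths are handled.

```python
import glob
import os
from typing import List, Optional


def find_input_paths_sketch(
    input_path: str = "",
    file_matcher: str = "",
    input_file_list: Optional[List[str]] = None,
) -> List[str]:
    """Local-filesystem approximation of the documented behavior."""
    if input_path:
        # Assumption: file_matcher is a glob pattern applied under input_path.
        paths = sorted(glob.glob(os.path.join(input_path, file_matcher)))
    else:
        # No prefix given, so fall back to the explicit list (possibly empty).
        paths = list(input_file_list or [])
    if not paths:
        raise FileNotFoundError("No input files found")
    return paths


# With no input_path, the explicit list is returned unchanged.
explicit = find_input_paths_sketch(input_file_list=["part_0.csv", "part_1.csv"])
```

Raising `FileNotFoundError` at this point surfaces a bad prefix or pattern before any pipeline work begins.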