Pan-STARRS#

Getting the data#

We worked directly with the folks at NASA to get a copy of the full object catalog and the detections table; the import setup below is provided for reference.

Challenges with this data set#

  • The rows are wide, so the chunked reader should only read a limited number of rows at once; the examples below use a chunksize of 250,000.

  • The CSV files don’t have a header, so we need to provide the column names and type hints to the reader.

  • The tables are very wide. We only used a subset of columns in each table for our initial science use cases (see the reader sketch just after this list).
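
For intuition, CsvReader passes these settings through to pandas.read_csv, so the configuration used in the examples below is roughly equivalent to the standalone read sketched here. The file path and the use_columns subset are hypothetical placeholders, not the columns we actually kept.

import pandas as pd

# Column names/types from the side file described below.
type_frame = pd.read_csv("ps1_otmo_types.csv")
type_map = dict(zip(type_frame["name"], type_frame["type"]))
type_names = type_frame["name"].values.tolist()

# Hypothetical column subset; pick whatever your science case needs.
use_columns = ["objID", "raMean", "decMean"]

# Header-less, very wide CSV: pass names and dtypes explicitly, and read
# in modest chunks so each chunk of wide rows fits in memory.
for chunk in pd.read_csv(
    "/path/to/otmo/OTMO_example.csv",  # hypothetical single input file
    header=None,
    names=type_names,
    usecols=use_columns,
    dtype={name: type_map[name] for name in use_columns},
    index_col=False,
    chunksize=250_000,
):
    print(f"read {len(chunk)} rows")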

You can download the CSV files we used that contain the Python type information; the examples below read them as ps1_otmo_types.csv and ps1_detections_types.csv.
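
The readers in both examples expect these side files to have two columns, name and type, with one row per catalog column. As a hypothetical excerpt (the real files enumerate every column in the table), ps1_otmo_types.csv might begin:

name,type
objID,int64
raMean,float64
decMean,float64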

Example import of objects (otmo)#

import glob

import pandas as pd

import hipscat_import.pipeline as runner
from hipscat_import.catalog.arguments import ImportArguments
from hipscat_import.catalog.file_readers import CsvReader

# Load the column names and types from a side file.
type_frame = pd.read_csv("ps1_otmo_types.csv")
type_map = dict(zip(type_frame["name"], type_frame["type"]))
type_names = type_frame["name"].values.tolist()

# Columns to keep: at minimum the position and ID columns used below.
# Extend this with whatever your science use case needs.
use_columns = ["objID", "raMean", "decMean"]

in_file_paths = glob.glob("/path/to/otmo/OTMO_**.csv")
in_file_paths.sort()
args = ImportArguments(
    output_artifact_name="ps1_otmo",
    # Destination directory; the catalog is written to <output_path>/<name>.
    output_path="/path/to/catalogs",
    input_file_list=in_file_paths,
    file_reader=CsvReader(
        header=None,
        index_col=False,
        column_names=type_names,
        type_map=type_map,
        chunksize=250_000,
        usecols=use_columns,
    ),
    ra_column="raMean",
    dec_column="decMean",
    sort_columns="objID",
)
runner.pipeline(args)
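
Once the pipeline completes, one way to sanity-check the result is to open it with lsdb, the companion analysis library. This is a minimal sketch, assuming lsdb is installed and the hypothetical output_path used above:

import lsdb

# Lazily open the imported catalog; printing shows its partitions and columns.
ps1_otmo = lsdb.read_hipscat("/path/to/catalogs/ps1_otmo")
print(ps1_otmo)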

Example import of detections#

# Load the column names and types from a side file.
type_frame = pd.read_csv("ps1_detections_types.csv")
type_map = dict(zip(type_frame["name"], type_frame["type"]))
type_names = type_frame["name"].values.tolist()

# Columns to keep: at minimum the position and ID columns used below.
# Extend this with whatever your science use case needs.
use_columns = ["objID", "ra", "dec"]

in_file_paths = glob.glob("/path/to/detection/detection**.csv")
in_file_paths.sort()
args = ImportArguments(
    output_artifact_name="ps1_detection",
    output_path="/path/to/catalogs",
    input_file_list=in_file_paths,
    file_reader=CsvReader(
        header=None,
        index_col=False,
        column_names=type_names,
        type_map=type_map,
        chunksize=250_000,
        usecols=use_columns,
    ),
    ra_column="ra",
    dec_column="dec",
    sort_columns="objID",
)
runner.pipeline(args)
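
Both examples call runner.pipeline(args), which creates a Dask client from the arguments' defaults. For tables this size you may want to size the cluster explicitly instead; a sketch using pipeline_with_client, with placeholder worker counts:

from dask.distributed import Client

import hipscat_import.pipeline as runner

# Placeholder sizing; tune worker count and threads for your machine.
with Client(n_workers=10, threads_per_worker=1) as client:
    runner.pipeline_with_client(args, client)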