# Pan-STARRS
## Getting the data
We had a special line to the folks at NASA to get hold of the full object catalog and the detections table. The import process is documented here for reference.
## Challenges with this data set
- The rows are wide, so the chunked reader cannot read too many rows at once without exhausting memory.
- The CSV files don't have a header, so we need to provide the column names and type hints to the reader.
- The tables are very wide. We only used a subset of columns in each table for our initial science use cases.

You can download the CSV files we used that contain Python type information:
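The snippets below expect each side file to be a two-column CSV mapping a catalog column name to a pandas dtype string (that is what the `type_frame["name"]` / `type_frame["type"]` lookups assume). A minimal sketch with toy data, standing in for `ps1_otmo_types.csv`:

```python
import io

import pandas as pd

# Toy stand-in for a type side file: one row per catalog column,
# with the column name and a pandas dtype string.
side_file = io.StringIO(
    "name,type\n"
    "objID,int64\n"
    "raMean,float64\n"
    "decMean,float64\n"
)

type_frame = pd.read_csv(side_file)
type_map = dict(zip(type_frame["name"], type_frame["type"]))
type_names = type_frame["name"].values.tolist()

print(type_map)    # {'objID': 'int64', 'raMean': 'float64', 'decMean': 'float64'}
print(type_names)  # ['objID', 'raMean', 'decMean']
```

Keeping names and dtypes in a side file like this means the (very wide) schema lives in one place, instead of being repeated in every import script.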
## Example import of objects (otmo)
```python
import glob

import pandas as pd

import hipscat_import.pipeline as runner
from hipscat_import.catalog.arguments import ImportArguments
from hipscat_import.catalog.file_readers import CsvReader

# Load the column names and types from a side file.
type_frame = pd.read_csv("ps1_otmo_types.csv")
type_map = dict(zip(type_frame["name"], type_frame["type"]))
type_names = type_frame["name"].values.tolist()

# Subset of columns to load, chosen for our initial science use cases.
use_columns = ["objID", "raMean", "decMean"]  # extend as needed

in_file_paths = glob.glob("/path/to/otmo/OTMO_**.csv")
in_file_paths.sort()

args = ImportArguments(
    output_artifact_name="ps1_otmo",
    input_file_list=in_file_paths,
    file_reader=CsvReader(
        header=None,
        index_col=False,
        column_names=type_names,
        type_map=type_map,
        chunksize=250_000,
        usecols=use_columns,
    ),
    ra_column="raMean",
    dec_column="decMean",
    sort_columns="objID",
)
runner.pipeline(args)
```
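To see what those `CsvReader` options buy us, here is a rough stand-in using pandas directly on toy data (a sketch, assuming `CsvReader` forwards these options to `pandas.read_csv`; the column names are illustrative, and real files are far wider):

```python
import io

import pandas as pd

# Headerless CSV, like the Pan-STARRS exports (toy data).
csv_data = io.StringIO(
    "1,10.5,-3.2\n"
    "2,11.0,4.7\n"
)

type_names = ["objID", "raMean", "decMean"]
type_map = {"objID": "int64", "raMean": "float64", "decMean": "float64"}
use_columns = ["objID", "raMean", "decMean"]

total_rows = 0
# header=None plus names supplies the missing header row; dtype skips slow
# (and sometimes wrong) type inference; chunksize bounds memory use when
# each row is very wide.
for chunk in pd.read_csv(
    csv_data,
    header=None,
    index_col=False,
    names=type_names,
    dtype=type_map,
    usecols=use_columns,
    chunksize=1,
):
    total_rows += len(chunk)

print(total_rows)  # 2
```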
## Example import of detections
```python
# Load the column names and types from a side file.
type_frame = pd.read_csv("ps1_detections_types.csv")
type_map = dict(zip(type_frame["name"], type_frame["type"]))
type_names = type_frame["name"].values.tolist()

# Subset of columns to load, chosen for our initial science use cases.
use_columns = ["objID", "ra", "dec"]  # extend as needed

in_file_paths = glob.glob("/path/to/detection/detection**.csv")
in_file_paths.sort()

args = ImportArguments(
    output_artifact_name="ps1_detection",
    input_file_list=in_file_paths,
    file_reader=CsvReader(
        header=None,
        index_col=False,
        column_names=type_names,
        type_map=type_map,
        chunksize=250_000,
        usecols=use_columns,
    ),
    ra_column="ra",
    dec_column="dec",
    sort_columns="objID",
)
runner.pipeline(args)
```