Backport a single catalog into a collection

In this notebook, we show a method for taking a directory containing a catalog and some supplemental tables, and turning it into a catalog collection.

NB: This does not modify the original catalog, or the supplemental tables.

The first case is for an existing directory in the expected format, while the second gives instructions for moving directories around to get to the desired format.

1. Existing directory in the expected format

If your existing data is laid out as shown below, use this section to create the catalog/collection.properties file within the existing directory structure.

/path/to/catalog/                  # collection_path
├── main_catalog/                  # catalog_subdir
│   ├── dataset/
│   ├── partition_info.csv
│   ├── point_map.fits
│   └── properties
├── id_index/                      # value of index_paths["id"]
│   ├── dataset/
│   └── properties
├── margin_1deg/                   # in margin_paths
│   ├── dataset/
│   ├── partition_info.csv
│   └── properties
└── margin_20deg/                  # in margin_paths
    ├── dataset/
    ├── partition_info.csv
    └── properties
[1]:
## Set these values based on the paths / subdirectory names shown above
## Or SET TO NONE if there's nothing relevant.

collection_path = "/data3/epyc/data3/hats/catalogs/"
catalog_subdir = "main_catalog"
margin_paths = ["margin_1deg", "margin_20deg"]
default_margin = "margin_1deg"
index_paths = {"id": "id_index"}

## This is a human-readable name of the collection, often the survey or data release.
collection_name = "survey_drK"
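With these values set, it can be useful to confirm that every subdirectory you named actually exists before writing the properties file. The following is a minimal sketch, not part of the hats API: the helper name missing_subdirs is hypothetical, and it assumes collection_path is a local filesystem path and that each table directory contains a properties file, as in the layout shown above.

```python
from pathlib import Path


def missing_subdirs(collection_path, catalog_subdir, margin_paths=None, index_paths=None):
    """Return the named subdirectories that do not contain a `properties` file.

    Hypothetical helper for illustration; it only inspects the local filesystem.
    """
    root = Path(collection_path)
    # Collect every subdirectory the collection is expected to reference.
    expected = [catalog_subdir]
    expected += list(margin_paths or [])
    expected += list((index_paths or {}).values())
    # Keep only the names whose `properties` file cannot be found.
    return [name for name in expected if not (root / name / "properties").is_file()]
```

Calling missing_subdirs(collection_path, catalog_subdir, margin_paths, index_paths) with the values from the cell above should return an empty list when the directory matches the expected format; any names it returns are worth fixing before continuing.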
[2]:
from hats.catalog.dataset.collection_properties import CollectionProperties

info = {"obs_collection": collection_name}
info["hats_primary_table_url"] = catalog_subdir
if margin_paths:
    info["all_margins"] = margin_paths
if default_margin:
    info["default_margin"] = default_margin

if index_paths:
    info["all_indexes"] = index_paths

properties = CollectionProperties(**info)
properties.to_properties_file(collection_path)

2. Inserting collection in place of catalog

This is a slightly different case.

Here, you have a single catalog, but would like that same URI to point to a catalog collection instead. Once the collection has been inserted, the same URI can be used to access either the collection or the single object catalog it contains.

Step 2.1. Starting condition:

catalog/
├── dataset/
├── partition_info.csv
├── point_map.fits
└── properties

Step 2.2. From inside the catalog directory, create a placeholder subdirectory for the catalog:

> mkdir catalog
catalog/
├── catalog/
├── dataset/
├── partition_info.csv
├── point_map.fits
└── properties

Step 2.3. Move the full contents of the catalog into the new placeholder:

> mv * catalog/

Note that this command will not move the new catalog subdirectory itself, and you’ll see an error like cannot move ‘catalog’ to a subdirectory of itself, ‘./catalog/catalog’. That error is expected and harmless; everything else is moved.

catalog/
└── catalog/
    ├── dataset
    ├── partition_info.csv
    ├── point_map.fits
    └── properties
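
If you would rather do steps 2.2 and 2.3 from Python, the sketch below does the same rearrangement with the standard library only. The function name nest_catalog is hypothetical, and the sketch assumes a local filesystem path. Because it lists the directory contents before creating the placeholder, it avoids the "subdirectory of itself" error from the shell version.

```python
import shutil
from pathlib import Path


def nest_catalog(catalog_dir, subdir_name="catalog"):
    """Move every entry of `catalog_dir` into a new `catalog_dir/subdir_name`.

    Hypothetical helper for illustration; returns the new subdirectory path.
    """
    root = Path(catalog_dir)
    target = root / subdir_name
    # Snapshot the entries first, so the placeholder is not among them.
    entries = list(root.iterdir())
    # Raises FileExistsError if the placeholder already exists.
    target.mkdir()
    for entry in entries:
        shutil.move(str(entry), str(target / entry.name))
    return target
```

This moves files rather than copying them, so it changes the on-disk layout just as the mv command does. Try it on a copy first if the catalog is large or shared.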

Step 2.4. Create collection.properties:

Use the next two cells to create the file with minimal contents.

catalog/
├── collection.properties
└── catalog/
    ├── dataset
    ├── partition_info.csv
    ├── point_map.fits
    └── properties
[3]:
collection_path = "/data3/epyc/data3/hats/catalogs/skymapper/sky_mapper_dr4"
catalog_subdir = "catalog"

## This is a human-readable name of the collection, often the survey or data release.
collection_name = "sky_mapper_dr4"
[4]:
from hats.catalog.dataset.collection_properties import CollectionProperties

properties = CollectionProperties(obs_collection=collection_name, hats_primary_table_url=catalog_subdir)
properties.to_properties_file(collection_path)

3. Check your collection

Regardless of how you get here, it’s a good idea to check that your catalog collection can be loaded just like any other catalog via LSDB.

[5]:
import lsdb

new_collection = lsdb.read_hats(collection_path)
assert new_collection.hc_collection
[6]:
## if you added any margins, this will be the list of all margins
new_collection.hc_collection.all_margins
[7]:
## if you added any indexes, this will be the map of field -> index table
new_collection.hc_collection.all_indexes
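
As a complementary check that does not need LSDB, you can look at the file that was written. The sketch below is an illustration under two assumptions: that collection_path is a local filesystem path, and that collection.properties is a plain-text file of key=value lines, with lines starting with # treated as comments. The helper name read_collection_properties is hypothetical and is not part of hats.

```python
from pathlib import Path


def read_collection_properties(collection_path):
    """Parse `collection.properties` into a dict of string keys and values.

    Hypothetical helper; assumes simple `key=value` lines and `#` comments.
    """
    path = Path(collection_path) / "collection.properties"
    pairs = {}
    for line in path.read_text().splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not a key=value pair.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        pairs[key.strip()] = value.strip()
    return pairs
```

The returned dictionary should contain at least the obs_collection and hats_primary_table_url keys that were passed to CollectionProperties in the cells above.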