datopian/symplectic-ckan-sync

Symplectic ↔︎ CKAN Synchroniser

A lightweight TypeScript service that ingests research metadata and files from Symplectic Elements, publishes them into a CKAN-based open data portal, and writes CKAN references back to Symplectic so researchers can track publication status.

Features

  • Pull both metadata and attached files from Symplectic Elements via its REST API.
  • Transform Symplectic records into CKAN dataset and resource payloads with sensible defaults (title, abstract, keywords, extras, and file resources).
  • Upsert datasets in CKAN (create new ones or update existing ones by name) and keep resources aligned.
  • Push CKAN identifiers/URLs back into Symplectic metadata fields after each successful sync.
  • Configurable concurrency, change windows (changedSince), dry-run support, and structured logging.
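The upsert-by-name behaviour listed above can be sketched as follows. This is a minimal illustration, not the repository's actual CKANClient: `package_show`, `package_create`, and `package_update` are CKAN action API names, while `CkanCall`, `CkanResponse`, `CkanDataset`, and `upsertDataset` are hypothetical names introduced here. The transport is injected so the logic can be exercised without a live CKAN instance.

```typescript
// Hypothetical types: the real CKANClient in this repository may differ.
interface CkanResponse<T> {
  success: boolean;
  result?: T;
  error?: { __type?: string; message?: string };
}

interface CkanDataset {
  id?: string;
  name: string;
  title: string;
  notes?: string;
}

// Injected transport: expected to POST `body` to /api/3/action/<action>
// and return the parsed JSON envelope.
type CkanCall = <T>(action: string, body: unknown) => Promise<CkanResponse<T>>;

async function upsertDataset(
  call: CkanCall,
  dataset: CkanDataset,
): Promise<{ action: "created" | "updated"; dataset: CkanDataset }> {
  // package_show accepts either the dataset id or its name in the `id` field.
  const existing = await call<CkanDataset>("package_show", { id: dataset.name });

  if (existing.success && existing.result) {
    // package_update expects the full dataset, so send the whole payload plus the id.
    const updated = await call<CkanDataset>("package_update", {
      ...dataset,
      id: existing.result.id,
    });
    if (!updated.success || !updated.result) {
      throw new Error(`package_update failed for ${dataset.name}`);
    }
    return { action: "updated", dataset: updated.result };
  }

  // Assumption: any unsuccessful package_show is treated as "not found".
  // Production code should inspect the error type before creating.
  const created = await call<CkanDataset>("package_create", dataset);
  if (!created.success || !created.result) {
    throw new Error(`package_create failed for ${dataset.name}`);
  }
  return { action: "created", dataset: created.result };
}
```

Injecting `call` keeps the decision logic separate from HTTP concerns such as authentication headers and retries, which makes it straightforward to test with an in-memory fake.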

Architecture

Symplectic Elements --> SymplecticClient --> Transform --> CKANClient --> CKAN
       ^                                                          |
       |------------------ postBackToSymplectic -------------------|

| Component | Responsibility |
| --- | --- |
| SymplecticClient | Authenticated wrapper around the Symplectic Elements API for fetching records and sending metadata updates back. |
| transformRecordToDataset | Normalises Symplectic metadata/files into CKAN dataset/resource payloads (slugged dataset names, extras, tags). |
| CKANClient | Minimal CKAN action API wrapper for package/resource show/create/update operations. |
| SyncService | Orchestrates the flow: fetch records, upsert datasets/resources, feed CKAN identifiers back to Symplectic, and expose summary metrics. |
| CLI (src/cli.ts) | Loads configuration, parses runtime overrides (--since, --dry-run, --concurrency), and executes the synchronisation. |

The service stores CKAN identifiers (dataset id, dataset name, absolute dataset URL, and sync timestamp) back in Symplectic metadata, allowing round-trip status tracking and future delta-based syncs.
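To make the transform step more concrete, here is a rough sketch of how a record might be mapped to a CKAN dataset payload. It is not the repository's transformRecordToDataset: the `SymplecticRecord` shape, the `symplectic_id` extra, and the naming rule (slugged title plus record id) are assumptions made for illustration. The `tags` and `extras` layouts follow CKAN's dataset schema, and CKAN restricts dataset names to lowercase alphanumerics, "-" and "_", between 2 and 100 characters.

```typescript
// Hypothetical record shape; the real Symplectic API returns a richer structure.
interface SymplecticRecord {
  id: string;
  title: string;
  abstract?: string;
  keywords?: string[];
  files?: { filename: string; url: string }[];
}

interface CkanDatasetPayload {
  name: string;
  title: string;
  notes: string;
  tags: { name: string }[];
  extras: { key: string; value: string }[];
  resources: { name: string; url: string }[];
}

// Reduce arbitrary text to lowercase alphanumerics separated by single hyphens.
function clean(text: string): string {
  return text
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "") // strip combining accents left by NFKD
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Assumed naming rule: slugged title plus record id, capped at 100 characters.
function datasetName(title: string, recordId: string): string {
  const tail = `-${clean(recordId)}`;
  const head = clean(title).slice(0, 100 - tail.length).replace(/-+$/, "");
  return (head || "record") + tail;
}

function transformRecordSketch(record: SymplecticRecord): CkanDatasetPayload {
  return {
    name: datasetName(record.title, record.id),
    title: record.title.trim(),
    notes: record.abstract ?? "",
    // Drop blank keywords so CKAN does not reject empty tag names.
    tags: (record.keywords ?? [])
      .map((keyword) => ({ name: keyword.trim() }))
      .filter((tag) => tag.name.length > 0),
    // Hypothetical extra that links the dataset back to its source record.
    extras: [{ key: "symplectic_id", value: record.id }],
    resources: (record.files ?? []).map((file) => ({
      name: file.filename,
      url: file.url,
    })),
  };
}
```

Appending the record id to the slug keeps names unique even when two records share a title, at the cost of slightly longer URLs.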

Getting started

  1. Install dependencies

    npm install
  2. Configure environment

    Copy .env.example to .env and fill in Symplectic and CKAN credentials; alternatively export the variables in your shell.

    cp .env.example .env
    # edit .env
  3. Run the synchroniser

    npm run sync -- --since=2024-01-01T00:00:00Z --concurrency=6

    Flags:

    • --since=<ISO timestamp> overrides SYMPLECTIC_SINCE for incremental fetches.
    • --dry-run runs transformations and logging without mutating CKAN or Symplectic.
    • --concurrency=<n> controls how many records are processed in parallel (defaults to the value of SYNC_CONCURRENCY).
  4. Build for deployment (optional)

    npm run build
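The flags listed under step 3 could be parsed along these lines. This is a sketch only, not the code in src/cli.ts: `parseSyncOptions` and `SyncOptions` are hypothetical names, and the fallback concurrency of 4 is an assumption, since the source does not state a default. Arguments and environment are passed in explicitly so the function stays pure and easy to test.

```typescript
interface SyncOptions {
  since: string | undefined;
  dryRun: boolean;
  concurrency: number;
}

function parseSyncOptions(
  argv: string[],
  env: Record<string, string | undefined>,
): SyncOptions {
  const options: SyncOptions = {
    since: env["SYMPLECTIC_SINCE"],
    dryRun: false,
    // Assumption: fall back to 4 when SYNC_CONCURRENCY is unset.
    concurrency: Number(env["SYNC_CONCURRENCY"] ?? 4),
  };

  // Command-line flags override the environment-derived defaults.
  for (const arg of argv) {
    if (arg === "--dry-run") {
      options.dryRun = true;
    } else if (arg.startsWith("--since=")) {
      options.since = arg.slice("--since=".length);
    } else if (arg.startsWith("--concurrency=")) {
      options.concurrency = Number(arg.slice("--concurrency=".length));
    }
  }

  // Reject timestamps that Date.parse cannot read.
  if (options.since !== undefined && Number.isNaN(Date.parse(options.since))) {
    throw new Error(`Invalid --since timestamp: ${options.since}`);
  }
  // Concurrency must be a positive whole number.
  if (!Number.isInteger(options.concurrency) || options.concurrency < 1) {
    throw new Error("Concurrency must be a positive integer");
  }
  return options;
}
```

In a CLI entry point this would typically be called as `parseSyncOptions(process.argv.slice(2), process.env)`.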

How updates flow back to Symplectic

After each dataset is created or updated in CKAN, the service calls SymplecticClient.updateRecord with a payload resembling:

{
  "metadata": {
    "ckan_dataset_id": "1234-5678",
    "ckan_dataset_name": "ecosystem-services-2024-1234",
    "ckan_dataset_url": "https://data.example.ac.uk/dataset/ecosystem-services-2024-1234",
    "ckan_last_synced": "2024-04-11T09:15:00.000Z"
  }
}

You can remap these fields server-side (via workflow rules) or adapt postBackToSymplectic for bespoke metadata schemas.
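A payload of this shape could be assembled with a small helper like the one below. This is an illustrative sketch rather than the repository's postBackToSymplectic: `buildPostBackPayload`, `SyncedDataset`, and `PostBackPayload` are hypothetical names, and the `/dataset/<name>` URL pattern is inferred from the example above.

```typescript
// Minimal view of a CKAN dataset after a successful upsert (hypothetical type).
interface SyncedDataset {
  id: string;
  name: string;
}

interface PostBackPayload {
  metadata: {
    ckan_dataset_id: string;
    ckan_dataset_name: string;
    ckan_dataset_url: string;
    ckan_last_synced: string;
  };
}

function buildPostBackPayload(
  dataset: SyncedDataset,
  ckanBaseUrl: string,
  syncedAt: Date = new Date(),
): PostBackPayload {
  // Trim trailing slashes so the joined URL never contains "//dataset".
  const base = ckanBaseUrl.replace(/\/+$/, "");
  return {
    metadata: {
      ckan_dataset_id: dataset.id,
      ckan_dataset_name: dataset.name,
      ckan_dataset_url: `${base}/dataset/${dataset.name}`,
      // ISO 8601 in UTC, matching the ckan_last_synced format shown above.
      ckan_last_synced: syncedAt.toISOString(),
    },
  };
}
```

Passing `syncedAt` as a parameter makes the timestamp deterministic in tests; adapting to a bespoke schema would mostly mean renaming the keys inside `metadata`.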

Extending and deploying

  • Adjust transformRecordToDataset to map local taxonomy (e.g. dataset groups, licences, custom extras).
  • Plug the service into a scheduler (GitHub Actions, cron, Airflow) that supplies the required environment variables and optionally narrows SYMPLECTIC_SINCE for incremental runs.
  • Containerise the CLI by running npm run build and copying the dist/ output into a slim Node.js runtime image.

Testing ideas

  • Mock Symplectic/CKAN endpoints with tools such as WireMock or nock to validate transformations without touching production systems.
  • Check for dataset name collisions by feeding in diverse titles; adjust the slugify logic if CKAN requires stricter name conformity.
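As one way to probe for naming collisions, the sketch below groups titles by the name they slug to and reports any name shared by more than one title. `slugifyTitle` is a hypothetical stand-in and `findNameCollisions` is a helper invented for this example; swap in the project's own slug function when testing for real.

```typescript
// Hypothetical stand-in for the project's slug logic.
function slugifyTitle(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

interface NameCollision {
  name: string;
  titles: string[];
}

// Group titles by their slug and return only the groups with more than one title.
function findNameCollisions(
  titles: string[],
  slugify: (title: string) => string,
): NameCollision[] {
  const bySlug = new Map<string, string[]>();
  for (const title of titles) {
    const name = slugify(title);
    const bucket = bySlug.get(name) ?? [];
    bucket.push(title);
    bySlug.set(name, bucket);
  }

  const collisions: NameCollision[] = [];
  bySlug.forEach((group, name) => {
    if (group.length > 1) {
      collisions.push({ name, titles: group });
    }
  });
  return collisions;
}
```

Feeding in titles that differ only in case or punctuation is a quick way to surface collisions before they reach CKAN.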

This repository contains only the synchroniser logic. It does not ship infrastructure-as-code or deployment manifests, so you can integrate it with your preferred delivery pipeline.
