A lightweight TypeScript service that ingests research metadata and files from Symplectic Elements, publishes them into a CKAN-based open data portal, and writes CKAN references back to Symplectic so researchers can track publication status.
- Pull both metadata and attached files from Symplectic Elements via its REST API.
- Transform Symplectic records into CKAN dataset and resource payloads with sensible defaults (title, abstract, keywords, extras, and file resources).
- Upsert datasets in CKAN (create new ones or update existing ones by name) and keep resources aligned.
- Push CKAN identifiers/URLs back into Symplectic metadata fields after each successful sync.
- Configurable concurrency, change windows (`changedSince`), dry-run support, and structured logging.
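The upsert-by-name behaviour listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the service's actual `CKANClient`: `upsertDataset`, `CkanActions`, and `CkanDataset` are hypothetical names, and the three methods stand in for CKAN's `package_show`, `package_create`, and `package_update` actions so the logic can be exercised without a live portal.

```typescript
// Hypothetical, trimmed-down dataset shape for illustration only.
interface CkanDataset {
  id?: string;
  name: string;
  title: string;
}

// Assumed thin wrappers over the CKAN actions package_show,
// package_create and package_update.
interface CkanActions {
  // Resolves to null when CKAN reports that no dataset has this name.
  packageShow(name: string): Promise<CkanDataset | null>;
  packageCreate(dataset: CkanDataset): Promise<CkanDataset>;
  packageUpdate(dataset: CkanDataset): Promise<CkanDataset>;
}

async function upsertDataset(
  ckan: CkanActions,
  dataset: CkanDataset,
): Promise<{ dataset: CkanDataset; created: boolean }> {
  // Look the dataset up by its slugged name.
  const existing = await ckan.packageShow(dataset.name);

  if (existing === null) {
    return { dataset: await ckan.packageCreate(dataset), created: true };
  }

  // Send the full payload together with the existing id, so the update
  // targets the dataset that was found rather than creating a duplicate.
  const updated = await ckan.packageUpdate({ ...dataset, id: existing.id });
  return { dataset: updated, created: false };
}
```

Returning the `created` flag alongside the dataset is one way to feed summary metrics; the real service may report this differently.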
Symplectic Elements --> SymplecticClient --> Transform --> CKANClient --> CKAN
        ^                                                           |
        |------------------ postBackToSymplectic -------------------|
| Component | Responsibility |
|---|---|
| `SymplecticClient` | Authenticated wrapper around the Symplectic Elements API for fetching records and sending metadata updates back. |
| `transformRecordToDataset` | Normalises Symplectic metadata/files into CKAN dataset/resource payloads (slugged dataset names, extras, tags). |
| `CKANClient` | Minimal CKAN action API wrapper for package/resource show/create/update operations. |
| `SyncService` | Orchestrates the flow: fetch records, upsert datasets/resources, feed CKAN identifiers back to Symplectic, and expose summary metrics. |
| CLI (`src/cli.ts`) | Loads configuration, parses runtime overrides (`--since`, `--dry-run`, `--concurrency`), and executes the synchronisation. |
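To make the transformation step more concrete, here is a minimal sketch of what a record-to-dataset mapping could look like. It borrows the name `transformRecordToDataset` from the table but is not the project's implementation: the `SymplecticRecord` shape, the `symplectic_id` extra, and the title-plus-id naming scheme are assumptions. The slug rule follows CKAN's documented constraint that dataset names contain only lowercase alphanumerics, `-` and `_`, and are 2 to 100 characters long.

```typescript
// Hypothetical input shape; real Symplectic records carry far more fields.
interface SymplecticRecord {
  id: string;
  title: string;
  abstract?: string;
  keywords?: string[];
  files?: { filename: string; url: string }[];
}

interface CkanDatasetPayload {
  name: string;
  title: string;
  notes: string;
  tags: { name: string }[];
  extras: { key: string; value: string }[];
  resources: { name: string; url: string }[];
}

// Lowercase the text and reduce every run of other characters to one hyphen.
function slugify(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, ""); // trim leading/trailing hyphens
}

function transformRecordToDataset(record: SymplecticRecord): CkanDatasetPayload {
  // Assumption: appending the record id keeps names unique when titles collide.
  const suffix = `-${slugify(record.id)}`;
  // Leave room for the suffix inside CKAN's 100-character limit.
  const base =
    slugify(record.title).slice(0, 100 - suffix.length).replace(/-+$/, "") || "dataset";

  return {
    name: `${base}${suffix}`,
    title: record.title,
    notes: record.abstract ?? "",
    tags: (record.keywords ?? [])
      .map((keyword) => ({ name: keyword.trim() }))
      .filter((tag) => tag.name !== ""),
    extras: [{ key: "symplectic_id", value: record.id }],
    resources: (record.files ?? []).map((file) => ({ name: file.filename, url: file.url })),
  };
}
```

Local taxonomy (groups, licences, extra fields) would slot into the returned object.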
The service stores CKAN identifiers (dataset id, dataset name, absolute dataset URL, and sync timestamp) back in Symplectic metadata, allowing round-trip status tracking and future delta-based syncs.
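The overall flow (fetch, transform, upsert, write back) with bounded concurrency and dry-run support might be orchestrated along the following lines. This is a sketch, not the actual `SyncService`: `runSync`, `SyncDeps`, and `SyncSummary` are hypothetical names, and the collaborators are passed in as plain functions so the control flow is visible in isolation.

```typescript
// Hypothetical collaborators, reduced to the calls this sketch needs.
interface SyncDeps<R, D> {
  fetchRecords(changedSince?: string): Promise<R[]>;
  transform(record: R): D;
  upsert(dataset: D): Promise<{ id: string; name: string }>;
  postBack(record: R, ckan: { id: string; name: string }): Promise<void>;
}

interface SyncSummary {
  processed: number;
  synced: number;
  failed: number;
}

async function runSync<R, D>(
  deps: SyncDeps<R, D>,
  options: { changedSince?: string; dryRun: boolean; concurrency: number },
): Promise<SyncSummary> {
  const records = await deps.fetchRecords(options.changedSince);
  const summary: SyncSummary = { processed: 0, synced: 0, failed: 0 };
  let next = 0; // index of the next unclaimed record, shared by all workers

  async function worker(): Promise<void> {
    while (next < records.length) {
      const record = records[next++];
      summary.processed++;
      try {
        const dataset = deps.transform(record);
        if (options.dryRun) continue; // transform only, mutate nothing
        const ckan = await deps.upsert(dataset);
        await deps.postBack(record, ckan); // only after a successful upsert
        summary.synced++;
      } catch {
        summary.failed++; // one bad record should not abort the whole run
      }
    }
  }

  // Start at most `concurrency` workers; each pulls records until none remain.
  const workerCount = Math.max(1, Math.min(options.concurrency, records.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return summary;
}
```

Because JavaScript runs the workers on a single thread, the shared `next` counter needs no locking; the concurrency limit only caps how many requests are in flight at once.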
- Install dependencies

      npm install

- Configure environment

  Copy `.env.example` to `.env` and fill in Symplectic and CKAN credentials; alternatively export the variables in your shell.

      cp .env.example .env
      # edit .env

- Run the synchroniser

      npm run sync -- --since=2024-01-01T00:00:00Z --concurrency=6

  Flags:

  - `--since=<ISO timestamp>` overrides `SYMPLECTIC_SINCE` for incremental fetches.
  - `--dry-run` runs transformations and logging without mutating CKAN or Symplectic.
  - `--concurrency=<n>` controls how many records are processed in parallel (default `SYNC_CONCURRENCY`).

- Build for deployment (optional)

      npm run build
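The way the flags override their environment-variable defaults could be handled roughly like this. It is a sketch rather than the contents of `src/cli.ts`: `parseOptions` and `SyncOptions` are hypothetical names, and the fallback concurrency of 4 when `SYNC_CONCURRENCY` is unset is an assumption.

```typescript
interface SyncOptions {
  since?: string; // ISO timestamp for incremental fetches
  dryRun: boolean;
  concurrency: number;
}

// `argv` holds the arguments after the script name; `env` is a plain
// key/value map so the function can be tested without touching the process.
function parseOptions(
  argv: string[],
  env: Record<string, string | undefined>,
): SyncOptions {
  const options: SyncOptions = {
    since: env["SYMPLECTIC_SINCE"] || undefined, // treat "" as unset
    dryRun: false,
    concurrency: Number(env["SYNC_CONCURRENCY"] ?? "4"), // 4 is an assumed fallback
  };

  for (const arg of argv) {
    if (arg === "--dry-run") {
      options.dryRun = true;
    } else if (arg.startsWith("--since=")) {
      options.since = arg.slice("--since=".length);
    } else if (arg.startsWith("--concurrency=")) {
      options.concurrency = Number(arg.slice("--concurrency=".length));
    } else {
      throw new Error(`Unknown argument: ${arg}`);
    }
  }

  // Fail fast on values the rest of the run could not use.
  if (options.since !== undefined && Number.isNaN(Date.parse(options.since))) {
    throw new Error(`--since must be an ISO timestamp, got "${options.since}"`);
  }
  if (!Number.isInteger(options.concurrency) || options.concurrency < 1) {
    throw new Error("--concurrency must be a positive integer");
  }
  return options;
}
```

In a Node.js entry point this would typically be called as `parseOptions(process.argv.slice(2), process.env)`.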
After each dataset is created or updated in CKAN, the service calls `SymplecticClient.updateRecord` with a payload resembling:

    {
      "metadata": {
        "ckan_dataset_id": "1234-5678",
        "ckan_dataset_name": "ecosystem-services-2024-1234",
        "ckan_dataset_url": "https://data.example.ac.uk/dataset/ecosystem-services-2024-1234",
        "ckan_last_synced": "2024-04-11T09:15:00.000Z"
      }
    }

You can remap these fields server-side (via workflow rules) or adapt `postBackToSymplectic` for bespoke metadata schemas.
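A payload of this shape could be assembled by a small helper like the one below. `buildPostBackPayload` and `CkanDatasetRef` are hypothetical names used for illustration; the sketch also assumes the portal serves dataset pages at CKAN's default `/dataset/<name>` route.

```typescript
interface CkanDatasetRef {
  id: string;
  name: string;
}

interface PostBackPayload {
  metadata: {
    ckan_dataset_id: string;
    ckan_dataset_name: string;
    ckan_dataset_url: string;
    ckan_last_synced: string;
  };
}

function buildPostBackPayload(
  dataset: CkanDatasetRef,
  ckanBaseUrl: string,
  syncedAt: Date = new Date(),
): PostBackPayload {
  // Strip trailing slashes so the URL never contains "//dataset".
  const base = ckanBaseUrl.replace(/\/+$/, "");
  return {
    metadata: {
      ckan_dataset_id: dataset.id,
      ckan_dataset_name: dataset.name,
      // Assumes the default CKAN route for a dataset page.
      ckan_dataset_url: `${base}/dataset/${encodeURIComponent(dataset.name)}`,
      ckan_last_synced: syncedAt.toISOString(),
    },
  };
}
```

Passing `syncedAt` explicitly keeps the helper deterministic in tests; for a bespoke metadata schema only the keys inside `metadata` would need to change.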
- Adjust `transformRecordToDataset` to map local taxonomy (e.g. dataset groups, licences, custom extras).
- Plug the service into a scheduler (GitHub Actions, cron, Airflow) that supplies the required environment variables and optionally narrows `SYMPLECTIC_SINCE` for incremental runs.
- Wrap the CLI with container tooling by using `npm run build` and copying the `dist/` output into a thin Node.js runtime image.
- Mock Symplectic/CKAN endpoints with tools such as WireMock or `nock` to validate transformations without touching production systems.
- Check for dataset name collisions by feeding in diverse titles; adjust the `slugify` logic if CKAN requires stricter conformity.
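One way to run the collision check described above is to group a batch of titles by the slug each one produces. The helper below is a sketch: `findSlugCollisions` is a hypothetical name, and `exampleSlugify` is a stand-in for the project's own `slugify`, whose exact rules are not shown here.

```typescript
// Groups titles by the slug they produce and returns only the slugs that
// more than one distinct title maps to.
function findSlugCollisions(
  titles: string[],
  slugify: (title: string) => string,
): Map<string, string[]> {
  const bySlug = new Map<string, string[]>();
  for (const title of titles) {
    const slug = slugify(title);
    const group = bySlug.get(slug) ?? [];
    if (!group.includes(title)) group.push(title); // ignore exact duplicates
    bySlug.set(slug, group);
  }

  const collisions = new Map<string, string[]>();
  bySlug.forEach((group, slug) => {
    if (group.length > 1) collisions.set(slug, group);
  });
  return collisions;
}

// Stand-in slug function for the example; substitute the project's own.
const exampleSlugify = (title: string): string =>
  title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");

// Titles that differ only in case, punctuation or accents are the usual culprits.
const slugCollisions = findSlugCollisions(
  ["Soil Carbon (2024)", "Soil carbon 2024", "Río Data", "Rio Data", "R o Data"],
  exampleSlugify,
);
```

With this stand-in, "Río Data" and "R o Data" collide because the accented character is replaced by a hyphen, which hints at where a stricter or transliterating slug rule might be needed.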
This repository contains only the synchroniser logic; it does not ship infrastructure-as-code or deployment manifests, so you can integrate it with your preferred delivery pipeline.