CDISC Dataset Generator API

Programmatic Access to Synthetic CDISC-Compliant Datasets

API Overview

The CDISC Dataset Generator API provides programmatic access to synthetic clinical trial data in CDISC formats. You can generate data for SDTM, ADaM, and SEND domains and download it in CSV, SAS, or XPT format.

Base URL:

Current Version: v1.1

Authentication

Currently, the API is available without authentication for development and testing purposes.

Endpoints

Get API Information

GET /api

Returns general information about the API, available endpoints, and example usage.

Example Response:

{
  "name": "CDISC Dataset Generator API",
  "description": "REST API for generating synthetic CDISC-compliant datasets",
  "version": "1.0.0",
  "endpoints": [
    {
      "path": "/api/domains",
      "method": "GET",
      "description": "Get all available domains for all dataset types"
    },
    ...
  ]
}

Get All Domains

GET /api/domains

Returns all available domains for all dataset types (SDTM, ADaM, and SEND).

Example Response:

{
  "SDTM": {
    "DM": {
      "name": "Demographics",
      "description": "Demographics domain containing subject-level data",
      "class": "Special Purpose"
    },
    ...
  },
  "ADaM": {
    ...
  },
  "SEND": {
    ...
  }
}
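The nested response can be flattened into one row per domain for display or filtering. The following is a minimal Python sketch, assuming the response has already been parsed into a dict shaped like the example above. The function name flatten_domains is hypothetical, and the sample is truncated to the single domain shown in the example.

```python
from typing import Dict, List, Tuple


def flatten_domains(domains: Dict[str, Dict[str, dict]]) -> List[Tuple[str, str, str]]:
    """Return (dataset_type, domain_code, name) tuples, sorted by type then code."""
    rows = []
    for dataset_type, by_code in domains.items():
        for code, info in by_code.items():
            rows.append((dataset_type, code, info.get("name", "")))
    return sorted(rows)


# Sample shaped like the example response, truncated to one domain.
sample = {
    "SDTM": {
        "DM": {
            "name": "Demographics",
            "description": "Demographics domain containing subject-level data",
            "class": "Special Purpose",
        }
    },
    "ADaM": {},
    "SEND": {},
}

for row in flatten_domains(sample):
    print(row)
```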

Get Domains by Dataset Type

GET /api/domains/{dataset_type}

Returns all available domains for a specific dataset type (SDTM, ADaM, or SEND).

Path Parameters:

  • dataset_type (required): SDTM, ADaM, or SEND (case-insensitive)

Example Request:

GET /api/domains/SDTM

Example Response:

{
  "DM": {
    "name": "Demographics",
    "description": "Demographics domain containing subject-level data",
    "class": "Special Purpose"
  },
  "VS": {
    "name": "Vital Signs",
    "description": "Vital signs measurements",
    "class": "Findings"
  },
  ...
}
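Because dataset_type is case-insensitive, a client does not strictly need to normalize it, but doing so catches typos before a request is sent. The sketch below assumes only the three dataset types named above. The helper name domains_path is hypothetical and not part of the API.

```python
VALID_DATASET_TYPES = ("SDTM", "ADaM", "SEND")


def domains_path(dataset_type: str) -> str:
    """Build the request path for a dataset type, rejecting unknown values early."""
    canonical = {t.lower(): t for t in VALID_DATASET_TYPES}
    key = dataset_type.strip().lower()
    if key not in canonical:
        raise ValueError(
            f"Unknown dataset type {dataset_type!r}; expected one of {VALID_DATASET_TYPES}"
        )
    return f"/api/domains/{canonical[key]}"


print(domains_path("sdtm"))
print(domains_path("adam"))
```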

Generate Dataset

POST /api/generate

Generates a synthetic CDISC dataset based on the provided parameters.

Request Body:

{
  "dataset_type": "SDTM",  // Required: SDTM, ADaM, or SEND
  "domain": "DM",         // Required: Domain code
  "subjects": 100,        // Optional: Number of subjects (default: 100)
  "arms": 3,              // Optional: Number of treatment arms (default: 3)
  "format": "csv",        // Optional: Output format (csv, sas, xpt) (default: csv)
                          // Note: "json" format temporarily unavailable, will return in a future update with CDISC Dataset-JSON 1.1
  "therapeutic_area": "Neurology", // Optional: Customize for specific therapeutic area
  "download_file": false  // Optional: If true, returns download URL instead of data (default: false)
}
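The request body can be assembled and checked on the client before it is sent. This is a sketch under stated assumptions: the defaults and allowed values are taken from the comments above, the positive-integer check is an assumption (the documentation does not state limits), and build_generate_payload is a hypothetical helper name.

```python
from typing import Any, Dict, Optional

DATASET_TYPES = ("SDTM", "ADaM", "SEND")
FORMATS = ("csv", "sas", "xpt")  # "json" is documented as temporarily unavailable


def build_generate_payload(
    dataset_type: str,
    domain: str,
    subjects: int = 100,
    arms: int = 3,
    fmt: str = "csv",
    therapeutic_area: Optional[str] = None,
    download_file: bool = False,
) -> Dict[str, Any]:
    """Return a dict ready to be sent as the JSON body of POST /api/generate."""
    if dataset_type not in DATASET_TYPES:
        raise ValueError(f"dataset_type must be one of {DATASET_TYPES}")
    if not domain:
        raise ValueError("domain is required")
    if fmt not in FORMATS:
        raise ValueError(f"format must be one of {FORMATS}")
    # Assumption: counts must be positive; the API's real limits are not documented.
    if subjects < 1 or arms < 1:
        raise ValueError("subjects and arms must be positive integers")

    payload: Dict[str, Any] = {
        "dataset_type": dataset_type,
        "domain": domain,
        "subjects": subjects,
        "arms": arms,
        "format": fmt,
        "download_file": download_file,
    }
    if therapeutic_area is not None:
        payload["therapeutic_area"] = therapeutic_area
    return payload


print(build_generate_payload("SDTM", "DM", subjects=50, arms=2, therapeutic_area="Neurology"))
```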

Example Response (with download_file=false and format=json; the json format is currently listed as temporarily unavailable in the request body notes):

{
  "metadata": {
    "dataset_type": "SDTM",
    "domain": "DM",
    "name": "Demographics",
    "description": "Demographics domain containing subject-level data",
    "class": "Special Purpose",
    "subject_count": 100,
    "treatment_arms": 3,
    "record_count": 100,
    "variable_count": 15,
    "variables": [
      {"name": "STUDYID", "type": "object"},
      {"name": "USUBJID", "type": "object"},
      ...
    ]
  },
  "data": [
    {
      "STUDYID": "STUDY001",
      "USUBJID": "STUDY001-001",
      "AGE": 45,
      "SEX": "M",
      ...
    },
    ...
  ]
}
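A response of this shape can be turned into CSV text on the client, using the variable list in metadata to fix the column order. The sketch below uses only the Python standard library. The function name records_to_csv is hypothetical, and the sample is a cut-down illustration with four variables (the "type" values for AGE and SEX are assumed, since the example above elides them).

```python
import csv
import io


def records_to_csv(response: dict) -> str:
    """Serialize response["data"] as CSV, with columns ordered as in the metadata."""
    columns = [variable["name"] for variable in response["metadata"]["variables"]]
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",  # skip keys that the metadata does not declare
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(response["data"])  # missing keys are written as empty cells
    return buffer.getvalue()


# Illustrative sample; a real response contains more variables and records.
sample = {
    "metadata": {
        "variables": [
            {"name": "STUDYID", "type": "object"},
            {"name": "USUBJID", "type": "object"},
            {"name": "AGE", "type": "int64"},    # assumed type
            {"name": "SEX", "type": "object"},   # assumed type
        ]
    },
    "data": [
        {"STUDYID": "STUDY001", "USUBJID": "STUDY001-001", "AGE": 45, "SEX": "M"},
    ],
}

print(records_to_csv(sample))
```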

Example Response (with download_file=true):

{
  "message": "Dataset generated successfully",
  "dataset_type": "SDTM",
  "domain": "DM",
  "subjects": 100,
  "arms": 3,
  "format": "csv",
  "file_id": "sdtm_dm_20250405123456.csv",
  "download_url": "/api/download/sdtm_dm_20250405123456.csv",
  "record_count": 100
}
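The download_url in this response is a path, so it has to be combined with the host before it can be fetched. A minimal sketch follows, assuming the placeholder host https://your-app-url used in the code examples of this page. The helper name absolute_download_url is hypothetical.

```python
from urllib.parse import urljoin


def absolute_download_url(base_url: str, result: dict) -> str:
    """Combine the API host with the path returned in "download_url"."""
    return urljoin(base_url, result["download_url"])


# Values copied from the example response above.
result = {
    "file_id": "sdtm_dm_20250405123456.csv",
    "download_url": "/api/download/sdtm_dm_20250405123456.csv",
}

print(absolute_download_url("https://your-app-url", result))
```

Because download_url starts with a slash, urljoin replaces any path already present in the base URL. If the API were served under a path prefix, the prefix would need to be added back explicitly.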

Download Generated File

GET /api/download/{file_id}

Downloads a previously generated file by its ID. Files are automatically deleted after 1 hour.

Note: This endpoint is not needed if you pass the direct_download parameter to the generate endpoint, which returns the file in a single step.

Path Parameters:

  • file_id (required): File ID returned from the generate endpoint

Response:

Returns the requested file with the appropriate MIME type for download.
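Since files are deleted after 1 hour, a client that stores download URLs may want to decide whether to regenerate a dataset rather than attempt a stale download. The sketch below assumes the client records the time at which it received the generate response, and that the one-hour period counts from roughly that moment (the documentation does not say exactly when the clock starts). The name is_probably_expired is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

FILE_LIFETIME = timedelta(hours=1)  # retention period stated in the documentation


def is_probably_expired(generated_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if a file generated at `generated_at` has likely been deleted."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - generated_at >= FILE_LIFETIME


# Arbitrary illustrative timestamp recorded by the client.
generated = datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)
print(is_probably_expired(generated, now=generated + timedelta(minutes=30)))
print(is_probably_expired(generated, now=generated + timedelta(minutes=61)))
```

A 404 Not Found from the download endpoint remains the authoritative signal. This check only avoids requests that are likely to fail.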

Get Therapeutic Areas

GET /api/therapeutic-areas

Returns all available therapeutic areas that can be used to customize dataset generation.

Example Response:

{
  "therapeutic_areas": {
    "Neurology": {
      "common_conditions": ["Alzheimer's Disease", "Parkinson's Disease", "Multiple Sclerosis"],
      "common_medications": ["Levodopa", "Carbidopa", "Memantine"],
      "common_lab_tests": ["Cerebrospinal Fluid Analysis", "Acetylcholine Receptor Antibody", "Oligoclonal Bands"],
      "condition_count": 10,
      "medication_count": 10,
      "lab_test_count": 5
    },
    "Cardiology": {
      "common_conditions": ["Coronary Artery Disease", "Heart Failure", "Arrhythmias"],
      "common_medications": ["Atorvastatin", "Metoprolol", "Lisinopril"],
      "common_lab_tests": ["Lipid Panel", "Cardiac Enzymes", "B-type Natriuretic Peptide"],
      "condition_count": 8,
      "medication_count": 10,
      "lab_test_count": 6
    },
    ...
  },
  "count": 24
}
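This response can be used to check a therapeutic_area value before passing it to the generate endpoint. A minimal sketch, assuming the parsed response is a dict shaped like the example above. The case-insensitive matching is a client-side convenience rather than documented API behavior, and find_therapeutic_area is a hypothetical name.

```python
from typing import Optional


def find_therapeutic_area(response: dict, name: str) -> Optional[str]:
    """Return the area name as spelled by the API, or None if it is not listed."""
    wanted = name.strip().lower()
    for area in response["therapeutic_areas"]:
        if area.lower() == wanted:
            return area
    return None


# Sample truncated to the two areas shown in the example response.
sample = {
    "therapeutic_areas": {
        "Neurology": {"condition_count": 10, "medication_count": 10, "lab_test_count": 5},
        "Cardiology": {"condition_count": 8, "medication_count": 10, "lab_test_count": 6},
    },
    "count": 24,
}

print(find_therapeutic_area(sample, "neurology"))
print(find_therapeutic_area(sample, "Not A Real Area"))
```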

Code Examples

JavaScript / Fetch API

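// Generate a DM domain dataset
fetch('/api/generate', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    dataset_type: 'SDTM',
    domain: 'DM',
    subjects: 50,
    arms: 2,
    format: 'csv',  // Note: 'json' format temporarily unavailable
    therapeutic_area: 'Oncology'
  })
})
  .then(response => {
    // fetch does not reject on HTTP errors, so check the status explicitly.
    // Error responses carry a JSON body with an "error" key.
    if (!response.ok) {
      return response.json().then(body => {
        throw new Error(body.error || `HTTP ${response.status}`);
      });
    }
    return response.json();
  })
  .then(data => console.log(data))
  .catch(error => console.error('Error:', error));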

Python / Requests

import requests

BASE_URL = 'https://your-app-url'

# Generate a DM domain dataset and ask for a download URL
response = requests.post(
    BASE_URL + '/api/generate',
    # json= serializes the dict and sets the Content-Type header
    json={
        'dataset_type': 'SDTM',
        'domain': 'DM',
        'subjects': 50,
        'arms': 2,
        'format': 'csv',
        'therapeutic_area': 'Neurology',
        'download_file': True
    }
)
response.raise_for_status()  # Stop here on a 4xx/5xx response

# Get the download URL
result = response.json()
download_url = result['download_url']

# Download the file
file_response = requests.get(BASE_URL + download_url)
file_response.raise_for_status()
with open('dm_data.csv', 'wb') as f:
    f.write(file_response.content)

R / httr

library(httr)

# Alternative: Generate and download a CSV file directly in a single step
# This will download the file without needing a separate download step
response <- POST(
  url = "https://your-app-url/api/generate",
  body = list(
    dataset_type = "SDTM",
    domain = "DM",
    subjects = 50,
    arms = 2,
    format = "csv",
    therapeutic_area = "Cardiology",
    direct_download = TRUE
  ),
  encode = "json",
  write_disk("dm_data_direct.csv")
)

Error Responses

The API uses standard HTTP status codes to indicate the status of a request:

  • 200 OK: The request was successful.
  • 400 Bad Request: The request was invalid or missing required parameters.
  • 404 Not Found: The requested resource was not found.
  • 500 Internal Server Error: An error occurred on the server.

Error responses include a JSON object with an "error" key containing a detailed error message.

Example Error Response:

{
  "error": "Missing required parameter: domain"
}
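A client can map these responses onto a single exception type. The following sketch takes the status code and raw body as plain values so that it works with any HTTP library. ApiError and raise_for_api_error are hypothetical names, and the fallback to the raw body for non-JSON errors is an assumption about how a proxy or server failure might respond.

```python
import json


class ApiError(Exception):
    """Raised when the API answers with a non-2xx status code."""

    def __init__(self, status: int, message: str):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message


def raise_for_api_error(status: int, body: str) -> None:
    """Raise ApiError for error statuses, using the "error" key when present."""
    if 200 <= status < 300:
        return
    try:
        message = json.loads(body).get("error", body)
    except (ValueError, AttributeError):
        # Body was not JSON, or not a JSON object; fall back to the raw text.
        message = body
    raise ApiError(status, message)


try:
    raise_for_api_error(400, '{"error": "Missing required parameter: domain"}')
except ApiError as exc:
    print(exc)
```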