Submission Guide
Overview
We welcome high-quality benchmark dataset submissions from the pharmacometrics community. Each submission undergoes rigorous peer review and, upon acceptance, becomes a citable publication with a DOI.
Benchmark Requirements
All benchmark datasets must meet the following criteria:
1. Realism
- Irregular sampling: Reflect realistic clinical trial sampling schedules
- Confounding dropouts: Include realistic patient dropout patterns
- Realistic relationships: Capture true dose-exposure-response relationships
2. Longitudinal Structure
- Data must include repeated measurements over time
- Appropriate for pharmacometric modeling approaches
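A quick way to confirm the repeated-measurement requirement is to count distinct time points per subject. The sketch below is illustrative only: it assumes the ID and TIME column names used in the data dictionary example in this guide, and the helper name subjects_without_repeats is hypothetical, not part of the repository's validator.

```python
import csv
import io
from collections import defaultdict


def subjects_without_repeats(rows, id_col="ID", time_col="TIME"):
    """Return the IDs that have fewer than two distinct time points."""
    times = defaultdict(set)
    for row in rows:
        times[row[id_col]].add(float(row[time_col]))
    return sorted(sid for sid, ts in times.items() if len(ts) < 2)


# Tiny in-memory example standing in for train.csv (assumed column names).
example = io.StringIO("ID,TIME,DV\n1,0,0.0\n1,2,4.1\n2,0,0.0\n")
flagged = subjects_without_repeats(csv.DictReader(example))
print(flagged)  # subject "2" has a single time point
```

For a real file, pass csv.DictReader(open(path, newline="")) instead of the in-memory example.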
3. Comprehensive Documentation
Your submission must include:
- Generative process description: For synthetic data, document how the data was generated
- Realistic scenario description: Explain what real-world situation the dataset represents
- Clear data dictionary: Describe all variables and their units
4. Associated Tasks
Define specific tasks that reflect real-world decision-making in drug development:
- Model selection challenges
- Dose optimization
- Clinical trial simulation validation
- Exposure-response characterization
5. Train/Test Split
- Provide a pre-specified train/test split
- Evaluation metrics should be computed on the test set
- Document the rationale for the split strategy
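This guide does not prescribe a split strategy. One common option, shown here purely as an assumption, is to assign whole subjects to either train or test with a fixed seed, so that no subject contributes rows to both sets and the split is reproducible. The function name, the 20% test fraction, and the seed value are all hypothetical.

```python
import random


def split_subject_ids(subject_ids, test_fraction=0.2, seed=2025):
    """Assign whole subjects to train or test, reproducibly for a given seed."""
    unique_ids = sorted(set(subject_ids))
    rng = random.Random(seed)          # local generator, leaves global state alone
    rng.shuffle(unique_ids)
    n_test = max(1, round(len(unique_ids) * test_fraction))
    test_ids = set(unique_ids[:n_test])
    train_ids = set(unique_ids[n_test:])
    return train_ids, test_ids


# Example with 250 subject IDs, matching n_subjects in the metadata example.
train_ids, test_ids = split_subject_ids(range(1, 251))
print(len(train_ids), len(test_ids))
```

Whatever strategy you choose, recording the seed and the fraction in index.qmd makes the rationale easy for reviewers to check.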
Submission Structure
Each benchmark must follow this directory structure:
benchmarks/<dataset-name>/
├── index.qmd # Main description and documentation
├── data/
│ ├── train.csv # Training dataset
│ ├── test.csv # Test dataset
│ └── data-dictionary.yml # Column descriptions (yspec-style YAML)
├── metadata.yml # Machine-readable metadata
└── README.md # Quick reference (generated from index.qmd)
Required Files
1. index.qmd
Your main documentation file must include the following sections:
- Title and Authors
- Abstract: Brief overview of the benchmark
- Background: Context and motivation
- Data Generation (for synthetic data): Detailed methodology
- Dataset Description: Variables, sample size, study design
- Tasks: Specific modeling challenges with evaluation criteria
- Train/Test Split: Description and rationale
- References: Relevant citations
2. Data Files
- train.csv: Training dataset
- test.csv: Test dataset
- Both files must use the same column structure
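The column-structure requirement can be checked by comparing the header rows of the two files. This is a minimal sketch that assumes "same column structure" means the same column names in the same order; the helper names are hypothetical and the repository's own validator may apply a different rule.

```python
import csv
import io


def read_header(handle):
    """Return the first row of a CSV file-like object as a list of column names."""
    return next(csv.reader(handle))


def same_columns(train_handle, test_handle):
    """True when both files list the same columns in the same order."""
    return read_header(train_handle) == read_header(test_handle)


# In-memory stand-ins for train.csv and test.csv; the test file lacks DROPOUT.
train = io.StringIO("ID,TIME,DV,DROPOUT\n1,0,,0\n")
test = io.StringIO("ID,TIME,DV\n9,0,\n")
print(same_columns(train, test))
```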
3. data-dictionary.yml
A yspec-style YAML schema documenting each column. Top-level keys are column names (one key per column in train.csv/test.csv); reserved keys ending in __ (e.g. SETUP__) are not treated as columns.
SETUP__:
  description: Brief description of this data dictionary
  glue:
    - "{{ short }}"
ID:
  short: Subject identifier
  type: integer
  values: 1 to N
TIME:
  short: Time since first dose
  type: numeric
  unit: hours
  range: [0, ]
DV:
  short: Dependent variable (plasma concentration)
  type: numeric
  unit: mg/L
  comment: Observation rows only; dose-event rows have DV missing.
DROPOUT:
  short: Dropout indicator
  type: integer
  values:
    0: completed
    1: dropped out
Legacy data-dictionary.csv files (with columns column_name, description, units, type, coding) are still accepted by the validator for backwards compatibility with earlier submissions, but new submissions should use the YAML form.
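The rule that every top-level key is a column, except reserved keys ending in __, can be sketched as a comparison between the parsed dictionary and a CSV header. The code below works on an already-parsed mapping (for instance the result of yaml.safe_load from PyYAML, which is an assumption here); the function names and the AMT column are hypothetical, and this is not the repository validator's actual logic.

```python
def dictionary_columns(spec):
    """Column names declared in a parsed data dictionary, skipping reserved keys."""
    return [key for key in spec if not key.endswith("__")]


def compare_to_header(spec, header):
    """Return (columns missing from the dictionary, dictionary keys absent from the data)."""
    declared = set(dictionary_columns(spec))
    present = set(header)
    return sorted(present - declared), sorted(declared - present)


# A dict shaped like the parsed YAML example; per-column details are omitted.
spec = {"SETUP__": {}, "ID": {}, "TIME": {}, "DV": {}, "DROPOUT": {}}
undocumented, unused = compare_to_header(spec, ["ID", "TIME", "DV", "AMT"])
print(undocumented, unused)
```

In this example AMT is reported as undocumented and DROPOUT as declared but not present in the data.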
4. metadata.yml
Machine-readable metadata in YAML format:
name: dataset-name
title: Full Dataset Title
version: 1.0.0
date: 2025-10-16
authors:
  - name: Jane Doe
    affiliation: University Example
    email: jane.doe@example.com
  - name: John Smith
    affiliation: Pharma Corp
description: Brief description of the benchmark
keywords:
  - pharmacokinetics
  - dose-response
  - longitudinal
data_type: synthetic
therapeutic_area: oncology
n_subjects: 250
n_observations: 2500
tasks:
  - name: prediction-accuracy
    type: regression
    description: Predict plasma concentrations in the test set
    target: DV
    output_format: {type: individual_predictions, columns: [ID, TIME, PRED]}
    metric: rmse
license: CC-BY-4.0
Step-by-Step Guide
This guide walks through a complete submission from scratch. If you are comfortable with GitHub and git, skip to Submission Process below.
Prerequisites
Install the following before starting:
- A GitHub account: sign up at github.com
- Git: git-scm.com/downloads
  - macOS: brew install git
  - Windows: use the Git for Windows installer
  - Linux: sudo apt install git or sudo yum install git
- Git LFS: git-lfs.github.com
  - macOS: brew install git-lfs
  - Windows: included in Git for Windows
  - Linux: sudo apt install git-lfs
- Quarto: quarto.org/docs/get-started
1. Fork the Repository
- Go to github.com/PMxBenchmarks/pmx_benchmarks
- Click Fork (top-right corner)
- Select your GitHub account as the destination
You now have your own copy at github.com/<your-username>/pmx_benchmarks.
2. Clone Your Fork
git clone https://github.com/<your-username>/pmx_benchmarks.git
cd pmx_benchmarks
3. Set Up Git LFS
Run once after cloning:
git lfs install
Verify tracking is active:
git lfs track
You should see *.csv, *.parquet, *.RData, and other data formats listed.
4. Create a Branch
git checkout -b benchmark/<your-dataset-name>
Example: git checkout -b benchmark/idr-pkpd-covariate
5. Add Your Benchmark
Create the directory structure:
mkdir -p benchmarks/<your-dataset-name>/data
Use BENCHMARK_TEMPLATE.md and benchmarks/example-pk-model-selection/ as references. Required files:
benchmarks/<your-dataset-name>/
├── index.qmd
├── metadata.yml
└── data/
├── train.csv
├── test.csv
└── data-dictionary.yml
6. Validate Locally
# Check Quarto renders without errors
quarto render benchmarks/<your-dataset-name>/index.qmd
# Run validation scripts
python .github/scripts/validate_benchmark.py
python .github/scripts/validate_data.py
Fix any errors before continuing.
7. Commit and Push
git add benchmarks/<your-dataset-name>/
git commit -m "Add <your-dataset-name> benchmark"
git push origin benchmark/<your-dataset-name>
8. Open a Pull Request
- Go to your fork: github.com/<your-username>/pmx_benchmarks
- Click Compare & pull request (appears after pushing)
- Set base repository to PMxBenchmarks/pmx_benchmarks, base branch main
- Fill in the PR template completely
- Click Create pull request
9. Respond to Review
Reviewers will comment on your PR. Push updated commits to the same branch — do not open a new PR. The existing PR updates automatically.
Large Data Files
All benchmark data files are stored using git-lfs. Run git lfs install once after cloning and git will automatically handle tracked file types (*.csv, *.parquet, *.RData, *.rds, *.sas7bdat, *.xpt).
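Before committing, it can help to see which files in your data/ directory fall outside the tracked patterns. The sketch below hard-codes the patterns listed in this guide as an assumption; the repository's .gitattributes file remains the authority, and the helper name and the truth.feather file are hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns as listed in this guide; check .gitattributes for the real list.
LFS_PATTERNS = ["*.csv", "*.parquet", "*.RData", "*.rds", "*.sas7bdat", "*.xpt"]


def untracked_by_lfs(filenames, patterns=LFS_PATTERNS):
    """Return the filenames that match none of the given LFS patterns."""
    return [
        name
        for name in filenames
        if not any(fnmatchcase(name, pattern) for pattern in patterns)
    ]


print(untracked_by_lfs(["train.csv", "test.csv", "truth.feather"]))
```

fnmatchcase is used so that matching is case-sensitive on every platform; whether Git applies the patterns the same way on your system is worth confirming with git lfs ls-files after committing.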
Task Schema
Each task in metadata.yml should declare a type, output_format, and metric. The validator warns when type is absent and enforces output_format and metric only when type is provided. Three task types are supported:
Regression
Predict a continuous outcome for each row in the test set.
- name: prediction-accuracy
  type: regression
  target: DV # column in test.csv with ground truth values
  output_format: {type: individual_predictions, columns: [ID, TIME, PRED]}
  metric: rmse # rmse | mae | nrmse
Classification
Predict a binary or multi-class outcome.
# Soft predictions
- name: dropout-prediction
  type: classification
  output_format: {type: probabilities, columns: [ID, P_DROPOUT]}
  metric: auroc
# Hard predictions
- name: model-selection
  type: classification
  output_format: {type: class_predictions, columns: [MODEL_SELECTED]}
  metric: accuracy
Counterfactual
Report aggregate statistics under a hypothetical intervention. Output must be aggregates — not individual-level predictions.
- name: cmax-exceedance-2x-dose
  type: counterfactual
  scenario: "2x observed dose, same schedule"
  output_format: {type: summary_stats, stats: [q10, q25, q50, q75, q90]}
  truth_file: tasks/cmax_truth.yml
  metric: quantile_coverage
The truth_file must be deposited in tasks/ and contain pre-computed ground truth from your generative model. Its estimates keys must exactly match the names declared in output_format.stats:
# tasks/cmax_truth.yml
scenario: "2x observed dose, same schedule"
n_sim: 10000
population: all
estimates:
  q10: 45.2
  q25: 58.7
  q50: 74.3
  q75: 95.1
  q90: 118.4
The other supported counterfactual output type is probability (with threshold and direction: above|below; truth file needs a single p key).
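For summary_stats tasks, the key-matching rule between output_format.stats and the truth file's estimates can be checked before you open a pull request. This sketch operates on already-parsed mappings (loading the YAML, for example with PyYAML, is assumed and not shown); the function name is hypothetical and the values are copied from the example in this section.

```python
def truth_key_mismatch(task, truth):
    """Compare output_format.stats with the keys under estimates in the truth file.

    Returns (stats with no estimate, estimates not declared as stats).
    """
    declared = set(task["output_format"]["stats"])
    provided = set(truth["estimates"])
    return sorted(declared - provided), sorted(provided - declared)


task = {
    "name": "cmax-exceedance-2x-dose",
    "type": "counterfactual",
    "output_format": {
        "type": "summary_stats",
        "stats": ["q10", "q25", "q50", "q75", "q90"],
    },
}
truth = {
    "estimates": {"q10": 45.2, "q25": 58.7, "q50": 74.3, "q75": 95.1, "q90": 118.4}
}
missing, extra = truth_key_mismatch(task, truth)
print(missing, extra)  # both lists are empty when the keys match exactly
```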
One metric per task — create separate tasks if you want multiple metrics evaluated.
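The rules stated in this section (warn when type is absent, require output_format and metric once type is given, one metric per task) can be approximated as follows. This is a sketch of the documented behavior under those stated rules, not the code in .github/scripts/; the function name and message wording are hypothetical.

```python
SUPPORTED_TYPES = {"regression", "classification", "counterfactual"}


def check_task(task):
    """Return (warnings, errors) for one parsed task entry from metadata.yml."""
    warnings, errors = [], []
    if "type" not in task:
        # Without a type, output_format and metric are not enforced.
        warnings.append("task has no type; output_format and metric not checked")
        return warnings, errors
    if task["type"] not in SUPPORTED_TYPES:
        errors.append(f"unsupported type: {task['type']}")
    for field in ("output_format", "metric"):
        if field not in task:
            errors.append(f"missing {field}")
    if isinstance(task.get("metric"), (list, tuple)):
        errors.append("one metric per task; create separate tasks instead")
    return warnings, errors


print(check_task({"name": "prediction-accuracy", "type": "regression", "metric": "rmse"}))
```

In the printed example the task declares a type and a metric but no output_format, so the sketch reports one error and no warnings.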
Submission Process
Step 1: Prepare Your Benchmark
- Create your benchmark following the structure above
- Test that your documentation builds correctly with Quarto
- Validate that your data files are properly formatted
Step 2: Submit a Pull Request
- Fork this repository
- Create a new branch: git checkout -b benchmark/<your-dataset-name>
- Add your benchmark to benchmarks/<your-dataset-name>/
- Commit your changes with a clear message
- Push to your fork and submit a Pull Request
Use our Pull Request template, which guides you through the submission checklist.
Step 3: Peer Review
Your submission will undergo peer review:
- Technical validation (automated checks)
- Scientific review (expert evaluation)
- Documentation quality assessment
Reviewers will provide feedback via PR comments. Please address all comments before final acceptance.
Step 4: Acceptance and Publication
Upon acceptance:
- Your benchmark will be merged into the main repository
- A DOI will be assigned
- Your benchmark will appear on the website
- You can cite it in publications
Tips for a Successful Submission
- Start early: The review process may take several iterations
- Be thorough: Complete documentation speeds up review
- Test your data: Ensure files load correctly and contain expected information
- Engage with reviewers: Respond promptly to feedback
- Follow examples: Look at existing benchmarks for guidance
Questions?
If you have questions about the submission process, please contact us or open a discussion on GitHub.