What is DocumentSpark?

DocumentSpark is a study-build automation platform for clinical trials. It reads an approved protocol once and generates every downstream document — case report forms, data management plan, edit checks, SDTM mapping, consent forms, and the SDLC validation suite — automatically, with each artifact traceable to the protocol clause it derives from.

Who is DocumentSpark for?

DocumentSpark is built for clinical trial sponsors, CROs, clinical data managers, biostatisticians, principal investigators, quality assurance, and regulatory affairs professionals.

Which standards does DocumentSpark follow?

The platform aligns with CDISC CDASH v2.2, CDISC SDTMIG v3.3, CDISC Define-XML 2.1, CDISC ODM 1.3, ICH E6 Good Clinical Practice, SCDM Good Clinical Data Management Practices (GCDMP), ISPE GAMP 5, the FDA Study Data Technical Conformance Guide, and the DIA Reference Model for TMF.

Is DocumentSpark an EDC or CDMS?

No. DocumentSpark sits upstream of your EDC or CDMS and produces the design and specification artifacts those systems consume. Operational data capture, query workflow, and database lock execution happen inside your EDC.

Protocol → Deliverables · Single Source of Truth

From protocol to study‑ready deliverables in days, not months.

DocumentSpark extracts the structured facts inside your protocol once, then generates every downstream data management, regulatory, and SDLC validation artifact your trial needs — traceable to the clause it derives from, and re-reconciled when the protocol changes.

28+ deliverable types

5 workflow phases

1 source of truth

Protocol · v2.3 ● 80 FACTS EXTRACTED

Phase: II · Arms: 04 · Visits: 12 · Forms: 43 · Facts: 80

Generated artifacts

CDASH v2.2 · SDTMIG 3.3 SYNC 14:02:11Z

Aligned with

CDISC CDASH v2.2 · CDISC SDTMIG v3.3 · ICH E6 GCP · SCDM GCDMP · GAMP 5 · FDA SDTCG · DIA TMF RM

01 / Problem

Why study setup takes months

What happens today

A team of people, retyping the same protocol by hand.

When a protocol is approved, a dozen specialists — across the sponsor, the CRO, data management, and regulatory affairs — each open the same document and start writing their own set of deliverables from scratch. They are all drawing from the same source, but they work independently and in their own formats. The documents disagree with each other before the trial even begins, and every protocol amendment forces the whole team to start reconciling all over again.

A. Manual labor

The same document, authored a dozen times over.

Every team that touches the trial reads the protocol and writes their own set of documents from scratch. The same fact ends up retyped a dozen ways, in a dozen different formats — without any of the authors reading each other's work.

B. Inconsistency

The documents disagree from day one.

Because every author works independently, almost no two documents match. Catching the mismatches is a manual job — someone has to compare the documents line by line. The errors that slip through are discovered weeks or months later, often by an auditor.

C. Human error

One missed change cascades for the life of the trial.

When the protocol is amended, someone has to find every document the change touches, open it, edit it by hand, and re-route it for review. Miss a document and the inconsistency follows the trial into data collection — where it becomes expensive to unwind.

D. Time

Four to nine months before the first patient.

Writing, reviewing, and reconciling the document set sits squarely on the critical path between an approved protocol and enrolling the first patient. Every week added there is a week the trial isn't running.

4–9mo

Study setup time

Between an approved protocol and the first patient enrolled in a mid-sized trial.

28+

Documents per study

Stand-alone documents a study team has to write and keep in sync — entirely by hand.

37%

Cost of rework

Industry estimates of the share of trial cost driven by mid-study changes and the manual reconciliation they trigger.

02 / Process

A linear, traceable build

How it works

One protocol in. A complete study package out.

Five stages move a protocol from PDF to a deployment-ready package of CRFs, plans, specifications, and regulatory documentation — with every artifact linked to the protocol clause it derives from and re-checked whenever the protocol changes.

Stage 01

Ingest

Upload the protocol PDF. Manual CRF packets (DOCX) and REDCap data dictionaries (CSV) can be imported alongside.

Inputs · PDF · DOCX · CSV

Stage 02

Extract

A two-pass structured read turns the protocol into a typed fact base — the study arms, visit schedule, eligible population, treatments, and assessments. Every fact is reviewable against the clause it came from.

Verified · 2-pass extraction
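To make the idea of a typed fact base concrete, here is a minimal Python sketch. It is not DocumentSpark's implementation: the `Fact` and `ClauseRef` shapes, the dotted fact keys, the section numbers, and the reading of "two-pass" as "two independent reads that are then reconciled" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClauseRef:
    """Pointer to the protocol clause a fact was read from (hypothetical shape)."""
    section: str
    page: int


@dataclass(frozen=True)
class Fact:
    """One typed fact extracted from the protocol."""
    key: str          # hypothetical dotted key, e.g. "population.target_size"
    value: object
    source: ClauseRef


def reconcile(first_pass: dict[str, Fact],
              second_pass: dict[str, Fact]) -> tuple[dict[str, Fact], list[str]]:
    """Keep facts both passes agree on; list every other key for human review."""
    agreed: dict[str, Fact] = {}
    conflicts: list[str] = []
    for key in sorted(first_pass.keys() | second_pass.keys()):
        a, b = first_pass.get(key), second_pass.get(key)
        if a is not None and b is not None and a.value == b.value:
            agreed[key] = a
        else:
            conflicts.append(key)  # missing in one pass, or the values differ
    return agreed, conflicts


# Illustrative values; the section and page numbers are made up.
pass_1 = {
    "population.target_size": Fact("population.target_size", 20, ClauseRef("5.1", 14)),
    "design.phase": Fact("design.phase", "II", ClauseRef("3.1", 9)),
}
pass_2 = {
    "population.target_size": Fact("population.target_size", 20, ClauseRef("5.1", 14)),
    "design.phase": Fact("design.phase", "II / Feasibility", ClauseRef("3.1", 9)),
}
agreed, conflicts = reconcile(pass_1, pass_2)
```

Because every `Fact` carries its `ClauseRef`, a reviewer can open the cited clause for any key that lands in `conflicts`.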

Stage 03

Generate

Every downstream artifact is generated from the same fact base — CRFs, DMP, edit checks, SDTM mapping, consent forms, SDLC validation suite, TMF skeleton.

28+ artifacts · One source

Stage 04

Validate

Bidirectional traceability between protocol and artifacts. Quality audit runs deterministic plus structural checks; auto-repair resolves the issues that don't require human judgment.

Bidirectional · Auto-repair
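A bidirectional check of this kind can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the platform's audit: the function name `trace_gaps`, the fact keys, and the artifact names are hypothetical.

```python
def trace_gaps(fact_keys: set[str],
               artifact_refs: dict[str, set[str]]) -> dict[str, list[str]]:
    """Check traceability in both directions.

    protocol -> artifact: facts that no artifact consumes.
    artifact -> protocol: artifact references that point at no known fact.
    """
    referenced = set().union(*artifact_refs.values())
    unused_facts = sorted(fact_keys - referenced)
    dangling_refs = sorted(
        f"{artifact}:{ref}"
        for artifact, refs in artifact_refs.items()
        for ref in refs - fact_keys
    )
    return {
        "facts_without_artifact": unused_facts,
        "refs_without_fact": dangling_refs,
    }


# Hypothetical fact base and artifact back-references.
facts = {"arms", "visits", "population"}
refs = {"CRF": {"visits", "arms"}, "DMP": {"visits", "blinding"}}
gaps = trace_gaps(facts, refs)
```

Both lists are deterministic, so the same inputs always produce the same report; deciding which gaps are safe to repair automatically is a separate step that this sketch does not attempt.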

Stage 05

Export

Deliver each artifact in the format its consumer expects — DOCX for sponsors, CDISC ODM XML for EDCs, Define-XML and a full ZIP for submission, CSV/XLSX for downstream tooling.

DOCX · ODM · Define · ZIP

03 / Preview

Inside the platform

Single source of truth

Every artifact is bound to the protocol clause that produced it.

The Protocol Intelligence workspace shows the structured model extracted from your protocol — arms, population, schedule, interventions — reviewable side-by-side with the source clause. Downstream generators consume the same fact base, so every deliverable inherits the same canonical numbers.

Verification Notes (2-pass extraction)

80 TOTAL ITEMS EXTRACTED · 12 CLAUSE LINKS · 0 CONFLICTS

Study Design

Type: Interventional
Phase: II / Feasibility
Blinding: Double-blind
Randomization: Non-randomized
Design: Single arm · Within-subject

Population

Target size: 20 subjects
Age range: ≥ 18 years
Condition: Invasive lobular breast cancer (ILC)
Inclusion: 4 criteria
Exclusion: 1 criterion

Visit Schedule

Duration: Up to 5 years
Total visits: 5

Enrollment / Screening: −60 days, prior to first PET scan
Study Visit 1 · PET: Day 1, within 30 days of enrollment
Study Visit 2 · PET 2: Day 2, within 15 days of Visit 1

Workspace · Protocol Intelligence · Live link to downstream artifacts

04 / Deliverables

A complete document set, from one protocol

Full coverage

28+ artifacts. One fact base.

DocumentSpark generates the full set of artifacts required to operationalize a trial — from case report forms to the SDLC validation package to the submission-ready CDISC bundle. Each artifact is version-controlled, reviewable, and bound to the protocol clause it derives from.

Tier I

Study setup

2 artifacts

D-01

Study Design Configuration

SDC · arms · visits · randomization

D-02

Informed Consent Form

ICF · 6 sections, per-section regen

Tier II

Case Report Forms

4 artifacts

D-03

Case Report Forms

CRF · canonical form codes

D-04

Annotated CRF

aCRF · SDTM annotations

D-05

CRF Instruction Guide

per-form narrative guidance

D-06

Visit–Form Matrix

visit × form assignments

Tier III

Data Management

SCDM GCDMP

8 artifacts

D-07

Data Management Plan

DMP · 18 GCDMP sections

D-08

Data Dictionary

DD · variable catalog, CT

D-09

Edit Check Specification

ECS · range · logical · derived

D-10

Data Review & Cleaning Plan

DRCP

D-11

SAE Reconciliation Procedure

SAERP

D-12

Database Lock Checklist

DBLC

D-13

External Data Transfer Spec

EDTS · per-vendor profiles

D-14

Medical Coding Specification

MCS · MedDRA · WHODrug · LOINC

Tier IV

SDTM Submission

CDISC · FDA SDTCG

5 artifacts

D-15

SDTM Mapping Package

CDASH → SDTM · lookup-first

D-16

Define.xml

v2.1 (or v2.0 pre-2023)

D-17

SDTM Reviewer's Guide

SDRG

D-18

SUPPQUAL Templates

SUPP-- supplemental qualifiers

D-19

Submission ZIP Package

full bundle · validator-ready

Tier V

SDLC Validation

GAMP 5

11 artifacts

D-20

User Requirements Spec

URS

D-21

Functional Requirements Spec

FRS

D-22

System Design Spec

SDS

D-23

Application Design Spec

ADS · target EDC

D-24

Audit Trail Design Spec

ATDS

D-25

Security & Access Controls

SACS

D-26

Internal Interface Spec

IIS

D-27

DB Physical Model

tables + attributes

D-28

Test Plan / List / Procedure

SDLC test suite

D-29

UAT Protocol

UAT · test cases + criteria

D-30

UAT Report

aggregated outcomes

Tier VI

Trial Master File

DIA RM · ICH E6

2 scaffolds

D-31

TMF Zone Structure

DIA RM · 10 zones

D-32

Essential Documents Checklist

ICH E6 · before · during · after

Interoperable, EDC-agnostic

Deliver to any EDC, CDMS, or eTMF.

DocumentSpark sits upstream of your data-capture stack and exports each artifact in the format its consumer expects. Nothing locks you to a specific EDC vendor — the platform produces standards-bound, validator-ready files.

.xml: CDISC ODM 1.3 (CRFs · Medidata mdsol extensions)

.xml: Define-XML 2.1 (v2.0 fallback pre-2023)

.docx: Microsoft Word (DMP · URS · FRS · SDS · MCS · UAT · 14 more)

.csv · .xlsx: Tabular (Data dictionary · edit checks)

.pdf: Annotated CRF (aCRF · regulator-ready)

.zip: Submission Bundle (SDTM · Define · SDRG · SUPPQUAL)

05 / Standards

Built to the frameworks you're audited against

Standards & compliance

Pinned versions. Date-resolved against the FDA catalog.

Each artifact conforms to the structural, terminological, and process standards that govern modern clinical trials. Submission targets (SDTM, Define-XML, controlled terminology) are date-resolved against the FDA Data Standards Catalog based on your study start date.
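Date resolution of this kind amounts to picking the catalog row whose support window contains the study start date. The Python sketch below shows the idea under loud assumptions: the `CATALOG` rows and their exact dates are placeholders for illustration, not values copied from the FDA Data Standards Catalog, and `resolve_version` is a hypothetical helper name.

```python
from datetime import date

# ILLUSTRATIVE rows only: (standard, version, support_begins, support_ends).
# The dates are placeholders, not values taken from the FDA Data Standards Catalog.
CATALOG = [
    ("Define-XML", "2.0", date(2013, 1, 1), date(2023, 1, 1)),
    ("Define-XML", "2.1", date(2023, 1, 1), None),
]


def resolve_version(standard: str, study_start: date, catalog=CATALOG) -> str:
    """Return the newest version whose support window contains study_start."""
    candidates = [
        (begins, version)
        for name, version, begins, ends in catalog
        if name == standard
        and begins <= study_start
        and (ends is None or study_start < ends)
    ]
    if not candidates:
        raise LookupError(f"no supported {standard} version for {study_start}")
    # The row with the latest support_begins date wins.
    return max(candidates)[1]


older_study = resolve_version("Define-XML", date(2022, 6, 1))
newer_study = resolve_version("Define-XML", date(2024, 2, 1))
```

With these placeholder rows, a study starting in mid-2022 resolves to the v2.0 fallback and one starting in 2024 resolves to v2.1; a standard with no matching row raises an error instead of silently defaulting.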

CDISC · CDASH

Clinical Data Acquisition Standards Harmonization

Pinned · IG v2.2 · 7-domain baseline

CRF fields and data collection structures conform to the CDASH model out of the box, grounded against the CDASH IG v2.2 baseline.

CDISC · SDTM

Study Data Tabulation Model

SDTM 1.7 · SDTMIG 3.3 · CT 2025-09

SDTM mapping is lookup-first against a 315-entry curated catalog; variables without a lookup entry are explicitly annotated NOT SUBMITTED rather than guessed.
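A lookup-first mapper can be sketched as follows. The three-entry `SDTM_LOOKUP` table is a tiny stand-in for the curated catalog, and `map_field` with its return shape is a hypothetical illustration rather than the product's API.

```python
NOT_SUBMITTED = "NOT SUBMITTED"

# Tiny illustrative stand-in for the curated catalog:
# CDASH collection variable -> (SDTM domain, SDTM variable).
SDTM_LOOKUP = {
    "BRTHDAT": ("DM", "BRTHDTC"),
    "SEX": ("DM", "SEX"),
    "AETERM": ("AE", "AETERM"),
}


def map_field(cdash_var: str, lookup=SDTM_LOOKUP) -> dict[str, str]:
    """Annotate a CRF variable from the lookup; never guess a target."""
    hit = lookup.get(cdash_var.upper())
    if hit is None:
        # No catalog entry: flag the variable explicitly instead of inferring one.
        return {"cdash": cdash_var, "annotation": NOT_SUBMITTED}
    domain, sdtm_var = hit
    return {"cdash": cdash_var, "annotation": f"{domain}.{sdtm_var}"}


mapped = map_field("BRTHDAT")
unmapped = map_field("SITE_NOTE")  # hypothetical study-specific field
```

The design choice is that a missing entry produces a visible annotation for a reviewer to resolve, so the mapping package never contains an inferred target that nobody approved.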

CDISC · Define / ODM

Define-XML 2.1 · ODM 1.3

v2.0 / pre-2023 fallback via FDA catalog

Define.xml and ODM exports follow CDISC structure; ODM uses the Medidata mdsol vendor extensions where applicable.

ICH · E6 GCP

Good Clinical Practice

R2 / R3 (2025)

Generated artifacts preserve the traceability, version control, and audit posture required by ICH E6.

SCDM · GCDMP

Good Clinical Data Management Practices

Per-chapter (2023 / 2025)

DMP, edit checks, DRCP, and SAE reconciliation follow GCDMP guidance — the 18-section DMP carries per-section excerpts as the generation contract.

ISPE · GAMP 5

Good Automated Manufacturing Practice

Framework for SDLC suite

URS, FRS, SDS, ADS, ATDS, SACS, IIS and the full test suite are produced under a GAMP 5 framework with bidirectional traceability.

FDA · SDTCG

Study Data Technical Conformance Guide

May 2023

Submission-bound artifacts align with FDA's technical conformance requirements for study data. Versions are resolved from the FDA Data Standards Catalog by study start date.

FDA · CSA & CSV

Computer Software Assurance

Cited in ADS · ATDS · SDTM

Validation deliverables incorporate FDA computer-systems guidance (2007 + Oct 2024 Q&A) and CSA-aligned risk-based test rationale.

DIA · TMF RM

DIA Reference Model

10-zone structure

The TMF scaffold is seeded against the DIA Reference Model, with the ICH E6 essential-document checklist organized into before / during / after trial categories.

Governance · 01

Bidirectional traceability

Every artifact carries a back-reference to the protocol clauses it depends on; validation runs both ways — protocol → artifact and artifact → protocol.

Governance · 02

Version-controlled artifacts

Integer-versioned artifacts with immutable per-version history. Lifecycle moves draft → in review → pending approval → approved, with superseded on amendment.
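The lifecycle reads naturally as a small state machine. The Python sketch below encodes the forward path named above; the assumption that only an approved version can become superseded, and the absence of any "send back to draft" path, are simplifications of this example rather than statements about the product.

```python
from enum import Enum


class State(Enum):
    DRAFT = "draft"
    IN_REVIEW = "in review"
    PENDING_APPROVAL = "pending approval"
    APPROVED = "approved"
    SUPERSEDED = "superseded"


# Allowed forward transitions for one artifact version.
ALLOWED = {
    State.DRAFT: {State.IN_REVIEW},
    State.IN_REVIEW: {State.PENDING_APPROVAL},
    State.PENDING_APPROVAL: {State.APPROVED},
    State.APPROVED: {State.SUPERSEDED},  # assumption: triggered by an amendment
    State.SUPERSEDED: set(),             # terminal; history stays untouched
}


def advance(current: State, target: State) -> State:
    """Return the new state, or raise if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


state = advance(State.DRAFT, State.IN_REVIEW)
```

Skipping a step, for example moving straight from draft to approved, raises `ValueError` instead of changing state.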

Governance · 03

Recorded approval

Approval requires password re-confirmation; each approval persists a signature timestamp and signature meaning against the artifact version being signed.

Governance · 04

Immutable change history

Every state change writes to an immutable change log — user, action, resource, before/after — exportable for sponsor governance and oversight.
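An append-only log with the fields listed above might look like the following Python sketch. `ChangeEntry`, `ChangeLog`, and the sample values are hypothetical; a production system would also need durable storage and tamper evidence, which this in-memory example does not attempt.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ChangeEntry:
    """One state change; frozen so a written entry cannot be edited."""
    user: str
    action: str
    resource: str
    before: str
    after: str
    at: datetime


class ChangeLog:
    """In-memory log that only ever appends."""

    def __init__(self) -> None:
        self._entries: list[ChangeEntry] = []

    def record(self, user: str, action: str, resource: str,
               before: str, after: str) -> ChangeEntry:
        entry = ChangeEntry(user, action, resource, before, after,
                            datetime.now(timezone.utc))
        self._entries.append(entry)
        return entry

    def export(self) -> tuple[ChangeEntry, ...]:
        """Return a tuple copy so callers cannot mutate the stored history."""
        return tuple(self._entries)


log = ChangeLog()
entry = log.record("jdoe", "approve", "DMP v3", "pending approval", "approved")
history = log.export()
```

Attempting to assign to a field of a recorded entry raises an error, because the dataclass is frozen.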

06 / Audience

Who DocumentSpark is for

Built for the full trial team

One platform, every role in the study build.

From sponsor to CRO to regulatory affairs, DocumentSpark replaces fragmented authoring tools with a single environment that produces every deliverable each function depends on — without locking the trial to a single EDC or vendor.

CROs

Standardize study build across clients and EDCs. Move faster on competitive bids by reducing the cost of upfront authoring work; compress UAT by validating against the source protocol.

Operations · Study delivery

Clinical Data Managers

Author CRFs, edit checks, and the DMP from a structured protocol model instead of free text. Catch amendment impact before it reaches UAT through traceability and the staleness flag.

Data management · Biostatistics

Regulatory, QA & PIs

Inherit traceability and version control by default. Submit study data aligned to the FDA conformance guide without rework; investigators see consent and assessments derived from the protocol they approved.

Regulatory · Quality assurance · PIs

07 / Engage

Request a demonstration

Request a demo

See your protocol generate a study package.

Bring a redacted protocol or a representative document from a recent trial. We will walk you through the structured extraction, the generated deliverables, and the traceability model — live, against your own content.

Typical session

45 minutes · remote

For best fit

Sponsor, CRO, or DM lead

Materials

Redacted protocol (optional)

Follow-up

Sample deliverable package

Full name*

Organization*

Work email*

Role

What you'd like to see

We respond within one business day. Materials shared with us are treated as confidential and are not retained without explicit authorization.
