COVID-19 Data Definition: Standards for Clinical Research


A massive research effort has been mobilized to address the COVID-19 pandemic. Timing is critical to this investigative work given the transmission speed and detrimental impact of the disease. Coordinating research efforts and building upon previous work are key factors for shrinking the therapeutic discovery window. To assist in that process, this article presents the data definition of an international COVID-19 registry (see here). The registry started with sites in Italy and China and has since been adopted by the American Society of Anesthesia Research committee on Critical Care Medicine and deployed in multiple medical centers across the United States.

It is hoped others will leverage this information to minimize redundant work and facilitate the aggregation of data for analysis. Those with an interest in collaborating are invited to send their contact information. Techniques used to maximize registry data collection are described in a separate blog post.

Scope Constraints

As with all registries, the data definition is both over-inclusive and under-inclusive depending on the aims. A brief description of the international registry therefore helps frame the data definition and its potential utility. The registry had administrative aims related to case reporting and care planning, as well as empirical aims directed toward preliminary hypothesis testing of rescue strategies and guidance for subsequent clinical trials. The clinical response to inhaled nitric oxide gas (iNO) was of particular interest. Note also that the registry began in multiple Italian medical centers during a period of intensive time demands on medical staff, so data entry burden was a key concern.

Taken together, these aims constrain the scope of the data definition. Although the data collection domains are generally applicable to nearly all COVID-19 clinical studies, modifications are expected to meet the specific needs of each trial. To be clear, the data definition presented here is not claimed to be broad enough for all COVID-19 related clinical research projects.

Registry Design

The registry protocol included daily data collection for up to 28 days.  The domains assessed included the following:

Data Definition Spreadsheet

A spreadsheet containing detailed information about the data definition can be downloaded below.

Download XLSX • 48KB

The spreadsheet is structured as one row per variable, with the columns defined as:

  • Form name

  • Description of form

  • Variable name

  • Type

      • Numeric

          • Checkbox (0 = not checked, 1 = checked)

          • Picklist or radio button (coded value with associated text [e.g., sex: 0 = female, 1 = male])

          • Numeric value

          • Numeric calculation

      • Date

      • Text / memo

      • Document

  • Code (i.e., unique variable name, fewer than 30 characters)

  • Data entry prompt

  • Variable description

  • As applicable

      • Minimum value (numeric fields only)

      • Maximum value (numeric fields only)

      • Default value

      • Length (text fields only)

      • Required for data entry

  • Numeric coded values

      • Separated by commas into two components

          • The numeric value

          • Label name

  • Missing values
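To make the column layout concrete, the following is a minimal Python sketch of how one spreadsheet row could be represented and used to check an entered value. It is an illustration under stated assumptions, not the registry's actual implementation: the class and function names are hypothetical, the "Demographics" form and "age" variable are invented for the example, and the coded values are assumed to be stored one "value, label" pair per line.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VariableDefinition:
    """One spreadsheet row. Field names are illustrative, not the real headers."""
    form_name: str
    code: str                         # unique variable name
    var_type: str                     # e.g. "numeric", "picklist", "checkbox"
    minimum: Optional[float] = None   # numeric fields only
    maximum: Optional[float] = None   # numeric fields only
    required: bool = False
    coded_values: Dict[int, str] = field(default_factory=dict)


def parse_coded_values(cell: str) -> Dict[int, str]:
    """Parse 'value, label' pairs, assuming one pair per line in the cell."""
    pairs: Dict[int, str] = {}
    for line in cell.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        value, label = line.split(",", 1)
        pairs[int(value.strip())] = label.strip()
    return pairs


def validate(definition: VariableDefinition, value: Optional[float]) -> List[str]:
    """Return a list of problems for one entered value (empty if it is valid)."""
    problems: List[str] = []
    if value is None:
        if definition.required:
            problems.append(f"{definition.code}: value is required")
        return problems
    if definition.coded_values and value not in definition.coded_values:
        problems.append(f"{definition.code}: {value} is not a defined code")
    if definition.minimum is not None and value < definition.minimum:
        problems.append(f"{definition.code}: {value} is below the minimum of {definition.minimum}")
    if definition.maximum is not None and value > definition.maximum:
        problems.append(f"{definition.code}: {value} is above the maximum of {definition.maximum}")
    return problems


# Usage: the sex picklist from the column list, plus a hypothetical numeric field.
sex = VariableDefinition(
    form_name="Demographics", code="sex", var_type="picklist", required=True,
    coded_values=parse_coded_values("0, female\n1, male"),
)
age = VariableDefinition("Demographics", "age", "numeric", minimum=0, maximum=120)

print(validate(sex, 1))    # a defined code, so no problems are reported
print(validate(age, 150))  # outside the assumed 0 to 120 range
```

Keeping the limits and coded values in the definition rather than in the checking code means the same validation routine can serve every variable in the spreadsheet.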


Below are screenshots of all the forms. Note that validation was turned off for these screenshots so that all the fields would appear. The ‘live’ forms contain skip logic to hide or show fields as appropriate based on data entry.

© Sciencetrax, LLC
