
The Great Disease: A Patient Registry Bus Ride Out of Hades


In C.S. Lewis' literary classic, "The Great Divorce," the narrator takes a bus out of Hades and steps into a world of spirits ensnared in a cycle of behavior that guarantees their return trip. Similarly, the design of multi-organ disease registries involves stepping clear of ensnaring issues that doom a registry's chances of success. This blog examines several of these issues and how they were successfully traversed using a registry of patients with hereditary hemorrhagic telangiectasia (HHT). HHT is an autosomal dominant genetic disorder characterized by vascular malformations (arteriovenous malformations; AVMs) in various parts of the body (e.g., lungs, brain, liver, GI tract) that can lead to significant complications, including bleeding, anemia, ischemic stroke, cerebral and pulmonary hemorrhage, pulmonary hypertension, and high-output cardiac failure. The author thanks Cure HHT for the opportunity to be involved in their work and for their help with this blog.


Not Small, But How Big?

In The Great Divorce, the main character (the narrator) learns the bus ride out of Hades involves expanding into a much larger world. Similarly, a patient registry with a data definition that is too narrow has hellishly little utility; the solution is to expand the explanatory factors collected until they sufficiently address the aims. Therein lies the main challenge: what's big enough? Said differently, there is a constant tension between an ever-expanding data definition and the associated data capture burden.

To better appreciate the challenges of this tension, consider the list below of the main organs, events, and related assessment domains (e.g., symptoms) that can be associated with HHT:

  • AVMs in various organs (note the range of potential organ involvement)

    • Brain, Spine, Liver, Kidneys, Lungs, Pancreas, Spleen, Uterus, Bladder, Limbs, Other…

      • AVM number, size, location…

    • Function and disease of involved organs

  • Treatment

    • Medication use

      • Name, Dose, frequency, etc.

      • Side-effects

      • Response

      • Patient reported outcomes

    • Surgery

      • Type, intra/post operative characteristics

    • Embolization

    • etc.

  • Other medical history

  • Genetic variants

  • Women’s Pregnancy and Reproductive Health

  • Bleeding (e.g., nose, GI tract)

  • Imaging

    • Type: MRI, Angiogram, CT, etc.

    • Setup characteristics

    • Findings

  • Bloodwork

    • Type: CBC, Iron studies, biomarkers, etc.

    • Results

  • Hospitalizations / Severe adverse events

Further complicating this tension, cross-specialty registries rarely have a single individual with sufficient expertise across all relevant domains to determine what data to include or exclude for a given set of aims. Instead, these registries rely on multiple specialty-specific experts. Even when these experts agree to a common set of aims, pet interests seem to find a way to creep into the data definition. Moreover, there is a natural tendency to over-emphasize the importance of one’s knowledge area (thank goodness this author is immune to such mechanisms 😊), putting additional pressure on data definition scope creep. Together, these factors can easily drive the data definition to expand beyond what is feasible for data collection.

Light Years Away

In The Great Divorce, the narrator notes people in hell prefer to live vast distances away from one another, failing to recognize the relational nature of personal growth. Patient registries that cross multiple medical specialties have a “distance” issue to address. That is, single specialty registries can leverage the clinic as the central control mechanism for managing data capture, such as a Movement Disorder Clinic for data associated with an Essential Tremor registry. In contrast, multi-specialty registries require coordinating data capture across two or more clinics, generally with each operating largely independently. As a result, coordinating strategies are needed across clinics to ensure adherence to data collection requirements (i.e., what is collected and when [e.g., annually]).

Common strategies include the recruitment of sites that:

  • consolidate multiple specialties under a single roof [e.g., Centers of Excellence]

  • implement a centralized electronic medical record system

  • have a high patient contact rate (e.g., hematology for an HHT registry).

Even with robust strategies in place, however, mitigating the risk of data collection failures is an ongoing challenge.

The Cure HHT registry implemented an innovative strategy that engages the assistance of patients in coordinating and validating data collection across clinics. In short, the main coordinating clinic conducts a semi-structured patient interview covering all medical care since the last clinic visit. For instance, “Did you have a brain MRI since the last visit?” Responses are recorded and compared to data entered in the registry. A report is generated that sorts each comparison into one of three categories:

Concordant: Patient report and database agree

Discordant: Reported by patient, but not in database

Discordant: NOT reported by patient, but in database

The comparison report verifies existing data entry and guides acquisition of missing data. The “Brain” section of the report is shown above prior to making appropriate corrections. Data capture for a given visit is considered complete when all items are concordant.
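The comparison logic can be sketched in a few lines of Python. This is a minimal illustration, not the registry's actual implementation: the function names, the item labels, and the representation of interview answers as a yes/no mapping are all assumptions made for the example.

```python
def comparison_report(patient_answers, database_items):
    """Sort each interview item into one of three categories.

    patient_answers: dict mapping an item label to True/False (hypothetical format)
    database_items:  set of item labels already entered in the registry
    """
    report = {
        "concordant": [],
        "reported_not_in_database": [],
        "in_database_not_reported": [],
    }
    for item, reported in sorted(patient_answers.items()):
        recorded = item in database_items
        if reported == recorded:
            report["concordant"].append(item)
        elif reported:
            report["reported_not_in_database"].append(item)
        else:
            report["in_database_not_reported"].append(item)
    return report


def visit_complete(report):
    """Data capture is complete only when no discordant items remain."""
    return not (report["reported_not_in_database"] or report["in_database_not_reported"])


# Illustrative "Brain" section; the item labels are made up for the example.
answers = {"Brain MRI": True, "Brain angiogram": False, "Brain AVM embolization": True}
database = {"Brain MRI", "Brain angiogram"}

report = comparison_report(answers, database)
for category, items in report.items():
    print(category, items)
print("complete:", visit_complete(report))
```

Each discordant item points staff either to a record that still needs to be acquired or to an entry that should be double-checked with the patient.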

A Clear Lens

In The Great Divorce, the narrator travels out of hell and notices an improved ability to see the choices of people, “…you see the choices a bit more clearly than on Earth: the lens is clearer.” Patient registries are “lenses” through which disease history and treatment outcomes are revealed. Best setup practices cover a wide range of topics. Two examples using the HHT registry are discussed here; other topics will be explored in forthcoming articles.

Facilitating Data Analysis

Regrettably, the analysis of registry data is often overlooked during the setup phase. This comment isn’t directed toward the development of the data definition and the planned analyses of specific aims, but instead the often-limited consideration and adjustment of the setup design to better streamline the practical aspects of data analysis. One of the main contributing factors is the traditional division of responsibilities. That is, set-up staff are tasked with configuring the data collection system and creating forms. Statisticians assist with the data definition but are not involved in setup and instead sort out the data structure post-hoc when analyzing data. This disconnect misses an opportunity to optimize the analytical process for the delivery of timely results.

To better appreciate the failed setup decisions that characterize these missed opportunities, consider the following example. The image below shows four Case Report Forms [CRFs] used for data collection, each containing a variable with the same definition (i.e., the yellow box). The standard setup approach is to create four independent CRFs, resulting in four separate instances of the same variable. For instance, the common variable “Bleeding - (Yes/No)” may be found on the CRFs “Cerebral Angiogram”, “Surgical Resection”, “Embolization”, and “Lung Transplant”. To generate bleeding base rates across these events, the statistician must examine four separate variables instead of just one. These types of missed opportunities compound and significantly delay the analytical phase. Moreover, registries rarely have a devoted statistician, so these delays are experienced anew with each analysis.
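A rough Python sketch makes the extra work concrete. The export format and the CRF-specific variable names (e.g., `angio_bleeding`) are hypothetical; the point is only that a lookup table and a harmonization step are needed before any rate can be computed, and neither would be necessary if the CRFs shared one bleeding variable from the start.

```python
# Hypothetical export: one record per completed CRF, each with its own bleeding variable.
crf_records = [
    {"crf": "Cerebral Angiogram", "angio_bleeding": "Yes"},
    {"crf": "Surgical Resection", "resect_bleeding": "No"},
    {"crf": "Embolization", "embol_bleeding": "No"},
    {"crf": "Lung Transplant", "transplant_bleeding": "Yes"},
    {"crf": "Embolization", "embol_bleeding": "Yes"},
]

# The statistician has to discover and maintain this mapping post hoc.
BLEEDING_VARIABLE = {
    "Cerebral Angiogram": "angio_bleeding",
    "Surgical Resection": "resect_bleeding",
    "Embolization": "embol_bleeding",
    "Lung Transplant": "transplant_bleeding",
}


def harmonize(records):
    """Map each CRF-specific variable onto a single shared 'bleeding' field."""
    return [
        {"crf": r["crf"], "bleeding": r[BLEEDING_VARIABLE[r["crf"]]] == "Yes"}
        for r in records
    ]


def bleeding_rate_by_crf(rows):
    """Proportion of events with bleeding, per CRF type."""
    totals = {}
    for row in rows:
        events, bleeds = totals.get(row["crf"], (0, 0))
        totals[row["crf"]] = (events + 1, bleeds + row["bleeding"])
    return {crf: bleeds / events for crf, (events, bleeds) in totals.items()}


rates = bleeding_rate_by_crf(harmonize(crf_records))
print(rates)
```

With a shared variable defined at setup, `harmonize` and its mapping table disappear and the rate calculation runs directly on the export.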

Repeating Events

The Great Divorce shows the walls of hell are constructed from a compulsion to repeat past mistakes. Data capture of repeating events in patient registries is also a difficult challenge, and perpetuating common mistakes can wall off any attempt at a smooth analytical process. A previous article examined setup options for repeating events in detail (see here), so only high-level principles are covered here. These can be separated into three general approaches.

(1) Adding Columns

The first approach to repeating events is to estimate the maximum number of repetitions and then build the corresponding number of CRFs. This is like adding columns to a spreadsheet, whereby every instance of a CRF/variable combination is a new column. The number of columns equals the maximum number of repetitions per variable, typically with an incrementing number built into the variable naming convention (see image below). This setup is rarely appropriate and is typically an error, as it needlessly increases the number of variables and complicates data analysis.

(2) Adding Rows

Another approach is to allow a single CRF to repeat as needed per event. Each instance is analogous to adding a new row to a spreadsheet for a given patient (see image below). This approach limits the number of variables defined, making analysis much less daunting.
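The difference between the two layouts can be illustrated with a small Python sketch that converts an "adding columns" record into "adding rows" form. The variable names (`mri_date_1`, `mri_finding_1`, and so on) and values are hypothetical and simply follow the numbered naming convention described above.

```python
import re

# "Adding columns": one row per patient, one numbered column per repetition.
wide_row = {
    "patient_id": "P001",
    "mri_date_1": "2021-03-02",
    "mri_finding_1": "AVM present",
    "mri_date_2": "2022-03-10",
    "mri_finding_2": "AVM stable",
    "mri_date_3": "",       # unused slots still exist as empty columns
    "mri_finding_3": "",
}


def wide_to_long(row, id_field="patient_id"):
    """Turn numbered columns into one row per event instance ("adding rows")."""
    numbered = re.compile(r"^(?P<name>.+)_(?P<n>\d+)$")
    instances = {}
    for key, value in row.items():
        match = numbered.match(key)
        if match is None or value in (None, ""):
            continue  # skip the id field and empty repetition slots
        n = int(match["n"])
        instance = instances.setdefault(n, {id_field: row[id_field], "instance": n})
        instance[match["name"]] = value
    return [instances[n] for n in sorted(instances)]


long_rows = wide_to_long(wide_row)
for r in long_rows:
    print(r)
```

In the long layout there are only two analysis variables (`mri_date`, `mri_finding`) no matter how many times the event repeats.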

Although adding rows is generally the most appropriate setup, simply allowing data entry staff to enter multiple instances of a CRF is problematic without extensive supporting features and validation mechanisms in place to ensure accurate data entry. A few of the issues requiring attention include:

  • Preventing duplicates

  • Managing the relationship between events and across time

    • Events independent

      • A new CRF is created with each event

    • Events interdependent

      • Which variable values change vs. remain constant?

      • Over what range are event characteristics applicable (e.g., begin/end)?

      • What indicators signal whether to create a new event vs. update an existing one?

As an example, consider brain AVM data capture. Note first that, despite there being only one CRF, there are two different types of data (see image below). Some data, such as diagnosis date, are fixed and do not change over time. Other data, such as size, may vary across clinic visits. The data management system should leverage this distinction and reduce the data entry workload by pre-populating fixed values and allowing updates to variable data. Moreover, when a patient has more than one AVM, the system should enable staff to easily select the correct AVM for data entry and avoid creating duplicate records (e.g., based on location and onset date). Also, the AVM display should adjust based on diagnosis and clinic visit date, so that users can easily identify the appropriate AVM for data entry. These are but a few examples of the features and validation controls needed for accurate data entry of repeating events.
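A minimal Python sketch of these ideas follows. It is not the registry's data model: the class names, the choice of location plus diagnosis date as the duplicate key, and the example values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Avm:
    # Fixed data: entered once, then pre-populated on later visits.
    location: str
    diagnosis_date: date
    # Variable data: may change at each clinic visit (visit date -> size in mm).
    sizes_mm: dict = field(default_factory=dict)


class PatientAvms:
    """All brain AVMs for one patient, keyed to prevent duplicate records."""

    def __init__(self):
        self._avms = {}

    def record_visit(self, location, diagnosis_date, visit_date, size_mm):
        # Assumed duplicate rule: same normalized location and diagnosis date.
        key = (location.strip().lower(), diagnosis_date)
        avm = self._avms.get(key)
        if avm is None:
            avm = Avm(location.strip(), diagnosis_date)  # new event
            self._avms[key] = avm
        avm.sizes_mm[visit_date] = size_mm               # update existing event
        return avm

    def selectable(self, visit_date):
        """Only AVMs diagnosed on or before the visit are offered for data entry."""
        return [a for a in self._avms.values() if a.diagnosis_date <= visit_date]


patient = PatientAvms()
patient.record_visit("Left frontal lobe", date(2020, 5, 1), date(2020, 5, 1), 12.0)
patient.record_visit("left frontal lobe ", date(2020, 5, 1), date(2021, 5, 3), 14.5)
patient.record_visit("Cerebellum", date(2021, 4, 1), date(2021, 5, 3), 6.0)

print(len(patient.selectable(date(2020, 12, 31))))  # AVMs known at a 2020 visit
print(len(patient.selectable(date(2021, 5, 3))))    # AVMs known at the 2021 visit
```

The second call reuses the existing left frontal AVM rather than creating a duplicate, and the list of selectable AVMs depends on the visit date being entered.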

Transforming Break

In The Great Divorce, breaking the cycle of repeating mistakes transforms the narrow and confined into unbounded potential, such as a tiny lizard into a magnificent horse. A common mistake that narrows the utility of a patient registry is viewing it merely as an exercise in data storage, with periodic data dumps for analysis. Breaking this mindset transforms a registry into a dynamic resource that provides real-time information to healthcare staff in their daily operations and delivers personalized educational materials to patients. One example using the HHT registry is covered here, but results from this change in mindset are covered in several other blog articles (e.g., see here).

AVMs are a central concern in HHT and can occur in many different organs. Moreover, various imaging techniques are used to diagnose and monitor AVMs. Given these two characteristics, a patient-level report was created with a section using icons to visualize all possible AVM organ sites (see image below). Conditional red and green icons indicate the presence vs. absence of AVMs, respectively. Immediately after the icons, a table of only the organs positive for AVMs is shown, containing descriptive information and the latest imaging results. In this way, the most recent information, which is often the most pertinent to the immediate workflow, is kept free of the historical details available elsewhere in the report. Taken together, staff can quickly and accurately get up to speed on a patient's AVMs and current findings during case conference.
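The structure of such a report section can be sketched in Python as follows. Several details are assumptions rather than features of the actual report: the abbreviated organ list, the record fields, taking the icon color from the most recent imaging result, and the separate "not imaged" status for organs with no imaging on file. The findings text is invented for the example.

```python
ORGAN_SITES = ["Brain", "Spine", "Liver", "Lungs", "Kidneys"]  # abbreviated for the sketch


def avm_overview(imaging_records):
    """Return (icon status per organ, table of latest imaging for positive organs)."""
    latest = {}
    for rec in imaging_records:
        seen = latest.get(rec["organ"])
        # ISO-formatted date strings sort chronologically when compared as text.
        if seen is None or rec["date"] > seen["date"]:
            latest[rec["organ"]] = rec

    icons = {}
    for organ in ORGAN_SITES:
        rec = latest.get(organ)
        if rec is None:
            icons[organ] = "not imaged"  # assumption: not described in the article
        else:
            icons[organ] = "red" if rec["avm_present"] else "green"

    # Only organs positive for AVMs appear in the table, with the latest result.
    table = [latest[organ] for organ in ORGAN_SITES if icons[organ] == "red"]
    return icons, table


records = [
    {"organ": "Brain", "date": "2021-03-02", "type": "MRI", "avm_present": True,
     "finding": "1 AVM, left frontal"},
    {"organ": "Brain", "date": "2022-03-10", "type": "MRI", "avm_present": True,
     "finding": "1 AVM, stable"},
    {"organ": "Liver", "date": "2022-01-15", "type": "CT", "avm_present": False,
     "finding": "No AVM seen"},
]

icons, table = avm_overview(records)
print(icons)
print(table)
```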

Building A Dream

The Great Divorce ends abruptly with the narrator awakening to realize the trip out of Hades was a dream, inviting the reader to wonder how best to apply the dream’s insights to day-to-day living. So too the data definition of a registry is a “Dream-State” in need of applied techniques to ensure the vision is optimally realized. Various techniques have been previously discussed (e.g., engaging patients, clinically integrated data utilization, etc.). Here, a case example focuses on one innovative component of data validation in the HHT registry.

Variable scope is one way to organize data validation techniques, which can be separated into five major categories:

  1. Single Variable

    1. E.g., Systolic blood pressure cannot be higher than X

  2. Multiple Variable

    1. E.g., if cancer history, diagnosis date is required.

  3. Across Form, Single Visit

    1. If female, pregnancy history form is required

  4. Across Visits

    1. Maximum resting heart rate change in two weeks cannot be greater than X%

  5. Population Constraints

    1. 30 minute delayed memory recall cannot be higher than 90% in dementia population
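The five scopes above can be sketched as a single Python validation function. This is an illustration only: the field names are hypothetical, and the numeric limits passed in at the bottom are arbitrary stand-ins for the unspecified "X" values, not clinical recommendations.

```python
from datetime import date


def validate_visit(visit, previous_visit, limits):
    """Return a list of violation messages for one visit (empty list means clean)."""
    errors = []

    # 1. Single variable: one value against a fixed limit.
    if visit["systolic_bp"] > limits["max_systolic_bp"]:
        errors.append("systolic_bp is above the allowed maximum")

    # 2. Multiple variable: one value makes another required.
    if visit["cancer_history"] and not visit.get("cancer_diagnosis_date"):
        errors.append("cancer_diagnosis_date is required when cancer_history is yes")

    # 3. Across form, single visit: a value on one form requires another form.
    if visit["sex"] == "female" and "pregnancy_history" not in visit["forms_completed"]:
        errors.append("pregnancy_history form is required for female patients")

    # 4. Across visits: change relative to an earlier visit within two weeks.
    if previous_visit is not None:
        days_apart = (visit["date"] - previous_visit["date"]).days
        change_pct = (abs(visit["resting_hr"] - previous_visit["resting_hr"])
                      / previous_visit["resting_hr"] * 100)
        if days_apart <= 14 and change_pct > limits["max_hr_change_pct"]:
            errors.append("resting_hr changed more than allowed within two weeks")

    # 5. Population constraints: plausibility given the study population.
    if visit["population"] == "dementia" and visit["delayed_recall_pct"] > 90:
        errors.append("delayed_recall_pct is implausibly high for a dementia population")

    return errors


limits = {"max_systolic_bp": 300, "max_hr_change_pct": 25}  # placeholder values for "X"
previous = {"date": date(2024, 1, 1), "resting_hr": 60}
visit = {
    "date": date(2024, 1, 10), "systolic_bp": 320, "cancer_history": True,
    "cancer_diagnosis_date": None, "sex": "female", "forms_completed": ["demographics"],
    "resting_hr": 90, "population": "dementia", "delayed_recall_pct": 95,
}

errors = validate_visit(visit, previous, limits)
for message in errors:
    print(message)
```

Keeping the limits in a separate mapping, rather than hard-coded in the checks, lets clinical experts review and adjust them without touching the validation logic.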

Data validation across these categories is an important step in registry setup. However, even with appropriate validation in place, conveying errors in a timely, easily digestible manner to data entry staff can be a challenge. Timing is one complicating factor within the clinical setting, as staff often must navigate interruptions, start and stop data entry, look-up data values from multiple sources / devices, etc. Thus, one challenge is how best to re-engage and direct data entry staff toward what needs to be done.

The HHT registry uses a descriptive text-based approach to inform data entry staff of validation errors. The errors are aggregated into a “ToDo” table, and each error has two components (see below):

  1. A sentence describing the item and form on which the error occurred and the nature of the violation.

  2. A “ToDo” note indicating what needs to be done to correct the error.

Once the errors are corrected (i.e., no errors remain), a note appears indicating data entry is all caught up. The report is available on demand to ensure immediate feedback of data entry status. Note this approach was also beneficial for staff training during site onboarding since it familiarized staff with a variety of different types of errors, where to locate errors, how data entry is interconnected, and the specific tasks / decisions needed to rectify violations.
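A minimal Python sketch of such a table is shown below. The class name, the fields, the example errors, and the exact wording of the messages are assumptions for illustration; the actual registry report may be structured differently.

```python
from dataclasses import dataclass


@dataclass
class EntryError:
    form: str        # form on which the error occurred
    item: str        # item (variable) involved
    violation: str   # nature of the violation
    todo: str        # what needs to be done to correct it


def todo_table(errors):
    """Render validation errors as descriptive text with a ToDo note for each."""
    if not errors:
        return "Data entry is all caught up."
    lines = []
    for number, err in enumerate(errors, start=1):
        lines.append(f"{number}. On the '{err.form}' form, '{err.item}' {err.violation}.")
        lines.append(f"   ToDo: {err.todo}")
    return "\n".join(lines)


open_errors = [
    EntryError("Medical History", "Cancer diagnosis date",
               "is missing although cancer history is marked yes",
               "Enter the diagnosis date or correct the cancer history answer."),
    EntryError("Brain MRI", "AVM size",
               "has no value for the current visit",
               "Enter the size from the latest imaging report."),
]

print(todo_table(open_errors))
print(todo_table([]))
```

Because the table is regenerated from the current set of errors each time it is requested, staff returning from an interruption see exactly what remains to be done.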

In sum, it should be clear HHT and other multi-specialty registries have unique challenges to address for successful implementation, in addition to common issues encountered by all registries. Well-designed registries facilitate both the collection and the dynamic use of data by staff and patients. Look for other blog articles covering the results of this ongoing process.
