Visualizing Disease States: Making The Rabbit Appear!
Fast, accurate comprehension of all patient-related data is a common challenge in disease management. Plots, charts, and graphs quickly convey large amounts of data-driven information. However, patient registries have several characteristics that limit the effectiveness of these tools. This article explores alternative ways to visually display disease-related data. Examples are provided using a chronic inflammatory disease population (i.e., IgG4-related disease) and the Responder Index instrument.
Magic In That Hat
Like a good magician, disease-related data are reluctant to give up secrets on how to make relationships appear from a hat. The magic involves several steps, each deserving of an extended discussion. This article highlights the last step: visualization techniques that aid in the interpretation of data. In addition to general design principles (e.g., layout aesthetics), this topic typically centers around a software package or plotting library (e.g., ggplot2, base R, MATLAB, Xfig) and describes configuration options for various plots, charts and graphs (herein referred to as PCaG). Although PCaG are a mainstay for visualizing clinical research data (e.g., “Data Visualization: A Practical Introduction” by Kieran Healy), there are several scenarios that push past the limits of these tools, requiring alternative approaches to visual display.
Patient registry data repositories are one such example, as they have several characteristics that call for approaches to data visualization other than PCaG. These characteristics include:
The content domains typically extend well beyond that captured by the electronic medical record system or research projects (e.g., disease metrics, treatment, patient ratings, clinical workflow, clinical decisions, seminal events, etc.). Because the overlay of multiple factors is often key, the traditional toolset of PCaG symbols (e.g., lines, bars, boxes) is easily exhausted and/or the symbols become difficult to differentiate within a combined visual display.
The number of data elements can be large, sometimes thousands of fields. As such, the number of PCaG quickly explodes and/or the displays become densely packed, making interpretation cumbersome.
The data definition of fields often embeds location information (e.g., coordinates for a deep brain stimulation lead, tumor location [see here], etc.). Thus, not only is there a need to display values, but also spatial location (e.g., an image overlay) which is not readily accommodated in PCaG (i.e., other than simply grouping information together).
Most patient registries capture data over time which is generally a good fit for PCaG. However, display options are limited for handling time-related factors commonly found in patient registries such as:
- Conditional Display
Whether to display a field or not may be contingent on its value or some combination of other field values. For example, it is common to have numerous fields defined as the presence vs. absence of a factor (e.g., cancer history, positive vs. negative test results, etc.). Since most are absent for any given subject, the display of absent results typically just introduces noise or unnecessary clutter.
- Time Span Variation
The time span around a variable value doesn’t always coincide with the date collected (e.g., pain over the past 4 weeks). Relatedly, a date value for a given factor may indicate historical, present or future relevance (e.g., date of organ damage).
The display of fields may be unrelated to an expression and instead require the layout of constituent parts in alignment with the data definition (e.g., seizure area of origin lead location: see below).
The visual overlay of multiple factors is a common approach to facilitating clinical decisions. With each additional factor, there is an increased likelihood of scaling differences across data elements (e.g., variable value range, level of measure, etc.). Three or more scale variations across data elements tend to stretch the limits of traditional PCaG (e.g., two y-axis scales).
Treatment decision markers are key elements to visually display (e.g., treatment onset and duration, name of therapeutic agent, etc.). Making these intermittent, non-numeric events stand out visually, in combination with other data elements, is difficult using a traditional PCaG approach.
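The conditional display and time span ideas above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the function names (`visible_fields`, `display_span`), the field names, and the 28-day lookback are hypothetical and do not come from any particular registry.

```python
from datetime import date, timedelta


def visible_fields(record):
    """Conditional display: keep only presence/absence fields that are present."""
    return {name: value for name, value in record.items() if value}


def display_span(collected, lookback_days=0, carry_forward_to=None):
    """Time span variation: return the (start, end) dates a value should cover.

    lookback_days    -- how far back the value reaches (e.g., "past 4 weeks")
    carry_forward_to -- end of the display window for values that persist
    """
    start = collected - timedelta(days=lookback_days)
    end = carry_forward_to if carry_forward_to is not None else collected
    return start, end


# Hypothetical presence/absence fields; absent ones are dropped from the display.
record = {"cancer_history": False, "positive_test": True, "organ_damage": False}
print(visible_fields(record))

# "Pain over the past 4 weeks", collected on 2016-03-01 (assumed example).
print(display_span(date(2016, 3, 1), lookback_days=28))

# A damage onset date that stays relevant until the end of the display window.
print(display_span(date(2016, 1, 15), carry_forward_to=date(2018, 1, 1)))
```

Keeping these rules in small functions, separate from any drawing code, makes it easier to review them with clinical staff before committing to a layout.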
Patient registries often capture information about clinical workflow (e.g., workup events) and/or the movement of patients through various components of the healthcare system (e.g., hand-offs, order execution, referrals, etc.), as well as the status of these events. These data often benefit from non-traditional visual representation (e.g., icons of various stages) and/or placement onto a mapped image (e.g., floor map), neither of which is well supported by PCaG.
Taken together, the visual display of patient registry data (e.g., see examples here) involves characteristics that stretch beyond the utility of PCaG. So what’s the secret behind pulling relationships out from a hat? A brief review is in order.
Reaching In And Grabbing What?
Before risking hand and limb, the initial step is to know what’s in the hat. The main ingredient, of course, is the contents of the patient registry database. The specific data definition of each field is certainly key. However, there are a number of common characteristics that help determine the display of data elements irrespective of the target disease (e.g., cardiac disease, diabetes, etc.). Briefly, these include:
- Level of Measurement (see here)
  - Nominal: Named categories only (e.g., Sex: Male / Female)
  - Ordinal: Ranked categories (e.g., tremor ratings)
  - Interval: Numeric and difference between values consistent (e.g., IQ)
  - Ratio: Numeric with an absolute zero point (e.g., tumor size)
- Field Type – valid variable values for a given field (e.g., numeric, date, text, etc.)
- Timing (e.g., daily, weekly)
- Self-report (e.g., patient, caregiver, healthcare provider)
- Machine / device generated (e.g., labs)
- Disease Metrics
- Target Level
- Genetic / Proteomic
- Physical characteristics (e.g., size)
- Physiological functioning (e.g., lab values)
- Symptoms (e.g., tremor)
- Function (e.g., walking)
- Health-related Quality of Life
- Seminal events and dates
- Administrative (e.g., physician providing care)
- Clinical workflow (e.g., scheduled care events and results)
- Treatment (e.g., medications)
- Volume (i.e., the expected number of data points over time)
All patient registries involve some combination of the above factors, which in turn are the magic ingredients to the trick of pulling something out of the hat.
Seeing Is Believing
It doesn’t always have to be a rabbit, but there needs to be something to grab. Effective data visualization designs are framed by a vision of what’s helpful within a given clinical setting. Said differently, all patient registries are both over- and under-inclusive depending on the aims. The aims are the various animals hidden in the hat. Thus, the many decisions about display aesthetics are tied directly back to the “end-in-mind” aims. Common questions that unearth aims include:
What are the main variables of interest?
What is helpful to learn?
What questions will the display answer?
How are the data related / best grouped together?
What variable combinations make sense?
What decisions need to be made?
What information is needed for each decision?
What are the most / least common issues?
Are there any safety issues, critical events, etc.?
What information would help with clinical workflow?
With these questions in hand, it’s time to pull off the trick.
Pulling Out The Rabbit!
The materials that hold together the rabbit to be grasped can be quite diverse. To clarify the construction zone, translating data into pixel-level graphical representations or animations will not be the focus. Also, mapping data onto an image is handled in a separate blog article. Instead, this discussion is centered around a grid (i.e., a table using HTML) with time represented from left to right, with the main building materials being 1) the contents of each cell and 2) the cell’s attributes (see below). Together these include:
SVG (Scalable Vector Graphics) is used to define vector-based graphics which can be embedded into HTML pages. There are several standard shapes available (e.g., Circle, Ellipse, Rectangle, etc.) and any number of other shapes can be created. In addition to a one-to-one mapping between a shape and factor (e.g., circle = stroke onset), multiple factors can be represented using various display options. For instance, a single shape for stroke (i.e., a circle) can represent two factors merely by using an outline (see below). Hint: Consider visual mnemonics to help minimize the learning curve. In this case, the red outline is associated with blood bursting from a vessel. Shape orientation (except for circles 😊) can also be used as a separate indicator (e.g., triangle pointing right vs. left).
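As a rough illustration of one shape carrying two factors, the sketch below builds an inline SVG circle whose optional outline encodes a second factor. The helper name `svg_circle`, the default sizes, and the colors are assumptions made for this example.

```python
def svg_circle(radius=10, fill="lightgray", outline=None):
    """Return an inline SVG circle; an optional outline encodes a second factor."""
    center = radius + 2          # 2px margin so the stroke is not clipped
    size = 2 * center
    # Only emit stroke attributes when the second factor is present.
    stroke = f' stroke="{outline}" stroke-width="2"' if outline else ""
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">'
        f'<circle cx="{center}" cy="{center}" r="{radius}" fill="{fill}"{stroke}/>'
        "</svg>"
    )


# Hypothetical mapping: circle = stroke onset, red outline = second factor present.
print(svg_circle())
print(svg_circle(outline="red"))
```

The returned string can be placed directly inside a table cell of an HTML page.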
Size is an intuitive, quickly grasped mechanism to distinguish between various levels of a given factor. Size fits best with ordinal, interval and ratio level variable values. Note, however, that size variation (e.g., based on an expression) often complicates display alignment. In such cases, range limits or data transformations (e.g., logarithmic) are often helpful.
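One possible way to combine range limits with a logarithmic transformation is sketched here. The function name `radius_for` and the pixel limits are assumptions, and the approach requires strictly positive range limits.

```python
import math


def radius_for(value, lo, hi, r_min=3.0, r_max=12.0):
    """Map a value to a circle radius on a log scale, clamped to [lo, hi].

    Assumes 0 < lo < hi. Values outside the range are drawn at the
    smallest or largest radius so that cell alignment is preserved.
    """
    clamped = min(max(value, lo), hi)
    fraction = (math.log(clamped) - math.log(lo)) / (math.log(hi) - math.log(lo))
    return r_min + fraction * (r_max - r_min)


# Hypothetical ratio-level values spanning several orders of magnitude.
for v in (1, 10, 100, 1000):
    print(v, round(radius_for(v, lo=1, hi=100), 2))
```

Clamping keeps an extreme value from blowing up a single cell, at the cost of hiding how far outside the range it falls.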
Color naturally draws attention and makes interpretation of results immediately apparent. Various combinations of hue, saturation, and lightness can accommodate numerous factors, both discrete and continuous. Hue is the true color (e.g., red, orange, yellow, green, blue). Saturation is the purity or colorfulness of the given hue. Lightness is how dark or light a given hue appears.
The cell's background and outline colors can be used in the same way.
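A small sketch of this idea uses the CSS `hsl()` notation, which HTML and SVG both accept. Here hue identifies a discrete factor and lightness tracks a continuous one; the function name `hsl_color` and the 85% to 35% lightness range are assumptions chosen for illustration.

```python
def hsl_color(hue_degrees, severity, saturation=80):
    """Build a CSS hsl() string: hue = factor, lightness = severity in [0, 1]."""
    severity = min(max(severity, 0.0), 1.0)   # clamp to the expected range
    lightness = round(85 - 50 * severity)     # 85% (pale) down to 35% (dark)
    return f"hsl({hue_degrees}, {saturation}%, {lightness}%)"


# Hue 0 is red: a pale red for low severity, a dark red for high severity.
print(hsl_color(0, 0.0))
print(hsl_color(0, 1.0))
```

Holding hue and saturation fixed while varying only lightness keeps a factor recognizable across its whole range.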
The number of factors represented with a visual display can be increased multi-fold by overlaying items (see example below). Note, overlay can also take advantage of position to convey information. For instance, left/right, medial, superior, inferior, etc. positioning can be used to indicate various states.
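Overlay can be sketched by stacking SVG elements, since later elements are drawn on top of earlier ones. In this assumed example (`overlay_cell` and its parameters are hypothetical), an outer circle, a border, and a small inner marker each carry a separate factor, and the marker's horizontal position carries one more.

```python
def overlay_cell(outer_fill, inner_fill=None, border=None, position="center", size=28):
    """Stack several factors in one SVG: outer circle, border, positioned inner marker."""
    center = size // 2
    offsets = {"left": -center // 2, "center": 0, "right": center // 2}
    stroke = f' stroke="{border}" stroke-width="2"' if border else ""
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">',
        f'<circle cx="{center}" cy="{center}" r="{center - 2}" fill="{outer_fill}"{stroke}/>',
    ]
    if inner_fill:
        # Drawn last, so it sits on top of the outer circle.
        cx = center + offsets[position]
        parts.append(f'<circle cx="{cx}" cy="{center}" r="{center // 3}" fill="{inner_fill}"/>')
    parts.append("</svg>")
    return "".join(parts)


print(overlay_cell("pink", inner_fill="black", border="blue", position="left"))
```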
In addition to SVG, any number of images and icons can be used to represent various factors. Being creative is key as nearly anything could be used. For instance, staff pictures could vary in size depending on the number of procedures administered.
There is a broad range of cell contents that could be used to represent various disease attributes (e.g., symbols, numbers, etc.), whatever makes sense for the context and/or can be dreamed up. The most common are numbers. The number may directly correspond to a value (e.g., systolic blood pressure) or may represent the end result of an expression (e.g., number of seizures). Often some sort of aggregation and/or selection method is used covering the cell’s defined begin and end date range (e.g., over a defined time range the: average value, highest / lowest value, most recent, event count, etc.).
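The aggregation and selection methods listed above might look like the following sketch. The function name `cell_value`, the method labels, and the blood pressure readings are made up for illustration.

```python
from datetime import date
from statistics import mean


def cell_value(observations, start, end, method="most_recent"):
    """Reduce (date, value) pairs falling in [start, end] to one cell value."""
    in_range = sorted(
        ((d, v) for d, v in observations if start <= d <= end),
        key=lambda pair: pair[0],
    )
    if method == "count":
        return len(in_range)
    if not in_range:
        return None  # nothing recorded within this cell's date range
    values = [v for _, v in in_range]
    if method == "mean":
        return mean(values)
    if method == "highest":
        return max(values)
    if method == "lowest":
        return min(values)
    if method == "most_recent":
        return values[-1]
    raise ValueError(f"unknown method: {method}")


# Hypothetical systolic blood pressure readings.
readings = [(date(2016, 1, 5), 142), (date(2016, 2, 20), 128), (date(2016, 4, 2), 135)]
start, end = date(2016, 1, 1), date(2016, 3, 15)
for m in ("mean", "highest", "lowest", "most_recent", "count"):
    print(m, cell_value(readings, start, end, m))
```

Returning `None` for an empty range leaves it to the display layer to decide how missing data should look.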
Example: IgG4 Responder Index
Several articles are planned covering the use of the above concepts, so only a brief example is provided here. IgG4-related disease involves multiple organs, each with a diverse set of inflammatory-related problems. The Responder Index (RI) was developed to assess disease activity over time and is often one of the measures used in IgG4 patient registries. The RI assesses multiple organs/sites, each containing ratings and indicators of disease (see below). Due to the breadth of information collected and consistent scaling across body systems, the RI fits well with a number of data visualization techniques.
The graph below shows the five RI summary scores over time for an example case.
Below is an image of the first four grid lines from an example patient. There are a couple of things to note:
- It is easy to spot incomplete data entry via solid gray circles to indicate missing data.
Trigger Onset Dates
- The display area for a given date value can be up to, at, or going forward from that date. For example, once organ/site damage occurs, it is always present. As seen for Meninges, damage onset was early 2016 and is shown continuously from that point forward (i.e., light-red, large circle). Note that dark red was used to indicate that the damage was symptomatic.
- An inner circle was used to indicate an additional factor.
- Different colored borders were used to signal the presence of various factors.
- Each column in the figure represents about 2 ½ months during which any number of clinic visits may occur. Thus, decision rules are required to determine what to display in the case of multiple visits (e.g., the last clinic visit within the range).
- Any number of additional factors and characteristics (e.g., use of different shapes) may be added. The key is to engage target consumers (e.g., clinical staff, patients, etc.) in determining what would be helpful to see.
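To tie the points above together, here is a minimal sketch of one grid row: visits are binned into time columns, the last visit in each column is kept, and columns without data get a gray dot. The function names, column boundaries, and scores are all hypothetical, and this is not the code behind the figure.

```python
from datetime import date
from html import escape


def last_visit_per_column(visits, column_starts, window_end):
    """For each column, keep the score from the last visit in its date range."""
    edges = list(column_starts) + [window_end]
    chosen = []
    for start, end in zip(edges, edges[1:]):
        in_column = [(d, score) for d, score in visits if start <= d < end]
        # Decision rule (assumed): the latest visit within the column wins.
        chosen.append(max(in_column, key=lambda p: p[0])[1] if in_column else None)
    return chosen


def grid_row(label, scores):
    """Render one organ/site as an HTML table row; a gray dot marks missing data."""
    cells = []
    for score in scores:
        content = "<span style='color:gray'>&#9679;</span>" if score is None else str(score)
        cells.append(f"<td>{content}</td>")
    return f"<tr><th>{escape(label)}</th>{''.join(cells)}</tr>"


# Hypothetical visits (date, score) and columns of roughly 2 1/2 months each.
visits = [(date(2016, 1, 10), 3), (date(2016, 2, 25), 2), (date(2016, 6, 1), 1)]
column_starts = [date(2016, 1, 1), date(2016, 3, 15), date(2016, 5, 30)]
scores = last_visit_per_column(visits, column_starts, window_end=date(2016, 8, 15))
print(grid_row("Meninges", scores))
```

Time runs left to right because the columns are emitted in date order, so each additional organ/site is simply another row in the same table.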
This is the first in a series of articles on visualizing disease states. The next in the series addresses how to manage heterogeneous data.