Visualizing Disease States: Slaying Heterogeneous Data

Part II

(Part I here)

Author: John Putzke

Summary

This is the second in a series of articles examining nontraditional methods (i.e., excluding plots, charts, graphs) for visualizing disease states.  Heterogeneous data is a common and challenging issue for nearly all types of visual displays, irrespective of the method used.  Heterogeneity can be defined along several different dimensions, some of the most frequent being content domain, measurement level, collection timing (e.g., daily, weekly) and collection method (e.g., self-report, device).  This article describes the components of heterogeneous data and strategies for managing these data in visual displays of disease states.

A Formidable Guardian

In Greek mythology, the golden apple tree is guarded by Ladon, a terrifying serpent with 100 different heads and voices.  In a similar manner, an equally frightening serpent guards visual disease states that have a ‘golden’ clinical utility.  Its name is heterogeneous data.  Certainly, the multi-pronged nature of discrepant data is an inherent issue to address when creating useful disease state images.  To harvest these golden apples, the best tactic isn’t to lower one’s fear in the face of the serpent, but to become more courageous.  Since imagined serpents are always more intimidating than the real ones, it is best to start by looking directly at the issues heterogeneous data present.

The topic of heterogeneous data generally involves mathematical concepts (e.g., measurement level, variable value distribution), and the issues are framed around data analysis (e.g., which analysis to select, whether any assumptions are violated).  However, when it comes to visualizing disease states, the discussion needs to expand to include any data characteristic that may influence its layout and/or display.

Although not exhaustive, the main components of heterogeneous data that need to be brought together when creating disease state images are presented below.  Each component is deserving of a separate discussion on its potential influence on the resulting visualization.  For now, it is only important to note that these may be either strengths or weaknesses, and that this article will focus on addressing the weaknesses related to creating visual disease states.

Measurement Level (see here) – What do the numbers represent?

  • Nominal: Named categories only (e.g., Sex: Male / Female)
  • Ordinal: Ranked categories (e.g., tremor ratings)
  • Interval: Numeric and difference between values consistent (e.g., IQ)
  • Ratio: Numeric with an absolute zero point (e.g., tumor size)

Field Type – What are the valid variable values for a given field (e.g., numeric, date, text, etc.)?

Distribution – How are the variable values distributed?

Inter-Relationships – What are the relationships between variables?

  • Range along continuum from Independent to Dependent

Content Domain – What is the assessment target?

  • Disease Metrics
    • Target Level Outcome or Mechanism
      • Genetic / Proteomic / Lipidomic / etc.
      • Physical characteristics (e.g., size, spatial information)
      • Physiological functioning (e.g., lab values)
      • Symptoms (e.g., tremor)
      • Function (e.g., walking)
      • Health-related Quality of Life
  • Administrative (e.g., physician providing care)
  • Clinical workflow (e.g., scheduled care events and results)
  • Treatment (e.g., medications)

Granularity – What level of detail or conceptual nesting does the data definition include?

  • e.g., Pain (overall vs. frequency – intensity – duration)

Priority – What is the variable’s relative importance?

Collection Methods

  • Timing (e.g., daily, weekly, etc.)
  • Source
    • Subjective
      • Self-report (e.g., patient, caregiver, healthcare provider)
    • Objective
      • Machine / device generated (e.g., labs, steps)

Why Slay the Serpent?

Leaving aside fulfilment of the archetypal hero myth, why seek to slay the serpent of heterogeneous data?  After all, it’s easiest to just display the data “as-is” with a straightforward visual transformation (e.g., red = abnormal range).  However, if each visual transformation is merely treated independently, a couple of key problems arise.

Mosaicism – In genetics, the term mosaicism refers to the presence of cells with different genetic makeups (i.e., genotypes) within a single organism.  Mosaicism often results in traits (i.e., phenotypes) expressed in seemingly random patterns (e.g., skin mosaicism).  The same issue occurs when creating disease state images if the underlying variable values and fields are treated independently.  That is, there are more likely to be inconsistencies across variables in the:

  • meaning of visual representations (e.g., red may mean ‘abnormal’ vs. 99th percentile across two different variables)
  • number of mapped categories (e.g., High/Low  vs.  High/Medium/Low)
  • arithmetical transformations used (e.g., standard score conversion)
  • other characteristics

Crowding – Each variable in a disease state image requires a separate, physical space and location.  Thus, if variables are handled independently, rather than combined with other variables or fit together into a larger plan, the overall image quickly becomes crowded and difficult to grasp.  That is, there is a positive relationship between the number of independently handled variables and the complexity of the image.

Taken together, mosaicism and crowding tend to make disease state images more complex and difficult to comprehend.  When done well, disease state images are intuitive, quickly grasped, informative and actionable.  A good litmus test for a “golden” disease state image is the extent to which it’s incorporated into day-to-day clinical practice and decision making.  Indeed, the utility of golden images is the reason to slay the serpent. 

Slaying the Beast and Harvesting Golden Apples

Because of the wide range of disease specifics and needs within clinical settings, a simple recipe-like formula doesn’t work for transforming heterogeneous data into a fruitful disease state visualization.  However, there are common weapons used to slay the heterogeneous beast and transform it into golden images for harvesting.

The essence of nearly every weapon used is an organizing principle that promotes comparisons across variables and/or consolidates information.  In this way a single visual transformation can be used across multiple variables.  Typically, the principle involves an arithmetic data transformation, but any number of other methods can be used.  As an example of the breadth of available methods, there is an irony in mentioning that even the factors that contribute to heterogeneity may, in some cases, serve as organizing principles (e.g., display all the rank order data together).  Of the various methods for managing heterogeneous data, some of the most common are:

Measure Selection – A review of the rationale behind selecting appropriate independent / dependent variables is beyond the scope of this article, but note that some measures already do the heavy lifting involved in making variable values homogeneous.  Such measures inherently minimize the number of transformation rules needed from value to visual display (e.g., the value 4 = improvement = green background).

For example, the Responder Index (RI) is designed to assess IgG4-RD activity across multiple organs and sites (14 plus optional other sites; e.g., skin, lungs, thyroid, etc.).  Such diversity in measurement targets can lead to considerable data definition heterogeneity.  However, the RI uses the same four dimensions to assess each organ/site and the variable values within each dimension have the same meaning (see below).  By using a consistent definition across outcome targets, a single rule within each dimension can be used to convert values to a visual display.

Display Attributes – Converting variable values to Display Attributes is an effective mechanism to help users quickly consume a large amount of data, particularly as compared to displaying just numbers.  Using this technique, variable values are linked together with visual display attributes (e.g., shapes).  Although a transformation of variable values is not always involved, this technique most frequently involves mapping discrete values onto display attributes.  For example, systolic blood pressure can be transformed into three categories (e.g., high, normal, low), with each category mapped to a different icon.  Some of the most common Display Attributes are:

  • Color (e.g., red = ‘Poor’, etc.)
  • Shapes (e.g., circle = ‘Low’, triangle = ‘High’, etc.)
  • Icons (e.g., check mark icon = presence of event)
  • Images (e.g., organize procedural statistics under a physician picture, etc.)
  • Any other unique Display Attribute (e.g., cross-hatch)
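The blood pressure example above can be sketched in a few lines of Python.  This is only an illustration of mapping discrete categories onto a display attribute: the cut-off values, the icon names and the function names are all hypothetical and are not taken from any clinical guideline or from the images discussed in this article.

```python
# Hypothetical cut-offs in mmHg, chosen only for illustration (not clinical guidance).
LOW_BELOW = 90
HIGH_FROM = 140

# Hypothetical mapping from category to an icon name used by the display layer.
CATEGORY_ICON = {
    "low": "triangle-down",
    "normal": "circle",
    "high": "triangle-up",
}


def categorize_systolic(value: float) -> str:
    """Collapse a numeric systolic reading into one of three categories."""
    if value < LOW_BELOW:
        return "low"
    if value >= HIGH_FROM:
        return "high"
    return "normal"


def systolic_icon(value: float) -> str:
    """Return the icon name for a systolic reading."""
    return CATEGORY_ICON[categorize_systolic(value)]
```

Keeping the categorization step separate from the icon lookup means the same three categories could later be mapped to a color or shape without touching the cut-offs.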

When selecting the Display Attributes to use, it is important to examine the relationship between variables and their clinical significance to 1) ensure the display attributes are visible for all possible states, and 2) select the attribute configuration that best highlights priority variable values.  The former involves planning the image layers to ensure all attributes remain visible across states, and the latter involves ensuring high priority variable values have a marked, conspicuous visual component.

As an example, below is an image of the RI over time which leverages several Display Attributes.  As an aside, notice how the Site / Organ text is highlighted red if the organ score indicates “Worsened” on the most recent clinic visit.  This was done to help facilitate case conference discussions.  Thus, even text outside of the main image area can be considered a Display Attribute.  The following rules were used for the RI scores:

  • Organ / Site Score (cell background)
    • 0 = Light green (Normal or resolved)
    • 1 = Green (Improved)
    • 2 = Light Gray (Persistent; Unchanged; still active)
    • 3 = Pink (New / recurrence)
    • 4 = Red (Worsened despite treatment)
  • Symptomatic (Yes) = Blue triangle
  • Urgent (Yes) = Dark red outline of cell
  • Damage (Yes) = Black cross-hatch through cell
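As a rough sketch, the RI rules listed above could be encoded as a single lookup plus three independent flags.  The colors and attributes come from the list; the `CellStyle` class, its field names and the `ri_cell_style` function are hypothetical names introduced here for illustration, not part of the RI or of the software used to produce the image.

```python
from dataclasses import dataclass

# Cell background by Organ / Site Score, taken from the rules listed above.
SCORE_BACKGROUND = {
    0: "light green",  # Normal or resolved
    1: "green",        # Improved
    2: "light gray",   # Persistent; unchanged; still active
    3: "pink",         # New / recurrence
    4: "red",          # Worsened despite treatment
}


@dataclass(frozen=True)
class CellStyle:
    """Hypothetical container for the display attributes of one cell."""
    background: str
    blue_triangle: bool      # Symptomatic (Yes)
    dark_red_outline: bool   # Urgent (Yes)
    black_cross_hatch: bool  # Damage (Yes)


def ri_cell_style(score: int, symptomatic: bool = False,
                  urgent: bool = False, damage: bool = False) -> CellStyle:
    """Convert one organ/site assessment into its display attributes."""
    if score not in SCORE_BACKGROUND:
        raise ValueError(f"Organ / Site Score must be 0-4, got {score!r}")
    return CellStyle(
        background=SCORE_BACKGROUND[score],
        blue_triangle=symptomatic,
        dark_red_outline=urgent,
        black_cross_hatch=damage,
    )
```

Because the four attributes occupy different visual layers (background, marker, outline, hatch), any combination of states can be shown in the same cell without one attribute hiding another.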

Common Metric – One of the most effective ways to facilitate comparisons across variables with wide ranging values and characteristics is to use an arithmetic transformation to place all the scores along a common metric.  A consistent set of visualization rules can then be applied.  A number of different types of arithmetic operations can be used, some examples being normalized, standardized or percentile scores (e.g., linear transformations, see here).  
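As a minimal sketch of this idea, the Python below converts two unrelated measures to standardized (z) scores using reference means and standard deviations, then applies one display rule to both.  The reference values, the cut-off of 2.0 and the function names are assumptions made for illustration only.

```python
def to_z(value: float, ref_mean: float, ref_sd: float) -> float:
    """Standardize a raw value against a reference mean and standard deviation."""
    if ref_sd <= 0:
        raise ValueError("reference standard deviation must be positive")
    return (value - ref_mean) / ref_sd


def color_for_z(z: float, cutoff: float = 2.0) -> str:
    """One display rule shared by every standardized variable (hypothetical cut-off)."""
    return "red" if abs(z) >= cutoff else "gray"


# Two measures on very different raw scales, with made-up reference values.
test_score_z = to_z(130, ref_mean=100, ref_sd=15)   # 2.0
lab_value_z = to_z(4.0, ref_mean=5.0, ref_sd=1.0)   # -1.0

colors = [color_for_z(test_score_z), color_for_z(lab_value_z)]
```

Once both values are expressed in standard deviation units, the display layer no longer needs to know which measure a value came from.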

Cognitive test scores commonly use such an approach.  The example below was chosen to highlight again that text may be considered a display attribute.  This time the text is dynamically produced based on a conversion table.  Notice that once the data is placed along a common metric, standard rules can be used to generate descriptive text applicable across several cognitive tests.
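A conversion table of this kind might look like the following Python sketch.  It assumes scores have already been placed on a standard score metric (mean 100, standard deviation 15); the band boundaries and the descriptive labels are hypothetical and do not reproduce the table used for the image.

```python
from bisect import bisect_right

# Hypothetical conversion table: each bound is the lowest score of the next band.
BAND_LOWER_BOUNDS = [70, 85, 115, 130]
BAND_LABELS = [
    "Well below average",  # below 70
    "Below average",       # 70-84
    "Average",             # 85-114
    "Above average",       # 115-129
    "Well above average",  # 130 and above
]


def z_to_standard_score(z: float) -> float:
    """Place a z score on the mean-100, SD-15 metric."""
    return 100 + 15 * z


def describe(standard_score: float) -> str:
    """Look up the descriptive text for a standard score."""
    return BAND_LABELS[bisect_right(BAND_LOWER_BOUNDS, standard_score)]
```

The same `describe` function can label any test whose scores have been converted to the common metric, which is what allows one set of text rules to serve several tests.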

Size and Spatial Location – This topic has been briefly touched upon in other articles (see here) and a full discussion will be coming soon.  For now, note the cognitive literature shows the processing of size and spatial location generally requires minimal attentional resources, and is even automatic in some cases.  Thus, these two attributes have tremendous potential to convey large amounts of information and ease comparisons across heterogeneous data.

Missing and Invalid Data – This method is only briefly mentioned here since it is not specifically designed to address heterogeneous data.  Still, the use of visual attributes is an effective means for identifying missing and/or invalid data.  Certainly, the bulk of missing/invalid data is handled through form data entry validation routines.  However, some of the more sophisticated clinical rules and/or time separated, multi-variable relationships often expressed in disease state images offer a great opportunity to incorporate an attribute to indicate missing/invalid data.
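As a simple illustration, a value can be checked for missing or out-of-range status before any other display rule is applied, with a dedicated attribute reserved for each status.  The status names, the attribute descriptions and the valid range used in this Python sketch are hypothetical.

```python
import math
from typing import Optional

# Hypothetical display attributes reserved for data-quality status.
STATUS_ATTRIBUTE = {
    "missing": "dashed outline",
    "invalid": "question-mark icon",
    "ok": None,  # no extra attribute; the normal display rules apply
}


def data_status(value: Optional[float], valid_min: float, valid_max: float) -> str:
    """Classify a value as missing, invalid (outside the valid range) or ok."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return "missing"
    if not (valid_min <= value <= valid_max):
        return "invalid"
    return "ok"


def quality_attribute(value: Optional[float], valid_min: float,
                      valid_max: float) -> Optional[str]:
    """Return the display attribute that flags a data-quality problem, if any."""
    return STATUS_ATTRIBUTE[data_status(value, valid_min, valid_max)]
```

A simple range check is shown here; rules that span several variables or time points would replace the range test but could reuse the same attribute mapping.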