Beyond Charts: The Essentials of Effective Data Visualization

In today’s data-driven world, the ability to communicate insights through visualizations has become an essential skill for scientists, analysts, consultants, and professionals across virtually every field. Yet, despite its importance, there are surprisingly few resources that teach us how to make compelling, informative data visualizations. Drawing from Claus O. Wilke’s excellent book Fundamentals of Data Visualization, this guide explores the key principles that separate good visualizations from the ugly, the bad, and the wrong.

Why Data Visualization Matters

Data visualization is both an art and a science. The challenge lies in getting the art right without getting the science wrong and vice versa. A data visualization must first and foremost accurately convey the data. It must not mislead or distort. If one number is twice as large as another, but in the visualization they appear roughly the same, then the visualization is fundamentally broken.
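
To make the "twice as large" test concrete, here is a minimal sketch that compares the ratio of two data values with the ratio of the bar lengths a chart would actually draw. The helper name `drawn_ratio` and the sample numbers are assumptions made up for illustration (they are not from Wilke's book), and the sketch assumes a linear axis with bars that start at the axis baseline.

```python
def drawn_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """Ratio of the drawn bar lengths for values a and b on a linear axis.

    Each bar is assumed to start at `baseline`, so its drawn length is
    (value - baseline). Hypothetical helper, for illustration only.
    """
    if b == baseline:
        raise ValueError("the bar for b would have zero length")
    return (a - baseline) / (b - baseline)


true_ratio = 200 / 100                              # the data: one value is twice the other
honest = drawn_ratio(200, 100, baseline=0)          # bars start at zero
flattened = drawn_ratio(200, 100, baseline=-1800)   # baseline far below the data
exaggerated = drawn_ratio(200, 100, baseline=90)    # baseline just below the data

print(f"data ratio:      {true_ratio:.2f}")
print(f"zero baseline:   {honest:.2f}")
print(f"baseline -1800:  {flattened:.2f}")
print(f"baseline 90:     {exaggerated:.2f}")
```

With a zero baseline the drawn ratio matches the data. Moving the baseline far below the data makes the two bars look almost equal, and moving it just below the smaller value exaggerates the difference.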

At the same time, a data visualization should be aesthetically pleasing. Good visual presentations enhance the message of the visualization. When a figure contains jarring colors, imbalanced visual elements, or other distracting features, viewers will find it harder to inspect and interpret the figure correctly.

The difference between good and bad figures can be the difference between a highly influential or an obscure paper, a grant or contract won or lost, a job interview gone well or poorly. Whether you’re a scientist preparing research papers, an analyst creating business reports, or a consultant pitching to clients, the quality of your visualizations directly impacts how effectively your message is received.

Developing “Eye” for Visualizations

Experienced editors talk about “ear”: the ability to hear internally, as you read a piece of prose, whether the writing is any good. Similarly, when it comes to figures and visualizations, we need “eye”: the ability to look at a figure and see whether it is balanced, clear, and compelling.

Just as with writing, the ability to see whether a figure works can be learned. Having eye means being aware of a collection of simple rules and principles of good visualization, and paying attention to little details that other people might overlook. This is not something you develop by reading a book over a weekend; it’s a lifelong process of observation, practice, and refinement.

The Three Categories of Problematic Figures

When evaluating visualizations, it helps to categorize problematic figures into three distinct types:

Ugly figures have aesthetic problems but are otherwise clear and informative. They might use garish colors, inconsistent fonts, or cluttered backgrounds, but the underlying data message remains intact.

Bad figures have problems related to perception. They may be unclear, confusing, overly complicated, or deceiving. A bad figure might use misleading scales or distort proportions in ways that confuse the viewer’s understanding.

Wrong figures have mathematical problems; they are objectively incorrect. A wrong figure might omit axis scales entirely, making it impossible to ascertain the actual numbers being represented, or it might display data in ways that contradict the underlying values.

Understanding these categories helps you critically evaluate your own work and develop better judgment over time.

The Core Principle: Mapping Data onto Aesthetics

At the heart of every data visualization is a fundamental concept: all data visualizations map data values into quantifiable features of the resulting graphic. These features are called aesthetics. Aesthetics describe every aspect of a given graphical element, including:

  • Position: Where the element is located (typically defined by x and y coordinates)
  • Shape: The form of the graphical element
  • Size: How large or small the element appears
  • Color: The hue, saturation, and brightness
  • Line width: The thickness of lines
  • Line type: Solid, dashed, dotted, and other patterns

All aesthetics fall into two groups: those that can represent continuous data and those that cannot. Position, size, color, and line width can represent continuous data, that is, values for which arbitrarily fine intermediates exist. Shape and line type, however, can usually only represent discrete data.

Understanding this mapping is crucial because it helps you make informed choices about which visual elements best represent your data. When you choose to encode a continuous variable using color, for example, you need a color scale that smoothly transitions from one value to another. When encoding categorical data, you need distinctly different colors that are easily distinguishable.
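
As a rough illustration of this mapping, the sketch below uses Python with matplotlib (one possible tool, not the one the book prescribes) to draw a scatterplot in which position, size, and color carry continuous variables while marker shape carries a discrete one. The records and the column meanings (`magnitude`, `score`, `group`) are invented for the example.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script runs without a display
import matplotlib.pyplot as plt

# Made-up records for illustration: (x, y, magnitude, score, group)
records = [
    (1.0, 2.0, 10, 0.1, "a"),
    (2.0, 3.5, 40, 0.4, "b"),
    (3.0, 3.0, 90, 0.7, "a"),
    (4.0, 5.0, 160, 0.9, "b"),
]
markers = {"a": "o", "b": "s"}  # shape: discrete, one marker per group

fig, ax = plt.subplots(figsize=(4, 3))
for group, marker in markers.items():
    rows = [r for r in records if r[4] == group]
    points = ax.scatter(
        [r[0] for r in rows],        # position (x): continuous
        [r[1] for r in rows],        # position (y): continuous
        s=[r[2] for r in rows],      # size: continuous (marker area in points^2)
        c=[r[3] for r in rows],      # color: continuous, via a sequential colormap
        cmap="viridis", vmin=0, vmax=1,  # same color scale for both groups
        marker=marker, label=group,
    )

fig.colorbar(points, ax=ax, label="score")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend(title="group")
```

Each argument to `scatter` corresponds to one aesthetic, which makes it easy to see at a glance which variable is encoded by which visual feature.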

Automation Is Your Friend

One of the most important practical lessons for anyone creating visualizations regularly is this: automation is your friend. Figures should be autogenerated as part of the data analysis pipeline, and they should come out of the pipeline ready to be sent to the printer, with no manual post-processing needed.

There are several compelling reasons for this approach. First, the moment you manually edit a figure, your final figure becomes irreproducible. A third party cannot generate the exact same figure you did. While this may not matter much if all you did was change the font of axis labels, it becomes problematic when you make more substantial manual changes.

Second, if you add a lot of manual post-processing to your figure preparation pipeline, you become reluctant to make changes or redo your work. You may ignore reasonable requests for change from collaborators, or you may be tempted to reuse an old figure even though you’ve regenerated all the data.

Third, you may yourself forget what exactly you did to prepare a given figure, or you may not be able to generate a future figure on new data that exactly matches your earlier figure visually. These are not hypothetical concerns; they play out regularly in real publications and reports.

For all these reasons, interactive plotting programs are generally a poor choice for serious work. They inherently force you to manually prepare your figures. Even spreadsheet software like Excel falls into this category and is not recommended for rigorous figure preparation or data analysis.
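
One way to put this into practice is to make every figure the output of a function that takes data and writes a file, so rerunning the analysis regenerates the figure with no manual steps. The following is a minimal sketch using matplotlib; the function name `make_bar_figure`, the output path, and the sample numbers are placeholders chosen for this example.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # no display needed; suitable for scripts and pipelines
import matplotlib.pyplot as plt


def make_bar_figure(data: dict[str, float], out_path: Path) -> Path:
    """Render a bar chart from `data` and write it to `out_path`."""
    fig, ax = plt.subplots(figsize=(4, 3))
    ax.bar(list(data.keys()), list(data.values()), color="#0072B2")
    ax.set_ylabel("amount")
    out_path.parent.mkdir(parents=True, exist_ok=True)
    fig.savefig(out_path, dpi=300, bbox_inches="tight")
    plt.close(fig)  # free memory when many figures are generated in a loop
    return out_path


if __name__ == "__main__":
    # Placeholder numbers; in a real pipeline these come from the analysis step.
    results = {"A": 3.0, "B": 5.5, "C": 2.0}
    print(make_bar_figure(results, Path("figures/amounts.pdf")))
```

Because the figure is fully determined by the script and its input, a collaborator can reproduce it exactly, and a request for changes costs one edit and one rerun.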

Choosing the Right Visualization for Your Message

Rather than organizing visualizations by the type of data being visualized, it’s more useful to think about them in terms of the message you want to convey. Most people think in terms of messages: how large something is, how it is composed of parts, how it relates to something else, and so on.

When visualizing amounts, bar plots are often the most effective choice because they allow viewers to easily compare values by the length of bars. When showing distributions, histograms, density plots, or box plots each serve different purposes depending on what aspect of the distribution you want to highlight. For relationships between variables, scatterplots remain the gold standard. For trends over time, line charts are typically most effective.

The key is matching your visualization type to your message, not simply defaulting to whatever chart type you’re most familiar with.
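
The four pairings mentioned above (amounts, distributions, relationships, trends) can be sketched side by side. This example uses matplotlib with synthetic data generated from a fixed random seed; all values are invented and serve only to show which plotting call fits which message.

```python
import random

import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

random.seed(0)  # fixed seed so the synthetic data is reproducible
sample = [random.gauss(50, 10) for _ in range(200)]
xs = [random.uniform(0, 10) for _ in range(50)]
ys = [2 * x + random.gauss(0, 2) for x in xs]
years = list(range(2015, 2025))
series = [10 + 1.5 * i for i in range(len(years))]

fig, axes = plt.subplots(2, 2, figsize=(8, 6))

axes[0, 0].bar(["A", "B", "C"], [4, 7, 3])
axes[0, 0].set_title("Amounts: bar plot")

axes[0, 1].hist(sample, bins=20)
axes[0, 1].set_title("Distribution: histogram")

axes[1, 0].scatter(xs, ys)
axes[1, 0].set_title("Relationship: scatterplot")

axes[1, 1].plot(years, series)
axes[1, 1].set_title("Trend over time: line chart")

fig.tight_layout()
```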

Color Choices Matter More Than You Think

Color is one of the most powerful tools in the data visualization toolkit, but it’s also one of the most commonly misused. Good color choices enhance the clarity and appeal of a visualization; poor choices can render it confusing, misleading, or outright ugly.

When selecting colors, consider these principles:

First, use color purposefully, not decoratively. Every color in your figure should serve a function: distinguishing categories, representing values, or drawing attention to important elements. Decorative colors that don’t convey information only add visual noise.

Second, be mindful of colorblind viewers. Approximately 8% of men and 0.5% of women have some form of color vision deficiency. Avoid relying solely on red-green distinctions, and test your visualizations using colorblindness simulation tools.

Third, use sequential color scales for continuous data that progresses from low to high, and diverging color scales when there’s a meaningful midpoint (such as positive and negative deviations from zero).

Fourth, keep your color palette limited and consistent. Using too many colors makes it difficult for viewers to distinguish between them and creates visual chaos.
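
These principles translate fairly directly into code. The sketch below, again assuming matplotlib, defines a small categorical palette (the Okabe-Ito colors, which were designed to remain distinguishable under common forms of color vision deficiency), a sequential scale, and a diverging scale centered on zero. The value ranges are arbitrary examples.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize, TwoSlopeNorm

# Categorical data: a small, fixed palette of clearly distinct colors.
OKABE_ITO = ["#E69F00", "#56B4E9", "#009E73", "#F0E442",
             "#0072B2", "#D55E00", "#CC79A7", "#000000"]

# Sequential scale: values run from low to high with no special midpoint.
seq_cmap = plt.get_cmap("viridis")
seq_norm = Normalize(vmin=0, vmax=100)

# Diverging scale: zero is a meaningful midpoint, so it is pinned to the
# neutral center of the colormap even though the range is asymmetric.
div_cmap = plt.get_cmap("RdBu")
div_norm = TwoSlopeNorm(vmin=-2, vcenter=0, vmax=5)

print("sequential color at 50:", seq_cmap(seq_norm(50)))
print("diverging color at 0:  ", div_cmap(div_norm(0)))
```

The colormap names `viridis` and `RdBu` are just two of the options matplotlib ships with; the point is the choice of scale type, not these specific maps.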

The Importance of Context and Consistency

Figures don’t exist in isolation; they exist within the context of a larger document, presentation, or body of work. This means your visualizations need to be consistent with each other and with the overall design of your document.

Use consistent color schemes across related figures. Maintain consistent font choices and sizes. Use consistent axis scaling when figures are meant to be compared. This consistency helps your audience build a mental model that carries across multiple visualizations, making your entire document more coherent and easier to understand.
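
In matplotlib, one way to get this consistency is to set the style once and share axes between panels that are meant to be compared. The following is a sketch under those assumptions; the font sizes, colors, and the "before"/"after" data are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
from cycler import cycler

# One shared style for every figure in the document (values are placeholders).
plt.rcParams.update({
    "font.family": "sans-serif",
    "font.size": 10,
    "axes.titlesize": 11,
    "axes.prop_cycle": cycler(color=["#0072B2", "#D55E00", "#009E73"]),
})

# Hypothetical data for two panels that readers are meant to compare.
before = [3, 5, 4, 6]
after = [4, 7, 6, 9]

# sharey=True gives both panels the same y-axis limits.
fig, (left, right) = plt.subplots(1, 2, sharey=True, figsize=(7, 3))
left.plot(before, marker="o")
left.set_title("Before")
left.set_ylabel("value")
right.plot(after, marker="o")
right.set_title("After")
```

Keeping the style settings in one place (for example, a small module imported by every plotting script) means a change to fonts or colors propagates to all figures on the next run.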

At the same time, don’t be repetitive to the point of monotony. While consistency is important, each figure should be optimized for its specific message. The goal is to be consistent in your visual language while allowing each figure to speak with its own voice.

Making Your Figures Memorable

The best visualizations are not just informative; they’re memorable. They stick in the viewer’s mind and effectively communicate the key message long after the initial viewing.

To make figures memorable, focus on clarity above all else. Remove unnecessary elements that don’t serve the data message. This includes excessive grid lines, decorative borders, 3D effects, and other chartjunk that adds visual complexity without adding information.

Highlight the most important elements. Use visual hierarchy to guide the viewer’s eye to the most significant parts of the figure. This might mean using a brighter color for the key data series, adding annotations to call out important values, or adjusting the aspect ratio to emphasize the most important pattern.
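
Both ideas, removing clutter and highlighting what matters, can be sketched in a few lines of matplotlib. In this example the data are invented, and series "D" is arbitrarily designated as the one the figure is about: it gets an accent color, a thicker line, and a direct label, while the other series recede into light grey.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

# Hypothetical yearly values for four series; "D" is the key series.
years = [2020, 2021, 2022, 2023, 2024]
series = {
    "A": [5, 6, 6, 7, 7],
    "B": [4, 4, 5, 5, 6],
    "C": [6, 6, 7, 7, 8],
    "D": [3, 5, 8, 12, 17],
}
key = "D"

fig, ax = plt.subplots(figsize=(5, 3))
for name, values in series.items():
    is_key = name == key
    ax.plot(
        years, values,
        color="#D55E00" if is_key else "0.75",  # accent color vs. light grey
        linewidth=2.5 if is_key else 1.0,
        zorder=3 if is_key else 2,               # draw the key series on top
    )

# Label the key series directly at its last point instead of using a legend.
ax.annotate(key, xy=(years[-1], series[key][-1]),
            xytext=(5, 0), textcoords="offset points",
            color="#D55E00", va="center")

# Remove elements that carry no data: the top and right frame lines.
for side in ("top", "right"):
    ax.spines[side].set_visible(False)
ax.set_xticks(years)
```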

Tell a story. The best figures don’t just present data; they present data in a way that reveals a pattern, relationship, or insight that wasn’t immediately obvious from the raw numbers alone.

Continuous Learning and Improvement

The field of data visualization is constantly evolving. New tools, techniques, and best practices emerge regularly. The visualization approaches that work best today may be superseded by better methods tomorrow.

This is why it’s important to approach data visualization with a mindset of continuous learning. Expose yourself to new approaches. Pay attention to the visual and design choices others make in their figures. Be open to changing your mind about what constitutes good visualization practice.

You might consider a given figure great today, but next month you might find a reason to criticize it. This isn’t a failure; it’s growth. Don’t take any single source, including this one, as gospel. Think critically about the reasoning behind visualization choices and decide whether to adopt them based on your specific context and needs.

Getting Started: Practical Next Steps

If you’re looking to improve your data visualization skills, here are some practical steps you can take immediately:

Start by auditing your existing figures. Look at them with fresh eyes and categorize each as good, ugly, bad, or wrong. Be honest about the problems you find, and prioritize fixing the “wrong” and “bad” figures before addressing the “ugly” ones.

Learn the capabilities of your chosen visualization tool thoroughly. Whether you use R with ggplot2, Python with matplotlib or seaborn, or any other tool, understanding your tool’s capabilities lets you focus on the visualization principles rather than fighting with the software.

Build a personal gallery of visualizations you admire. Study what makes them effective. Try to recreate them with your own data. This practice helps develop your “eye” more quickly.

Share your figures with others and ask for honest feedback. Most people are polite when critiquing, which means you may not hear about real problems. Seek out colleagues who will give you direct, constructive criticism.

Finally, practice deliberately. Don’t just make the same figures over and over. Challenge yourself to visualize the same data in different ways. Experiment with different aesthetic choices. Compare the results and learn from the differences.

Conclusion

Data visualization is a skill that improves with knowledge, practice, and critical self-reflection. The principles outlined here (accurate data representation, appropriate aesthetic mapping, automated reproducibility, purposeful color usage, consistency, and continuous learning) form the foundation of effective visualization practice.

Remember that the goal of data visualization is not to make pretty pictures. It’s to communicate information clearly, accurately, and compellingly. Every choice you make about chart type, colors, scales, labels, and layout should serve that goal.

The difference between an amateur and a professional visualization creator often comes down to attention to detail and willingness to iterate. Don’t settle for your first draft. Refine your figures until they clearly and elegantly communicate the insight you’ve discovered in your data.

Whether you’re creating your first chart or your thousandth, keeping these fundamentals in mind will help you produce visualizations that inform, persuade, and endure. Because in the end, the best data visualization is one that makes the data speak for itself and lets the story within the numbers be heard clearly by anyone who looks at it.
