Visualize Data with a Chord Diagram

Krishna Rajan, Erich Bloch Chair, Department of Materials Design and Innovation, University at Buffalo: The State University of New York

In this working paper, you’ll learn what chord diagrams are, why they’re useful, and how to interpret them. As an example, we’ll highlight a chord diagram we created to show how different materials are used in solar cell manufacturing.

A visualization tool that shows relationships

Figure 1

Researchers use data to solve problems, but just having the data isn’t always ideal. For example, data in a table is helpful, but your brain has to interpret it in order to determine which connections stand out. It’s often much easier to draw initial conclusions from visuals, and then dig deeper into the data.

A chord diagram is a visualization tool that shows the relationships among two or more items, and how strong each relationship is. The data for a chord diagram typically comes from an adjacency matrix (Figure 1) or a pandas DataFrame. In our case, we’re studying relationships between two categories of materials (perovskites and solvents) and how they’re used together in solar cell manufacturing.
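To make the idea of an adjacency matrix concrete, here is a minimal sketch that counts how often each perovskite-solvent pair appears in a list of reports. The reports are invented for illustration and are not the data behind Figure 1; the names FAPbI3, CsPbI3, and DMSO are assumptions added only to fill out the example.

```python
from collections import Counter

# Hypothetical reports: one (perovskite, solvent) pair per reported use.
# Invented for illustration; not the data behind Figure 1.
reports = [
    ("MAPbI3", "DMF"), ("MAPbI3", "DMF"), ("MAPbI3", "DMSO"),
    ("FAPbI3", "DMF"), ("FAPbI3", "DMSO"), ("CsPbI3", "DMSO"),
]

pair_counts = Counter(reports)
perovskites = sorted({p for p, _ in reports})
solvents = sorted({s for _, s in reports})

# Adjacency matrix: rows are perovskites, columns are solvents,
# and each cell counts how often that pair was reported together.
adjacency = [[pair_counts[(p, s)] for s in solvents] for p in perovskites]

# Print the matrix as a small table.
print("".ljust(8) + "".join(s.ljust(6) for s in solvents))
for p, row in zip(perovskites, adjacency):
    print(p.ljust(8) + "".join(str(n).ljust(6) for n in row))
```

If your records are already in a pandas DataFrame, `pandas.crosstab` should produce an equivalent table of counts.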

Figure 2

To build a chord diagram, first decide which items you’re studying, and then arrange those items around a circle. In our chord diagram, we’ve arranged the most frequently reported perovskites and solvents in a circle, with all of the perovskites on one side of the circle, and all of the solvents on the other (Figure 2). Since a chord diagram is a circle, the order of items around the circle doesn’t matter, but grouping like items will make it easier to see relationships between categories.

Nodes are connected by chords

Each item on the circle is called a node, and each node is assigned a unique color to help distinguish it from the others. Each node is also represented by an arc whose length is proportional to how often that item occurs in the data; a longer arc means that the item occurs more often (Figure 3). For example, MAPbI3 is the most commonly reported perovskite in our data, so it has the longest arc of all the perovskites. In our chord diagram, items are arranged in order of frequency within each category (which makes it easier to see some of the relationships), but that’s not necessary.
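One way to turn frequencies into arcs is to give each node a share of the 360-degree circle equal to its share of all counts. The sketch below does this with invented pair counts and ignores the small gaps that most chord diagrams leave between arcs; the variable names are hypothetical.

```python
# Hypothetical pair counts, invented for illustration.
counts = {
    ("MAPbI3", "DMF"): 2, ("MAPbI3", "DMSO"): 1,
    ("FAPbI3", "DMF"): 1, ("FAPbI3", "DMSO"): 1,
    ("CsPbI3", "DMSO"): 1,
}

# A node's frequency is the sum of every pair count it takes part in.
node_totals = {}
for (perovskite, solvent), n in counts.items():
    node_totals[perovskite] = node_totals.get(perovskite, 0) + n
    node_totals[solvent] = node_totals.get(solvent, 0) + n

# Each node's arc spans its share of the full circle (gaps between arcs ignored).
grand_total = sum(node_totals.values())
arc_degrees = {node: 360 * total / grand_total for node, total in node_totals.items()}

for node, degrees in sorted(arc_degrees.items(), key=lambda item: -item[1]):
    print(f"{node}: {degrees:.1f} degrees")
```

With these made-up counts, MAPbI3 gets a longer arc than the other two perovskites because it appears in more reports.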



Figure 3

Every relationship between two nodes is represented by a line drawn between those nodes. For example, MAPbI3 (a perovskite) and DMF (a solvent) are used together in manufacturing, so there are lines connecting these nodes (Figure 4). Each line can represent one or more reported uses of that pair, as long as the number of uses per line stays the same throughout the diagram. The more lines there are, the stronger the relationship is.



The individual lines combine to make ribbon-like structures called chords (hence the name of the chord diagram). Each chord, although it appears as a single unit when you zoom out, is actually a group of individual lines connecting two nodes. 




Figure 4



Each node can have multiple chords. Together, the chords show the many different relationships between all of the nodes in the diagram. For example, each of the chords attached to the MAPbI3 arc represents a relationship between MAPbI3 and a solvent (Figure 5). As you look at them, you can see that thicker chords represent stronger relationships (the perovskite and solvent are used together more often, which you can confirm with the data from the table in Figure 1), and thinner chords mean that the perovskite and solvent aren’t used together as much.
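The thickness of a chord where it meets an arc can be thought of as that pair's share of the node's total. Here is a rough sketch that lists the chords for one node, strongest first; the counts are invented and `chords_for` is a hypothetical helper, not part of any charting library.

```python
# Hypothetical pair counts, invented for illustration.
counts = {
    ("MAPbI3", "DMF"): 2, ("MAPbI3", "DMSO"): 1,
    ("FAPbI3", "DMF"): 1, ("FAPbI3", "DMSO"): 1,
    ("CsPbI3", "DMSO"): 1,
}


def chords_for(node, pair_counts):
    """Return (partner, count, share of the node's arc), strongest chord first."""
    linked = []
    for (a, b), n in pair_counts.items():
        if node == a:
            linked.append((b, n))
        elif node == b:
            linked.append((a, n))
    total = sum(n for _, n in linked)
    ranked = sorted(linked, key=lambda item: -item[1])
    return [(partner, n, n / total) for partner, n in ranked]


for partner, n, share in chords_for("MAPbI3", counts):
    print(f"MAPbI3 - {partner}: {n} uses, {share:.0%} of the MAPbI3 arc")
```

In this made-up data, the MAPbI3-DMF chord would be drawn twice as thick as the MAPbI3-DMSO chord.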



Figure 5



In a chord diagram, the color typically emanates from only one of the categories. Here, the color emanates from the category of perovskites—not the category of solvents—which makes it slightly easier to focus on the perovskite usage. For example, MAPbI3 is represented by dark blue, so the MAPbI3 arc is dark blue, and serves as a base for the MAPbI3 dark blue chords to emerge from and connect to solvents throughout the diagram.
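As a small illustration of this coloring rule, the sketch below gives each chord the color of its perovskite end. Only "dark blue for MAPbI3" comes from the text above; the other colors and material names are assumptions made up for the example.

```python
# Hypothetical palette; only "dark blue for MAPbI3" comes from the text.
perovskite_colors = {
    "MAPbI3": "darkblue",
    "FAPbI3": "orange",
    "CsPbI3": "green",
}

# Hypothetical (perovskite, solvent) chords, invented for illustration.
chords = [
    ("MAPbI3", "DMF"), ("MAPbI3", "DMSO"),
    ("FAPbI3", "DMF"), ("CsPbI3", "DMSO"),
]

# Each chord inherits the color of its perovskite end, never its solvent end.
chord_colors = {(p, s): perovskite_colors[p] for p, s in chords}

for (p, s), color in chord_colors.items():
    print(f"{p} - {s}: {color}")
```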

Putting it all together to visualize relationships

Once all of the chords are in place, you can quickly visualize all of the relationships. For example, in our final chord diagram (Figure 6), you can see which materials are used most often, as well as which materials are used most often in combination. In an interactive chord diagram, you can highlight each node (and its chords) individually, to study these relationships. We can even add toxicity information to see which materials (and combinations) are the most toxic.



Figure 6

Compare this to the original data table (Figure 1), and you’ll see that it’s the same data—but the chord diagram makes it much easier to quickly study all of the relationships, and ask questions like:

  • Which perovskite-solvent combination is used most often? 

  • Which solvents aren’t used as often, but are typically used with a specific perovskite? 

  • Which perovskites are rarely used, and perhaps could be replaced with another more common perovskite?
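Once a chord diagram has prompted questions like these, you can check the answers against the underlying counts. The sketch below is one possible way to do that; the counts, the solvent name GBL, and the two cutoff values are all assumptions chosen for illustration, not values from our data.

```python
from collections import defaultdict

# Hypothetical pair counts, invented for illustration.
counts = {
    ("MAPbI3", "DMF"): 40, ("MAPbI3", "DMSO"): 25, ("MAPbI3", "GBL"): 5,
    ("FAPbI3", "DMF"): 10, ("FAPbI3", "DMSO"): 8,
    ("CsPbI3", "DMSO"): 2,
}
grand_total = sum(counts.values())

RARE_SHARE = 0.10      # assumption: "rare" means under 10% of all reports
DOMINANT_SHARE = 0.80  # assumption: "typically" means 80% or more of a solvent's uses

# Question 1: which perovskite-solvent combination is used most often?
most_used_pair = max(counts, key=counts.get)

by_solvent = defaultdict(dict)
perovskite_totals = defaultdict(int)
for (perovskite, solvent), n in counts.items():
    by_solvent[solvent][perovskite] = n
    perovskite_totals[perovskite] += n

# Question 2: which rarely used solvents are tied to one specific perovskite?
niche_solvents = []
for solvent, partners in by_solvent.items():
    total = sum(partners.values())
    top = max(partners, key=partners.get)
    if total / grand_total < RARE_SHARE and partners[top] / total >= DOMINANT_SHARE:
        niche_solvents.append((solvent, top))

# Question 3: which perovskites are rarely used?
rare_perovskites = [
    p for p, total in perovskite_totals.items() if total / grand_total < RARE_SHARE
]

print("Most used pair:", most_used_pair)
print("Niche solvents:", niche_solvents)
print("Rare perovskites:", rare_perovskites)
```

The cutoffs are a design choice: tightening or loosening them changes which materials count as rare, so it helps to pick them with the diagram in view.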



With a chord diagram, you can easily see connections, which makes it easier to determine which questions to ask, and to start finding answers.