This week we are proud to announce the latest feature of our exploratory data analysis platform, the parallel coordinates plot, a must-use tool for visualizing multivariate numeric or ordinal data. It is a powerful graphical tool whenever you need to compare a range of items that share the same attributes, for example the specifications of different products or the habits of different customers.
Definition
In a parallel coordinates plot, or simply parallel coordinates, each quantitative variable is represented by its own vertical axis, and the axes are equally spaced. Each line that runs across the axes represents one record: it connects that record's values on every axis. The axes can have different scales, since each variable has its own unit of measurement, or they can be normalized to a common range to keep them uniform.
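To make the "one line per record" idea concrete, here is a minimal sketch of the geometry, assuming each axis is min-max normalized to [0, 1] and then stretched over the same vertical extent. The function names, the `Record` alias and the default width and height are hypothetical choices for illustration, not the platform's implementation.

```python
from typing import Dict, List, Sequence, Tuple

# One record of the dataset, keyed by variable (axis) name.
Record = Dict[str, float]


def min_max_normalize(records: Sequence[Record], axes: Sequence[str]) -> List[Record]:
    """Rescale each variable independently to [0, 1] so that all axes share one scale."""
    bounds = {axis: (min(r[axis] for r in records), max(r[axis] for r in records)) for axis in axes}
    normalized: List[Record] = []
    for r in records:
        row: Record = {}
        for axis in axes:
            lo, hi = bounds[axis]
            # A constant variable has no spread; place it in the middle of its axis.
            row[axis] = 0.5 if hi == lo else (r[axis] - lo) / (hi - lo)
        normalized.append(row)
    return normalized


def to_polyline(row: Record, axes: Sequence[str], width: float = 400.0, height: float = 200.0) -> List[Tuple[float, float]]:
    """Map one normalized record to the (x, y) vertices of its line.

    Axes are equally spaced along x; y grows downward as in SVG, so a
    normalized value of 1.0 lands at the top of its axis.
    """
    step = width / (len(axes) - 1) if len(axes) > 1 else 0.0
    return [(i * step, height * (1.0 - row[axis])) for i, axis in enumerate(axes)]
```

Each record yields one list of vertices, one per axis, which is exactly the line you see crossing the chart. Because the scaling is done per axis, a variable measured in large units does not squash the others.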
The order of the axes is crucial because it strongly affects the readability of the chart: relationships between adjacent variables are much easier to recognize than relationships between non-adjacent ones. By reordering the axes, you can more easily discover patterns or find correlations in your data.
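If you want a starting point before reordering by hand, one simple heuristic (an assumption here, not necessarily how the app orders axes) is to place strongly correlated variables next to each other. The sketch below starts from the most strongly correlated pair and greedily appends the remaining variable whose absolute Pearson correlation with the current last axis is highest. The names `pearson` and `greedy_axis_order` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / (sx * sy)


def greedy_axis_order(columns: Dict[str, Sequence[float]]) -> List[str]:
    """Order axes so that neighbouring pairs are strongly correlated (greedy, not optimal)."""
    names = list(columns)
    if len(names) <= 2:
        return names  # nothing to reorder
    strength = {
        (a, b): abs(pearson(columns[a], columns[b]))
        for a in names for b in names if a != b
    }
    # Start the chain with the most strongly correlated pair.
    first, second = max(strength, key=strength.get)
    order = [first, second]
    remaining = [n for n in names if n not in order]
    while remaining:
        # Extend the chain from its right end with the best-matching remaining axis.
        tail = order[-1]
        best = max(remaining, key=lambda n: strength[(tail, n)])
        order.append(best)
        remaining.remove(best)
    return order
```

Pearson correlation only captures linear relationships, and a greedy chain is not guaranteed to find the best overall order, so treat the result as a first draft that you refine by dragging axes.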
Usability
As in scatter plots, the values in a parallel coordinates chart are not aggregated, so each line represents a single record, for example one transaction. With many records this can cause the so-called spaghetti chart effect, where the sheer number of overlapping lines hurts legibility.
You can avoid this effect and filter out the noise by highlighting different groups or by reducing the sample size in your chart. Coloring only a specific group of lines while muting the others is called brushing.
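Both remedies boil down to simple operations on the records before drawing. The sketch below assumes records are dictionaries keyed by axis name; `downsample`, `brush`, `stroke_for` and the chosen colors and opacities are hypothetical, meant only to show the idea of a seeded random subset and of a range selection on one axis that splits lines into highlighted and muted.

```python
import random
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, float]


def downsample(records: Sequence[Record], k: int, seed: int = 0) -> List[Record]:
    """Draw a reproducible random subset of at most k records to thin out the chart."""
    if k >= len(records):
        return list(records)
    return random.Random(seed).sample(list(records), k)


def brush(records: Sequence[Record], axis: str, low: float, high: float) -> Tuple[List[int], List[int]]:
    """Split record indices into (highlighted, muted) by a value range on one axis."""
    highlighted: List[int] = []
    muted: List[int] = []
    for i, record in enumerate(records):
        if low <= record[axis] <= high:
            highlighted.append(i)
        else:
            muted.append(i)
    return highlighted, muted


def stroke_for(index: int, highlighted: Sequence[int]) -> Tuple[str, float]:
    """Hypothetical styling rule: brushed lines in color at full opacity, the rest faded gray."""
    return ("#1f77b4", 1.0) if index in highlighted else ("#bbbbbb", 0.15)
```

Muting rather than deleting the other lines keeps the context visible, which is why brushing usually lowers opacity instead of removing records outright.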
Example
Let’s jump straight into the app with a quick example. First, download a sample dataset. For this tutorial we use the famous Iris flower dataset, which compares the attributes of three Iris species.
After loading the dataset, take a quick look at your data. You will find six variables (columns) here, four of which are numeric: sepal length, sepal width, petal length, and petal width. To compare these four attributes, click Suggested Charts, choose the four numeric variables, and then click the arrow in the middle of the generated Parallel Coordinate chart.
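Outside the app, picking the candidate axes amounts to finding the columns whose values are all numbers. A minimal sketch with Python's standard `csv` module follows; `parse_csv` and `numeric_columns` are hypothetical helper names, and the CSV is assumed to have a header row.

```python
import csv
import io
from typing import Dict, List


def parse_csv(text: str) -> List[Dict[str, str]]:
    """Read CSV text with a header row into a list of dicts (all values as strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def numeric_columns(rows: List[Dict[str, str]]) -> List[str]:
    """Return the names of columns whose every value parses as a number."""
    def is_number(value: str) -> bool:
        try:
            float(value)
            return True
        except (TypeError, ValueError):  # TypeError covers missing cells (None)
            return False

    if not rows:
        return []
    return [name for name in rows[0] if all(is_number(r[name]) for r in rows)]
```

For a file on disk, `csv.DictReader(open(path, newline=""))` works the same way. Note that an identifier column would also pass this check, so the final choice of axes still needs a human eye.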
Notice that although this dataset contains only 150 records, finding relationships and patterns is difficult at first glance. Rearranging the variables, that is, the axes, makes the structure much easier to see. To change the order, drag and drop the axes in the left sidebar.
With the attributes reordered and the lines colored by the Species attribute, finding those patterns takes much less work.
All that is left is brushing: click the colored bars above the chart to switch off the unwanted noise (in this case, the data of the other two species), or hold down the left mouse button on an axis and drag to highlight only a range of lines.
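The app does all of this interactively. As a rough, self-contained illustration of what coloring by Species and switching groups off amount to, the sketch below writes a static SVG with one colored polyline per record and skips hidden groups. The six rows are a handful of illustrative records in the Iris layout, not the full 150-row dataset, and the function name, palette and layout parameters are hypothetical.

```python
from typing import Any, Dict, Iterable, Sequence
from xml.sax.saxutils import escape

# Illustrative rows in the Iris layout (a tiny subset, not the full dataset).
AXES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
ROWS = [
    {"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2, "species": "setosa"},
    {"sepal_length": 4.9, "sepal_width": 3.0, "petal_length": 1.4, "petal_width": 0.2, "species": "setosa"},
    {"sepal_length": 7.0, "sepal_width": 3.2, "petal_length": 4.7, "petal_width": 1.4, "species": "versicolor"},
    {"sepal_length": 6.4, "sepal_width": 3.2, "petal_length": 4.5, "petal_width": 1.5, "species": "versicolor"},
    {"sepal_length": 6.3, "sepal_width": 3.3, "petal_length": 6.0, "petal_width": 2.5, "species": "virginica"},
    {"sepal_length": 5.8, "sepal_width": 2.7, "petal_length": 5.1, "petal_width": 1.9, "species": "virginica"},
]
PALETTE = {"setosa": "#1b9e77", "versicolor": "#d95f02", "virginica": "#7570b3"}


def parallel_coordinates_svg(
    rows: Sequence[Dict[str, Any]],
    axes: Sequence[str],
    color_by: str,
    palette: Dict[str, str],
    hidden: Iterable[str] = (),
    width: int = 600,
    height: int = 300,
    margin: int = 30,
) -> str:
    """Render rows as SVG polylines colored by a category; hidden categories are skipped.

    Requires at least two axes.
    """
    hidden_set = set(hidden)
    # Per-axis min-max scaling over ALL rows, so hiding a group does not rescale the axes.
    bounds = {a: (min(r[a] for r in rows), max(r[a] for r in rows)) for a in axes}
    step = (width - 2 * margin) / (len(axes) - 1)
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">']
    for i, axis in enumerate(axes):
        x = margin + i * step
        parts.append(f'<line x1="{x:.1f}" y1="{margin}" x2="{x:.1f}" y2="{height - margin}" stroke="black"/>')
        parts.append(f'<text x="{x:.1f}" y="{margin - 8}" text-anchor="middle" font-size="10">{escape(axis)}</text>')
    for r in rows:
        group = r[color_by]
        if group in hidden_set:
            continue  # like switching a group off with the colored bars above the chart
        points = []
        for i, axis in enumerate(axes):
            lo, hi = bounds[axis]
            t = 0.5 if hi == lo else (r[axis] - lo) / (hi - lo)
            y = height - margin - t * (height - 2 * margin)  # larger values sit higher
            points.append(f"{margin + i * step:.1f},{y:.1f}")
        parts.append(f'<polyline points="{" ".join(points)}" fill="none" stroke="{palette.get(group, "gray")}"/>')
    parts.append("</svg>")
    return "\n".join(parts)


if __name__ == "__main__":
    # Show only setosa, mirroring the brushing step in the walkthrough.
    print(parallel_coordinates_svg(ROWS, AXES, "species", PALETTE, hidden={"versicolor", "virginica"}))
```

Computing the axis bounds over every row, including hidden ones, keeps the axes stable while groups are toggled, which matches the intuition that brushing filters what you see rather than the data the chart is built from.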
Feel free to share your results with the Share button in the top-right corner of your chart, or click Save to keep the chart for later.
Summary
The parallel coordinates plot is a valuable graphical tool in exploratory data analysis for comparing multiple numeric and ordinal variables. Because every record is drawn as a line across all the axes, the scaling and the order of the axes matter a great deal.
To improve readability, you can apply a technique called brushing: it mutes the irrelevant lines so that you can concentrate on the relationships within a specific group of data.