In data science, segmentation refers to the process of dividing a dataset into subgroups or segments based on certain criteria, such as demographic variables, behavioural patterns, or other characteristics. The purpose of segmentation is to identify groups of similar individuals or entities within a larger population, allowing for more targeted analysis or marketing strategies.
Visualisation can be a powerful tool for understanding data segments because it allows you to see patterns and relationships that may not be immediately apparent from looking at raw data. Whilst an algorithm might be comfortable working with a million individuals, it's often helpful for humans to apply heuristics or 'rules of thumb' that apply to categories or types of entities.
One dimension, two dimension, three dimension, four...
At their simplest, segments represent groupings of items that share a single characteristic or dimension (for example age). Visualisations like basic histograms handle these perfectly adequately.
Segmentations based on two dimensions are also very common. Typical visualisations used to show these might be a classic 2x2 matrix, like this example for a platform which analysed financial traders communications looking for signs of fraudulent behaviour.
Other examples, like this interactive Mosaic plot for a real-estate analytics product, can help you understand the distribution of data within each segment. This can be useful for identifying outliers or anomalies that might impact your thinking.
In other situations, rather than understanding the distribution of items across segments, it might be more helpful to understand what makes the items within a segment similar.
Whilst working on the design of Innerspace, an emotion analytics platform for XR environments, it was important to understand whether the segments (either created manually or generated by the application) were based on people's underlying personality traits, their reactions to the experience as a whole or to a specific stimulus.
For example, the user might create a segment based on “Women under 30” but not yet know that there was a strong similarity in this group’s response to a specific stimulus.
We turned to a rarely used chart called a ternary plot, which provided an effective way to focus / filter on the segments that will be most relevant, such as people who had a positive reaction to content tagged to a specific location.
Ternary plots only work with three dimensions, but they capture segments that are more nuanced than binary categorisation - instead helping you understand the blend of affinity to different factors.
When exploring segments based on more than three dimensions, it often becomes necessary to introduce interactivity into visualisations.