Much (most?) EC/GP research presents results as aggregate statistics that (hopefully) show that something is happening, but rarely illuminate why. So we get papers showing that lexicase selection is a much better idea than selecting on aggregate fitness on a variety of problems; that's valuable, but it doesn't tell us why. This is particularly frustrating because in EC research we potentially have all the data, since we write all the code. But we typically throw the vast majority of that data away, publishing only a few summary tables or plots.
As an alternative, we're exploring collecting much of the potentially relevant data, especially information on parent-child relationships, in a graph database (currently Neo4j) so that we can dig through runs to explore how we got from Point A to Point B.
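To make "how we got from Point A to Point B" concrete, here is a minimal sketch in plain Python (not Neo4j) of the kind of lineage query that recorded parent-child links make possible. It assumes a hypothetical run log in which each individual's ID maps to the IDs of its parents; the IDs, the `parents` layout, and the function names are all illustrative, not our actual schema. In a graph database this would roughly correspond to a variable-length path match over parent-child relationships.

```python
from collections import deque

# Hypothetical run log: child ID -> list of parent IDs.
# Generation-0 individuals have no parents.
parents = {
    "g0_a": [], "g0_b": [], "g0_c": [],
    "g1_a": ["g0_a"], "g1_b": ["g0_a", "g0_c"],
    "g2_a": ["g1_b"], "g2_b": ["g1_a", "g1_b"],
}

def ancestry(individual, parents):
    """Return the set of all ancestors of `individual` (breadth-first walk backward)."""
    seen = set()
    queue = deque(parents.get(individual, []))
    while queue:
        p = queue.popleft()
        if p not in seen:
            seen.add(p)
            queue.extend(parents.get(p, []))
    return seen

def lineage_path(ancestor, descendant, parents):
    """Return one chain [ancestor, ..., descendant] of parent-child links, or None."""
    # Walk backward from the descendant, remembering which child led to each parent.
    came_from = {descendant: None}
    queue = deque([descendant])
    while queue:
        node = queue.popleft()
        if node == ancestor:
            # Follow the child links forward again to rebuild the chain.
            path = []
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path
        for p in parents.get(node, []):
            if p not in came_from:
                came_from[p] = node
                queue.append(p)
    return None

print(lineage_path("g0_c", "g2_a", parents))  # ['g0_c', 'g1_b', 'g2_a']
```

The breadth-first walk returns a shortest chain of parent-child links; a real run would also attach genomes, errors, and genetic operators to each node so that each step along the path can be inspected.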
It's currently still very much "by hand", and there are resource/infrastructure issues that make it hard to search or process more than about 100 runs at a time. That said, we've managed to observe and explore some interesting and potentially important phenomena this way. In our GPTP 15 paper, for example, we observed that lexicase selection at least sometimes leads to "hyper-selection", where a single individual receives more than half (and sometimes nearly all) of the selections in a given generation. Subsequent searches made it clear that this is common across hundreds of runs and several different problems. These numbers are reported in @thelmuth's dissertation, which also includes some nifty follow-up experiments suggesting that hyper-selection isn't all that important to the success of lexicase, i.e., what matters is which individuals are selected, not that some are selected a ton.
And we can draw pretty pictures from the graphs. That said, visualizing all this data is a major ongoing issue. A particular challenge is combining data from multiple runs, where there might be similar-ish (sequences of) events that happen at different points in different runs, which makes it hard to create visualizations that capture that commonality.