We're slipping down the semantic slope and starting to talk about

Recommendations

Numbers are great, but not the whole story.

If you've been reading up on the ratings distribution we have here on Brick Insights you'll know a few things: it's useful, but problematic. It only includes reviews with scores, for one. Secondly, there's a pretty hefty skew towards good. It's nice that we like our LEGO, but there are better and worse sets out there. Do all reviews recommend all sets?

Of course not. But if we're not using any quantifiable form of data (numbers) we're venturing into the muddy area of interpretation. You linguists and philosophers out there will think of this as semantics. If you're unfamiliar, semantics is the study of what language actually means - and it's super hard.

Luckily, we can still do something useful by painting with broad strokes. Here's what Brick Insights aims to figure out: is a review raving about this set? Is it indifferent? Or does it actually want you to avoid it? We assign one of these labels to each review in one of the following ways:

Based on what the reviewer wrote:

Recommended: reviewer explicitly recommends this set
Indifference: reviewer neither explicitly recommends the set nor says to avoid it
Avoid: reviewer explicitly dislikes this set

Based on the reviewer's score:

Recommended: reviewer scored this set significantly higher than usual
Indifference: reviewer scored this set around their average
Avoid: reviewer scored this set lower than usual
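
For the technically curious, the score-based rules above could be sketched roughly like this. It's only an illustration, not the actual Brick Insights implementation: the function name `label_from_score`, the use of standard deviations, and the `threshold` of 0.5 are all assumptions made up for this example.

```python
from statistics import mean, pstdev


def label_from_score(score, reviewer_scores, threshold=0.5):
    """Label one review by comparing its score to the reviewer's own history.

    Hypothetical sketch: `threshold` (measured in standard deviations) is an
    assumed cut-off, not a value taken from Brick Insights.
    """
    avg = mean(reviewer_scores)
    spread = pstdev(reviewer_scores)

    if spread == 0:
        # The reviewer always gives the same score, so any difference stands out.
        if score == avg:
            return "Indifference"
        return "Recommended" if score > avg else "Avoid"

    # How far this score sits from the reviewer's average, in standard deviations.
    deviation = (score - avg) / spread
    if deviation > threshold:
        return "Recommended"
    if deviation < -threshold:
        return "Avoid"
    return "Indifference"


# Made-up score history for a single reviewer (0-100 scale assumed).
history = [80, 85, 90, 85, 80]
print(label_from_score(95, history))  # well above this reviewer's average
print(label_from_score(84, history))  # right at their average
print(label_from_score(70, history))  # well below their average
```

Comparing against each reviewer's own average, rather than a fixed score, is one way to deal with the skew towards good scores: an 85 means something different from a reviewer who usually gives 95 than from one who usually gives 70.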

Just enough to be useful. So, what does Brick Insights' allocation between thumbs up recommended, thumbs right indifference and thumbs down avoid look like?

Worry not - you can always see the detailed explanation of why a recommendation has been set by hovering over the hand on a review.

[Chart: share of reviews per recommendation. Indifference: 49%. Segment counts: 8148, 17233, 9480. Total: 34861 / 43589]

Not all reviews are marked as recommended or not. We currently have 8728 reviews whose recommendation is unknown, and they are excluded from these calculations. In general, all reviews added from 2019 onward are tagged properly.
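
If you want to check the arithmetic yourself, here's a small sketch using the figures quoted above. The variable names are just for illustration, and since the text doesn't say which of the two smaller counts is recommended and which is avoid, the counts are left unlabelled here.

```python
# Figures quoted in the text above.
total_reviews = 43589
unknown = 8728
segment_counts = [8148, 17233, 9480]

# Reviews that have a known recommendation.
labelled = total_reviews - unknown
assert labelled == sum(segment_counts)

# Each segment's share of the labelled reviews, in whole percent.
shares = [round(100 * count / labelled) for count in segment_counts]

# How much of the full review collection has a known recommendation.
coverage = round(100 * labelled / total_reviews, 1)

print(labelled)   # 34861
print(shares)     # [23, 49, 27]
print(coverage)   # 80.0
```

The 49% share lines up with the Indifference figure, and the rounded shares add up to 99 rather than 100 simply because of rounding.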

Isn't this pretty cool?

It certainly helps me filter between all of the sets even better. And the best part? While I'm pretty happy with the implementation I've got right now, I'll tweak it as I learn even more about our community, so this feature will get better and better over time. Maybe some day I'll be able to build an AI, but for now I have to rely on a basic neural network implementation.

If you're technically and/or linguistically inclined you might already be saying "But Linus! There are tools for textual analysis that do this programmatically!" And you're right! There are! I wrote my bachelor's thesis on textual analysis, and it's an incredibly exciting area. I'm playing around with a few tools to see if I can improve on our data, but so far I haven't managed to find a reliable enough implementation. Stay tuned for that!

For now, my manual gatekeeping will have to do. Keep calm and brick on!