Regarding benchmarks and numbers

I have been meaning to write this down for quite some time now, but following the latest relevant event, I thought now would be the best time.

Veteran tech site AnandTech published its review of the OnePlus 3 a couple of days back. The reviewer minced no words in lambasting the phone’s display, calling it a “huge disappointment” after it posted some of the worst scores in their tests. As a result, a lot of people now think that the phone has a terrible display.

Before I continue, I want to clarify that I have a huge amount of respect for AnandTech, as it is one of the few sites out there that really gets to the core of the matter, with exhaustive testing from reviewers who know more about these things than most of us do. Having said that, I feel there is a fundamental flaw in the way they test things, one that often fails to present the whole truth to readers.

Anyone who reads the site would know that AnandTech is a data-driven site. Everything is presented in numbers, and although there is commentary from the reviewers, it is based almost entirely on those numbers. But the numbers only tell part of the story.

Take the display tests, for example. AnandTech has a very exhaustive set of display tests that includes a saturation sweep test, a Gretag-Macbeth color accuracy test, a white point accuracy test, a grayscale accuracy test, and a peak brightness test. Ask your average YouTuber and they won’t even be able to explain what most of those mean, let alone perform them and interpret the results. The OnePlus 3 performed abysmally in all of them, with nothing to redeem itself. The reviewer’s observations reflected that, and you can feel the frustration in his writing.

Here’s the thing, though. I have the OnePlus 3 sitting in front of me, and to me it doesn’t look that bad. I am very anal about color calibration on displays. I capture and edit a lot of photos on the phone, and for me, 100% sRGB coverage (or close to it) is the only thing that matters. None of the other color spaces are relevant at this point, regardless of how much more color information they can hold. This is especially true on Android, where the OS itself is limited to sRGB, with cameras that capture in sRGB and software that processes in sRGB. Anything else distorts the image and throws color accuracy out the window.

The OnePlus 3 display is very obviously oversaturated, but it’s not as bad as displays used to be, or as bad as AnandTech’s review would have you believe. I personally hate the oversaturated displays that everybody else seems to love, and even I don’t have a problem with it. There is a slight punch to the images, but important things like skin tones still look more or less natural. Would I have preferred that the OnePlus 3 display target the sRGB color profile? Absolutely. Is the OnePlus 3 display terrible? No.

I like numbers. Numbers are absolute; they are unbiased. If everything is working as it should, the numbers will always tell the truth. But it’s only half the truth. For a human reviewer, the other half should come in the form of subjective commentary. Anyone can run benchmarks once you show them how it’s done, but it’s only after years of using and reviewing devices that you gain the knowledge to present solid subjective analysis alongside the numbers. It’s easy at times to get swayed by the numbers, even when your own senses are telling you otherwise. You tend to calibrate your internal compass against them, ignoring what your own eyes and ears are telling you. But that’s not how people use devices in the real world.

Another example of AnandTech’s obsession with numbers is its battery life testing. They have devised a test consisting of a series of activities that are looped continuously on the device at a fixed brightness level until the battery goes from 100% to 0%. The resulting figure (in hours) is then compared against other devices that went through the same test. AnandTech has used this method of battery testing in all of its reviews. But it’s not useful. All it tells me is that Device X performs better or worse than Device Y. I still don’t know how long Device X lasts on its own, so the figure they post is useless. As much as they’d like you to believe otherwise, they can’t replicate real-world use through a test loop. The test doesn’t take into account your network signal strength changing as you move about during the day. It doesn’t account for your display brightness changing as you move around. Nor does it account for you running an app like Snapchat, which uses the camera, data, and GPS and drains your battery. So what is the use of that information? Sure, you now know Device X is better than Device Y, but if you have no idea how good Device Y is, what good is the comparison?

In my reviews, I prefer to use screen-on time as a metric for how long the phone lasts on a full charge. It’s not very scientific, but it represents an average use case made up of things people actually do on their phones.

This is also one of the reasons I don’t use performance benchmark scores in my reviews. I diligently run and log all of them every time I get a new device, but that’s for my own curiosity. An average user has no need for that information and, more importantly, has no way to interpret it. All those scores tell you is how one device stacks up against another; in a vacuum, they are absolutely useless at painting a picture of the device’s performance.

I don’t want to single out AnandTech here, as there are plenty of other sites that do this (yes, even GSMArena). But it frustrates me to see the overreliance on numbers to quantify how good or bad something is. Phones are complex things and need more than numbers to describe them properly. It’s not a graphics card where you describe the entire thing with charts and figures and that’s all the information you will ever need. Reviewers need to move beyond benchmarks and provide their own insight. Otherwise you are making it far too easy for the machines to take your job.