SSD Review Format Discussion

David_Schroth

Hi Everyone - I want to get some feedback as to what tests and things you want to see most in SSD reviews. I plan on putting together an X570 based test bench in the near future that will be able to zoom up to PCIe 4.0 speeds and start reviewing these things. I've even got a Microcenter or two nearby so I can grab some of those Inland ones to look at. A few of the things that I've already thought of -

  • The usual sequential, random, etc read/write tests
  • PCMark10 (Applications Test?)
  • Game load times (need suggestions on specific games that are 1. repeatable 2. measurable - see the timing sketch just below this list for one possible way to capture them)
  • Determining if drive performance is impacted by heat (for units that do not have a heatsink installed). Add a heatsink to see if there's a performance difference, thermal IR pictures may be doable as well.
  • ...
  • Profit
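
If it helps with the "measurable" part of the load-time bullet, one semi-automatic approach is to watch the game process's disk-read counter and call the load finished once reads go quiet for a couple of seconds. A minimal sketch with psutil, where the process name, poll interval, and quiet threshold are placeholder assumptions rather than a tested methodology:

```python
# Rough sketch: approximate a game's load time by watching the process's disk
# reads with psutil. Process name, poll rate, and quiet threshold are all
# placeholder assumptions.
import time
import psutil

def measure_load_time(proc_name="KingdomCome.exe", quiet_secs=2.0, poll=0.1):
    """Seconds from the first observed disk read until reads go quiet."""
    proc = next(p for p in psutil.process_iter(["name"])
                if p.info["name"] == proc_name)
    start = None
    quiet_since = None
    last_bytes = proc.io_counters().read_bytes
    while True:
        time.sleep(poll)
        cur = proc.io_counters().read_bytes
        if cur > last_bytes:                 # still pulling data off the drive
            if start is None:
                start = time.perf_counter()  # loading has begun
            quiet_since = None
        elif start is not None:
            if quiet_since is None:
                quiet_since = time.perf_counter()
            elif time.perf_counter() - quiet_since >= quiet_secs:
                return quiet_since - start   # loading appears to be done
        last_bytes = cur

if __name__ == "__main__":
    print(f"approx load time: {measure_load_time():.1f} s")
```

It's a heuristic (background asset streaming will confuse it), but it gives a repeatable number you can average over several runs.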
 
I like the idea of game load times being benchmarked. I'd say try out Kingdom Come, it's hands down the best example I've come across of a game that really needs an SSD. Load times, both when starting the game and after things like fast travel or sleeping, change heavily, and the game also benefits from much faster asset streaming (on an HDD, it was typical to end up in the game before everything had fully loaded in, or to be able to move through the world faster than assets could load, causing visual abnormalities and stuttering).
 
I like the idea of game load times, I just think it may get too skewed by other factors to be a really useful benchmark across various SSD reviews.

If you use the same motherboard/CPU/drivers/game patch levels/no networking for each one, maybe... but in general you're going to see everything you need to see in synthetics. It may not directly correlate to the real world, but it gives some relative indication between models.

Thermals are a great idea, especially since this has become important with NVMe drives. It doesn't apply to SSDs, but on spinners noise was always a big deal that often got glossed over, along with spin-up time and time-to-sleep on power settings. For SSDs, something like expected lifetime in terms of terabytes/petabytes written (based on the amount of overprovisioning, NAND technology, and manufacturer recommendations) would be good, especially compared to what the drive is warrantied for.
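
If it helps to put rough numbers on that endurance point, the usual back-of-the-envelope estimate is capacity times rated P/E cycles divided by write amplification. A minimal sketch, where the cycle counts, write amplification factor, and overprovisioning figure are illustrative assumptions rather than any particular drive's spec:

```python
# Back-of-the-envelope endurance estimate:
#   TBW ~ usable NAND capacity * rated P/E cycles / write amplification
# The cycle counts, WAF, and overprovisioning below are illustrative
# assumptions, not any vendor's published spec.
def estimated_tbw(capacity_gb, pe_cycles, write_amplification=2.0, overprovision=0.07):
    usable_nand_gb = capacity_gb * (1 + overprovision)
    total_host_writes_gb = usable_nand_gb * pe_cycles / write_amplification
    return total_host_writes_gb / 1000  # terabytes written

for name, cap_gb, cycles in [("QLC 1TB", 1000, 1000),
                             ("TLC 1TB", 1000, 3000),
                             ("MLC 1TB", 1000, 10000)]:
    print(f"{name}: ~{estimated_tbw(cap_gb, cycles):,.0f} TBW")
```

Actual warrantied TBW figures are usually more conservative than this kind of estimate, since vendors assume messier workloads with higher write amplification.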

For me personally, the biggest asset a review site can have is a searchable/indexable database of its benchmarks. So I can see how that PCIe 4.0 drive benched today, and 2 years from now have some idea how it will compare with that PCIe 5.0 drive that just released, without the guys having to redo shoot-outs between drives each and every time a new model comes out. It's not perfect, as factors always change and things always evolve, but it's something I've always gained a lot of insight from.
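
For what it's worth, the back end for something like that doesn't have to be fancy. A minimal sketch with stdlib SQLite, where the schema and the sample row are made-up placeholders:

```python
# Minimal sketch of a searchable benchmark-results store using stdlib SQLite.
# The schema and the sample row are made up for illustration only.
import sqlite3

con = sqlite3.connect("ssd_results.db")
con.execute("""CREATE TABLE IF NOT EXISTS results (
                 drive TEXT, interface TEXT, test TEXT,
                 value REAL, unit TEXT, tested_on TEXT)""")
con.execute("INSERT INTO results VALUES (?, ?, ?, ?, ?, ?)",
            ("Example Gen4 drive", "PCIe 4.0 x4", "seq_read",
             5000.0, "MB/s", "2020-06-01"))
con.commit()

# Two years later: pull every drive ever run through the same workload.
for drive, value, unit in con.execute(
        "SELECT drive, value, unit FROM results "
        "WHERE test = 'seq_read' ORDER BY value DESC"):
    print(f"{drive}: {value:.0f} {unit}")
```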
 
I would say WoW would be a good game to measure load times. Park a character somewhere remote to measure the load-in time when entering the game.
 
Boot-up and shutdown times, repeated.
 
For me personally, the biggest asset a review site can have is a searchable/indexable database of its benchmarks. So I can see how that PCIe 4.0 drive benched today, and 2 years from now have some idea how it will compare with that PCIe 5.0 drive that just released, without the guys having to redo shoot-outs between drives each and every time a new model comes out. It's not perfect, as factors always change and things always evolve, but it's something I've always gained a lot of insight from.

That's the hope with a lot of the data that we're compiling on the back end. The benchmark for doing something with that is getting enough ad dollars flowing in to exceed expenses - that's when we can invest in the development time to present it. I hope that'll be sooner rather than later, but we're nowhere near that right now.

Boot-up and shutdown times, repeated.

Windows is always optimizing things in the background, to the extent that it may not be a repeatable test over the course of time. All tests in these reviews are typically run multiple times (at least 3, if not more)... hmm.
 
I wouldn't mind seeing a database benchmark, ideally both TPC-C-like and TPC-H-like, although if I had to choose I'd pick the OLTP one (TPC-C-like).
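
For anyone unfamiliar with the distinction being drawn there: TPC-C-like means lots of small random transactional writes, while TPC-H-like means big sequential scans. A toy sketch of the two access patterns with stdlib SQLite (nothing like the real TPC suites, just to show why they stress a drive differently):

```python
# Toy illustration of the two access patterns, nothing like the real TPC suites:
# OLTP (TPC-C-like) = many small random transactions, each committed separately;
# OLAP (TPC-H-like) = large sequential scans. Row counts and loop sizes are arbitrary.
import random
import sqlite3
import time

con = sqlite3.connect("toy_bench.db")
con.execute("DROP TABLE IF EXISTS orders")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, qty INTEGER)")
con.executemany("INSERT INTO orders (qty) VALUES (?)",
                [(random.randint(1, 100),) for _ in range(100_000)])
con.commit()

t0 = time.perf_counter()
for _ in range(1_000):                                  # OLTP-style
    rowid = random.randint(1, 100_000)
    con.execute("UPDATE orders SET qty = qty + 1 WHERE id = ?", (rowid,))
    con.commit()                                        # sync to disk per transaction
print(f"OLTP-like (1,000 tiny commits): {time.perf_counter() - t0:.2f} s")

t0 = time.perf_counter()
total = con.execute("SELECT SUM(qty) FROM orders").fetchone()[0]   # OLAP-style scan
print(f"OLAP-like (full-table scan, sum={total}): {time.perf_counter() - t0:.2f} s")
```

The OLTP-style loop, with its constant small synced writes, is typically where DRAM-less or cache-starved drives show their weakness.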
 
Sadly, I've noticed not all games are equal on load times or at taking advantage of faster drives.

Almost every game benefits from an SSD, but the ones that benefit from NVMe speeds, let alone high-end NVMe speeds, are fewer than I would like.

For instance, Anthem doesn't care about NVMe speed on loading. It has its own online checks and everything else it does, so the real-world benefit I've noticed from running it on NVMe is, I think, minimal.

Whereas in games like World of Warcraft it can make a distinct difference when loading a new section of the world, or in raids and such when a lot is going on.

So I would actually do a couple of tests.

1. I would test to see which games actually utilize the speed available on today's SATA SSD, NVMe, and PCIe 4.0 NVMe drives.

2. Then once you have your games listed, start your game load time and performance impact testing.

Some thoughts on games. Games like the older Mass Effect series had distinctive loading-time "buffer" zones that don't drop you out of the game but can be repeated: the elevator rides. Having those load times reduced is a distinct improvement to the game experience when you're running around hitting elevator after elevator.

Pop-in: games like GTA V have some pop-in issues that could be tied to your storage performance.

On another front, it would be interesting to see whether where Windows places your swap file has a performance impact, compared to older setups where it was more beneficial to have the swap file on a different drive than your games and such.

Photoshop or other content-creation scratch drives: does NVMe make a big difference? Some things like that could be interesting to test if you already have the software.
 
There are a couple of things that absolutely need to be tested on SSDs:
  • SLC/MLC caching. Static vs. dynamic, size of the cache, etc. While the manufacturer may offer this information to the reviewer, it's nice to see it actually tested, specifically with regard to my next point. AnandTech and Tom's Hardware do this; other sites rely on a file transfer (which is actually more "real world"), but I often see people baffled by a "slow" drive.
  • Drive fill. NAND-based SSDs will slow down when they're fuller, but there are other aspects that impact performance. Drives should be tested at various fill rates, which should be in combination with my first point. Different caching strategies mean different performance states; the 660p/665p is a good example. AnandTech tests this but probably too harshly for certain drives, see my next point. (A rough fill-and-write sketch follows after this list.)
  • Different drives should be tested with different workloads. I see YouTube reviewers showing the SX8200 Pro matching or beating the 970 Pro, for example. In any test where the MLC makes a difference, the SX8200 Pro would get crushed. Likewise, the SX8200 Pro is likely faster than more expensive drives like the 970 EVO/EVO Plus and SN750 for everyday usage.
  • I think a "real world" metric - what I call "workflow" - should be tested. The problem is doing so objectively. Most commonly PCMark and SYSmark are used for this but I don't think those are sufficient. When I use my SN750 I can tell its design provides more consistent performance which for me is more comfortable. A better example would be DRAM vs. DRAM-less SATA SSD: anybody who builds systems for a living, installing updates/drivers, can feel the difference.
  • Synthetic benchmarks like AS SSD and CrystalDiskMark have their uses - for example showing the impact of software (e.g. Acronis) on the drive or finding small performance differences on chipset vs. CPU M.2 sockets/lanes. Every site does these, though. They're just a baseline.
  • Gaming is an interesting question. It can be difficult to accurately measure the impact on load times, and in general NVMe doesn't help much over SATA, let alone between different NVMe drives. It might be better to measure the entire process (e.g. installing) and also to factor in a fuller drive - overfilling a games drive is a tradition. But reads won't change much with settled/idle NAND.
  • Games part 2: certain game engines seem to work storage differently. Unity engine titles love NVMe, for example. SWTOR got my drive hotter than a full-drive write (installation). Most importantly in 2020, that dreaded word: consoles. Our games will get to the point where different drives WILL make a difference. You want to get ahead of the curve on that one.
  • Efficiency, or performance per watt, is an important metric, especially for laptops. It requires specialized hardware, but I still feel laptops aren't tested sufficiently here.
  • Thermals - important for NVMe moving forward, absolutely. Care needs to be taken to check both flash and controller. TechPowerUp does a reasonable job here. Also takes specialized hardware.
  • Software and support should be a factor.
  • Lastly, resources and comparison as someone mentioned. Categorization. Users have trouble grasping the technical details let alone the hardware, the models, etc. Making the connection between user and hardware is the most important aspect of a review.
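
On the SLC-cache and drive-fill points above, here's a minimal sketch of the fill-and-write idea: keep streaming incompressible data to the drive under test and watch throughput over time; when a dynamic cache runs out, the MiB/s column falls off a cliff. The target path and sizes are placeholders, and for the fill-rate angle you'd pre-fill the drive to 25/50/75% before running it:

```python
# Rough sketch of a fill-and-write test: stream incompressible data to the drive
# under test in fixed chunks and log throughput as the drive fills, so an SLC
# cache running out shows up as a drop. The path and sizes are placeholders.
import os
import time

TARGET = r"D:\fill_test.bin"     # placeholder: a file on the drive under test
CHUNK = 64 * 1024 * 1024         # 64 MiB per write
TOTAL = 50 * 1024 ** 3           # 50 GiB total, enough to outrun many caches

buf = os.urandom(CHUNK)          # incompressible data
written = 0
with open(TARGET, "wb", buffering=0) as f:
    while written < TOTAL:
        t0 = time.perf_counter()
        f.write(buf)
        os.fsync(f.fileno())     # push it to the drive, not just the OS cache
        dt = time.perf_counter() - t0
        written += CHUNK
        print(f"{written / 1024 ** 3:5.1f} GiB  {CHUNK / dt / 1024 ** 2:7.0f} MiB/s")
os.remove(TARGET)
```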
 
I like it overall

Game installs, I think, are not terribly useful. Most that I do anymore are from digital downloads and are going to be constrained by that more than anything. Besides - installing is a one-time thing (unless you delete and reinstall frequently, which means you need more storage).

I do think a couple of game metrics can be enlightening here, but I don't think it needs to go in depth. Maybe as part of the "real life" metric, throw in a couple of standardized desktop-to-save-file loads on a few popular titles.

For me, the real life metric is everything: time to desktop on cold boot, time to load a document from the desktop, time for multiple docs loaded simultaneously, etc. The times aren't the important part; it's the delta between hardware and how consistently each drive can deliver it.
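
A minimal sketch of how that delta-and-consistency angle could be scripted, assuming the file path is just a placeholder: run the same operation a handful of times and report the median plus how much the runs varied. The interesting comparison is those two numbers across drives, not the absolute times:

```python
# Minimal sketch of the delta-and-consistency idea: run the same operation a few
# times, report the median and the spread. The document path is a placeholder,
# and for cold-cache numbers you'd reboot or flush the OS cache between runs.
import statistics
import time

def timed_runs(action, runs=5):
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        action()
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples), max(samples) - min(samples)

def load_document():
    with open(r"C:\Users\bench\Desktop\big_test.psd", "rb") as f:  # placeholder file
        f.read()

median, spread = timed_runs(load_document)
print(f"median {median * 1000:.0f} ms, spread {spread * 1000:.0f} ms")
```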

I’d be surprised if Michael at Phoronix wouldn’t have a good starting point for anything to do with benches.
 
I think responsiveness is key. I work with a ton of machines and SSDs - I can tell the difference between a DRAM and DRAM-less SATA drive when first setting up a machine, for example. With longer use I can tell between SATA and NVMe, but it's usually not drastic. SLC, even in its simulated mode on TLC drives, I can tell is faster than MLC; but my MLC drives simply perform better with anything requiring stamina. So it really comes down to SLC cache design with modern drives, combined with controller algorithms - how they handle workloads. You need to do a full install, multi-task, etc., even multiple times, to expose a drive's weaknesses. "Consistency" is part of it, but also what I term "workflow" - the ability to reliably get an expected result in terms of performance given a regular workload. This is not consistent over the entire drive, or even at the same spot on the same drive, because the controller's algorithms are predictive - for consumer drives they are relatively reliable, but they don't tell the full story.
 
I may be a bit late to the party, but you may want to consider a shootout to build the set of reference results quickly - right now you have 2 data points. A 6 drive shootout would help bulk that up a bit.

As for things to measure, I don't know how to quantify "how does it daily drive?". I might fire off a code test, let it run in the background chewing up 50-60 MB/s, and then do something else on the host. I rarely reboot anymore outside of patches, but I may hibernate with Slack, Zoom, CrashPlan, PyCharm, SQL Developer, and a browser with 50+ tabs open. At the end of the day, I might fire up something on Steam, depending on what host I was working on. I'm not sure what is reflective of that kind of disk usage, but it can suck on a cheap SSD.
 
A 6 drive shootout would help bulk that up a bit.

We are getting there. We plan to work through the drives on our test bench and have a repository of results - it'll keep growing over time. Currently Brent is working these in between GPU and other content.

Agreed, the "how does it daily drive" question is a good, but subjective, one to answer. Brent may be able to include some of those opinions as he's writing and starts to notice differences between the drives. Hopefully this would lead to some sort of correlation between the test suite and daily-drive feel....
 
Honestly I'd be surprised if, apart from spinner vs SSD, there is much of a daily-driver difference between drives. Even with SATA vs NVMe, on "typical" use, I think you would be fairly hard pressed to discern the difference without benchmark tools.

Not saying don't do it at all, it would be very interesting if there were a noticeable difference.
 
For drive tests, I would suggest a speed test with the drive being JUST a storage device, not hosting the OS or swap file, so you can generate some raw throughput numbers.

Then I would do another test, with a bunch of crap open (Steam and various apps), where the drive is the sole storage device.

I would base your seat-of-the-pants feel on the drive doing everything.

Just my suggestion, because that seems the most realistic. (I kinda dig running my entire rig off of a single large NVMe drive. Well, 1 TB...)
 
Honestly I'd be surprised if, apart from spinner vs SSD, there is much of a daily-driver difference between drives. Even with SATA vs NVMe, on "typical" use, I think you would be fairly hard pressed to discern the difference without benchmark tools.

Not saying don't do it at all, it would be very interesting if there were a noticeable difference.
I can tell the difference when doing actual work on my home PC with a Samsung 970 Pro vs. roughly the same work on my work laptop with whatever cheap SATA SSD Lenovo felt like using at the time; however, it's also a mobile i5 w/16 GB vs. an overclocked 2700X w/32 GB. While I think it's the drive that is the difference, there are too many other variables.
 