Samsung 990 PRO Owners Report Rapid Health Degradation

Tsing

There may be a problem with Samsung's new 990 PRO SSDs. According to various owners who have shared their experiences online, the drive may suffer from rapid health degradation, with one claiming that they lost 2% health in a week after just 1.8 TB of writes, while another has suggested that the SSD could lose as much as 36% of its health after less than 2 TB of writes. An editor with Neowin who received a bad 990 PRO is claiming that Samsung is refusing to acknowledge and replace the affected SSDs.

See full article...
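
For a rough sense of scale, extrapolating the reported numbers (assuming the health counter tracks wear linearly, and using Samsung's published ~600 TBW rating for the 1TB model — both assumptions, not anything confirmed in the reports) gives figures nowhere near the drive's rated endurance:

Code:
# Back-of-the-envelope extrapolation of the reported wear figures.
# Assumes a linear health counter; 600 TBW is the published 1TB 990 PRO rating.
RATED_TBW = 600

for health_lost_pct, tb_written in [(2, 1.8), (36, 2.0)]:
    implied_total_tb = tb_written / (health_lost_pct / 100)
    print(f"{health_lost_pct}% lost after {tb_written} TB written "
          f"-> ~{implied_total_tb:.0f} TB implied endurance vs {RATED_TBW} TBW rated")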
 
Perhaps shades of the 840 Evo? Maybe it is just bad firmware like that one pro model a few years back.
 
I had something similar happen to my OCZ drive and a firmware update fixed the issue.
 
Could it be that the drives are misreporting their health? Or has the premature degradation of the flash memory been measured and confirmed?

As an aside, I was relieved to discover that the health of the drive owners has not been affected, as far as I can tell.
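
For what it's worth, the wear figure in question is just whatever the drive reports about itself. Here is a minimal sketch of reading that self-reported number via smartmontools; it assumes smartctl is installed and that the target device is /dev/nvme0, and it says nothing about whether the flash is actually degrading:

Code:
# Read an NVMe drive's self-reported wear ("Percentage Used") via smartctl.
# Assumes smartmontools is installed and /dev/nvme0 is the drive to check.
import re
import subprocess

out = subprocess.run(
    ["smartctl", "-a", "/dev/nvme0"],
    capture_output=True, text=True, check=False,
).stdout

match = re.search(r"Percentage Used:\s*(\d+)\s*%", out)
if match:
    print(f"Drive reports {match.group(1)}% of its rated wear used")
else:
    print("No 'Percentage Used' field found (not an NVMe device, or no access?)")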
 
Perhaps shades of the 840 Evo? Maybe it is just bad firmware like that one pro model a few years back.

I just removed an 840 EVO that I installed back in 2014 from my mother-in-law's laptop; it has plenty of health left.

Can't remember if I upgraded the firmware on it though.
 
The EVO's problem, if I remember right, was degrading TLC NAND that needed Samsung's Magician software run fairly regularly to refresh/defrag the drive, or the drive would drastically slow down, to speeds slower than an HDD.
 
There are a million ways a firmware bug could cause something like this to happen. Load balancing wrong, etc. etc.

Hopefully they can fix it, and quickly. Samsung's Pro drives are my go-to drives; they usually have quite a bit of endurance in them.

I have two small 128GB Samsung 850 Pro's that were absolutely abused as L2ARC cache drives in my server's ZFS pool for 2.5 years, from 2014 to mid 2016, where they saw near-constant writes. Then (when I upgraded my L2ARC drives to larger ones) I moved them to a video ring buffer (constantly writing and playing back TV) and a swap drive for my KVM/LXC server, where both continued to see excessive writes until last year.

They were essentially hammered from early 2014 to late 2021, and I still use them in one of my laptops. They are probably on their last legs now, but **** did they take some abuse. Just look at this SMART output:

Code:
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   082   082   000    Old_age   Always       -       90696
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       370
177 Wear_Leveling_Count     0x0013   084   084   000    Pre-fail  Always       -       948
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   079   055   000    Old_age   Always       -       21
195 ECC_Error_Rate          0x001a   200   200   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   099   099   000    Old_age   Always       -       62
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       198
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       251363463877

According to my math, using 512 byte sectors, 251363463877 sectors written is ~117TB, which is quite a lot to write to a 128GB SSD, MLC or not.
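
The conversion is easy to double-check; note that the ~117 figure is binary TiB, and in decimal terabytes it comes out closer to 129:

Code:
# LBAs written x 512-byte sectors, expressed both ways.
lbas_written = 251_363_463_877
bytes_written = lbas_written * 512
print(f"{bytes_written / 1024**4:.1f} TiB")  # ~117.0 TiB
print(f"{bytes_written / 1000**4:.1f} TB")   # ~128.7 TB (decimal)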

I just noticed the 90,696 power on hours. That's over 10 years! My 2014 start year must have been off!
 
@Zarathustra , That's just plain incredible!

Yep, this is why for years I've been replying to concerned comments in threads about SSD write endurance with "it will probably be fine".

Techreport did a very good endurance test starting in 2013 and concluding in 2015.

The originals seem to be broken on their website now, but thanks to the Internet Archive we can still find them:


So, the 256GB Samsung 840 Pro won with 2.4PB written.

Now, these drives are old enough that they aren't necessarily relevant to current-gen SSD's (after all, everything in that test is pre-3D NAND, and 3D NAND had a huge positive impact on SSD reliability), but what it does show is that while random failures will still occur on occasion, in most cases you don't have to worry about write endurance unless you are really abusing the wrong SSD's (like using a QLC drive for cache duty).
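
To put rated endurance in perspective, here's a quick back-of-the-envelope estimate; the TBW rating and daily write rate below are illustrative assumptions, not measurements from any particular drive:

Code:
# Rough SSD lifespan estimate from a TBW (terabytes written) rating.
# Both inputs are illustrative assumptions, not measured values.
rated_tbw = 600        # e.g. a typical rating for a 1TB "Pro"-class drive
daily_writes_gb = 50   # fairly heavy desktop use
years = rated_tbw * 1000 / daily_writes_gb / 365
print(f"~{years:.0f} years to burn through the rated endurance")  # ~33 years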
 
I just noticed the 90,696 power on hours. That's over 10 years! My 2014 start year must have been off!

@Zarathustra , That's just plain incredible!


Actually, there might be some corruption on here.

I totally believe the drive writes, but the 90696 power on hours is impossible.

I verified my Amazon order history. I definitely bought both of my 128GB Samsung 850's in 2014.

If I subtract the 90696 up hours from today's date, that brings me to late 2012.

Then I remembered that I removed these drives from my server in October 2021, and since then they have only seen occasional use when I've powered on my old laptop. Subtract 90696 hours from October 2021 and we land in May 2011.

At first I thought maybe I was cheated, and someone sold me used drives and I never even noticed, but then I realized that Samsung didn't even launch the Samsung 850 Pro line until July 1st 2014 :LOL:

So, the SMART data is corrupt (or I have a time traveling SSD). That naturally puts everything in the SMART data table in question, but I still believe the drive writes; in fact, I think they might be low. These two drives saw an absolute pounding for seven years straight.
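
The date arithmetic is easy to reproduce; the January 2023 "today" below is an assumption about when this was posted, and the October 2021 date is the decommission date mentioned above:

Code:
# Reproducing the "time traveling SSD" math: subtract the reported
# power-on hours from the two reference dates.
from datetime import datetime, timedelta

power_on = timedelta(hours=90696)        # 3779 days
print(datetime(2023, 1, 25) - power_on)  # lands around September 2012
print(datetime(2021, 10, 1) - power_on)  # lands around late May 2011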
 
As a random aside, I just stumbled across my invoice from my first SSD.

1674608909080.png

SATA II 3Gbit/s :p

1674608966955.png

And that price!

We have come quite a ways, haven't we? :p


That MTBF turned out to be a lie; every OCZ drive I ever had failed to survive two years. Even so, the experience was so transformative that there was no way to go back to a hard drive.

It was not the sequential speeds or writes; those were a little faster than hard drives, but not THAT much faster.


It was the random performance / seek times that did it.


Some comparisons I did back then:



hdcompare02.jpg


hdcompare.jpg
 
I totally believe the drive writes, but the 90696 power on hours is impossible.
Sometimes they do weird things if you roll the odometer over.
 
Given my luck with SSD's, I'd bet on a Samsung 990 Pro dying on me inside of 90 days.
 
Good thing is you would get another one for free. Bad thing is you would have to start over with your data. OneDrive, perhaps?
 
Between GPU stuff and this, PC building is getting riskier by the day
I used to work for a freelance game/software testing company (mid 2000s to early 2010s). We tested software on PCs, game consoles, cell phones, whatever. I was part of a team that tested nVidia GPUs and drivers. I was often put in charge of building and setting up the test systems. Those systems would often cause us physical damage. You never knew how or when they were gonna draw blood from your flesh, but it was guaranteed that they would. In fact, we used to say that the systems would not even work properly until they had drawn blood for the day. I guess they required a daily blood sacrifice. An essential tool for working in our hardware lab: band-aids.
 
Good thing is you would get another one for free. Bad thing is you would have to start over with your data. OneDrive, perhaps?
The issue is that when doing motherboard reviews, the constant benchmarking of SSD's is extremely hard on them, and I've never had one last more than a year, give or take. Corsair drives are especially bad, but I've killed a couple Samsungs this way too. SSD's in my personal machines aren't as problematic, but I've killed a few of those too. But I use a computer far more than a person should.
 

Interesting. Which Samsung drives did you lose?

As I've mentioned in this thread previously I've beaten the **** out of some of them in my server and they've just kept going for a decade.
 
Good to see that I'm not the only one to take a chance on an OCZ drive. I've got a 1 or 2 TB SATA III drive sitting around somewhere that's done well. My drives usually have a pretty easy life, though, since they're mostly used for gaming, hi-res audio, and some other media.

The only drives I've truly had bad luck with have been WD (2 different external USB3 drives) and Kingston NVMe Gen 3 (a pair that were in RAID 0 in my old MSI GT80 Titan), and I still won't use them now since both died in under 6 months of use.

Other than that, in terms of SATA III I've used Samsung, Intel, SanDisk, OCZ, and a few others I can't remember. Personally, no problems with any of them. At work, where I deployed about 100 SanDisk (DRAM-less) drives, I had 1 or 2 fail, but they were in a building that had horrific wiring/power, and even with surge protectors on the workstations I saw other equipment failures (we couldn't afford a UPS on everything). So glad that office is out of that building; I have not seen another failure since.

For NVMe I've used Samsung, Intel, and Inland Gen3 drives, and then Sabrent and Crucial Gen 4 drives. So far, so good on all. Our current workstations are Dell (around 30 of them) with Kioxia 512 GB NVMe Gen 3 OS drives, and they have been champs so far (3 years now and on 24/7).


In fact, we used to say that the systems would not even work properly until they had drawn blood for the day.
Did anyone try leaving them in direct sunlight for an extended period of time? Just curious ;) I totally get it though. I'm usually fantastic around blades and such, but man, I can't believe how easy it is to get cut on components and/or cases.
 


Wow, they still make OCZ drives?

I remember Toshiba buying out their assets when OCZ filed for bankruptcy in ~2013. For a little while after that there were new OCZ-branded drives from Toshiba, but I haven't heard anything about them in years. I assumed Toshiba had retired the OCZ brand.

I've owned more SSD's than most people. My take is as follows:

OCZ: (n of ~12 over the years) Every original OCZ drive I owned performed very well for its time, but died within 2 years. I had a bunch of them too, so this is not some anecdotal N of 1 type of thing. Once Toshiba took over the brand, they appeared to become more reliable. When the last OCZ drive I bought (a Vertex 4, if memory serves) died after about a year and a half and I RMA'd it, I got a Toshiba-made OCZ Vector as a replacement, and it is still with me to this day.

Samsung: (n of ~16-20 over the years) Every Samsung drive I've owned has been absolutely bulletproof, and they have performed well too. They tend to be a little behind in the race to the fastest sequential numbers, but ahead in the 4K random / IOPS race, which is what really matters for good everyday performance.

Intel: (n of ~8 over the years) Their SSD's have been very reliable for me, and performance has been good too. Their Optane drives beat anything out there from both performance and reliability perspectives; sadly, they are no longer being made. Intel's regular SSD business unit was sold to SK Hynix last year. Time will tell if the Hynix drives continue this trend.

Sabrent: (n of 1 or 2, depending on how you count it) I've only owned one of these, a 2TB Sabrent Rocket 4.0 (well, two if you count the RMA replacement). I wanted to buy a Samsung or Intel drive, but neither had any Gen4 drives on the market yet in 2019, so I decided to go outside my comfort zone and got my peepee slapped for the trouble. The drive is probably just a rebranded reference Phison E16 design, which explains why it was one of the first Gen 4 drives on the market. The one I did own pulled an OCZ and just died randomly after about a year and a half. No warning, nothing; it just wasn't recognized one day. No opportunity to pull data off of it, just like my old OCZ experience. To add insult to injury, Sabrent fought me every step of the way on the warranty, trying to decline it because I didn't register my drive within an arbitrary number of days of purchase. I acted like a total Karen and eventually got it replaced, but I don't think I'll buy anything Sabrent again, even if this was just an N=1.

Inland: (n of 13) These are Micro Center's house brand. From my reading, the Inland Premium drives are great. Just to be confusing, they have other drives that also start with P (Inland Professional, Inland Platinum, etc.); these are reportedly not as good. I have 12x Inland Premium M.2 drives in my server, and they are holding up very well and perform well too. They are probably just rebranded reference Phison E12 Gen 3 drives.

Other: I've also owned two other brands, one of each: a small 64GB SanDisk drive (used for a Linux/Kodi HTPC box) and a small 32GB Bi-Win m.2 SATA drive, which I use in my pfSense router build. I've considered replacing that last one with an NVMe drive because it takes like 3 minutes to boot and come online, and I suspect it might be an SSD issue. Either that or pfSense just boots slowly.

So that's my entire SSD history right there.

Samsung and Intel have been fantastic (but I have also been smart about it and matched "Pro" and "Optane" drives to high-write uses, and EVO and other lesser drives to low-write uses).

Inland Premium looks promising, but it has only been about 1.5 years thus far. Time will tell.

The rest have been anywhere from barely OK to kind of meh, and even outright horrid.

Edit: ****, I've bought a lot of SSD's over the last decade to 12 years. :oops:
 