Date:   Mon, 8 May 2017 11:28:47 +0200
From:   Pavel Machek <pavel@....cz>
To:     David Woodhouse <dwmw2@...radead.org>
Cc:     Tejun Heo <tj@...nel.org>,
        Henrique de Moraes Holschuh <hmh@....eng.br>,
        linux-kernel@...r.kernel.org, linux-scsi@...r.kernel.org,
        linux-ide@...r.kernel.org, Hans de Goede <hdegoede@...hat.com>,
        boris.brezillon@...e-electrons.com, linux-mtd@...ts.infradead.org
Subject: Re: Race to power off harming SATA SSDs

On Mon 2017-05-08 08:21:34, David Woodhouse wrote:
> On Sun, 2017-05-07 at 22:40 +0200, Pavel Machek wrote:
> > > > NOTE: unclean SSD power-offs are dangerous and may brick the device in
> > > > the worst case, or otherwise harm it (reduce longevity, damage flash
> > > > blocks).  It is also not impossible to get data corruption.
> >
> > > I get that the incrementing counters might not be pretty but I'm a bit
> > > skeptical about this being an actual issue.  Because if that were
> > > true, the device would be bricking itself from any sort of power
> > > loss, be that an actual power loss, a battery rundown, or a hard
> > > power-off after a crash.
> >
> > And that's exactly what users see. If you do enough power fails on an
> > SSD, you usually brick it; some die sooner than others. There were
> > some test results published, some are here:
> > http://lkcl.net/reports/ssd_analysis.html. I believe I've seen some
> > others, too.
> > 
> > It is very hard for NAND to work reliably in the face of power
> > failures. In fact, not even Linux MTD + UBIFS works well in that
> > regard. See http://www.linux-mtd.infradead.org/faq/ubi.html
> > (unfortunately, it's down now?!). If we can't get it right, do you
> > believe SSD manufacturers do?
> > 
> > [The issue is, if you power down during an erase, you get a "weakly
> > erased" page which will contain the expected 0xff's, but you'll get
> > bitflips there quickly. A similar issue exists for writes. It is
> > solvable in software, just hard and slow... and we don't do it.]
> 
> It's not that hard. We certainly do it in JFFS2. I was fairly sure that
> it was also part of the design considerations for UBI — it really ought
> to be right there too. I'm less sure about UBIFS but I would have
> expected it to be OK.

Are you sure you have it right in JFFS2? Do you journal block erases?
Apparently, that was pretty much a non-issue on older flashes.

https://web-beta.archive.org/web/20160923094716/http://www.linux-mtd.infradead.org:80/doc/ubifs.html#L_unstable_bits
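
For illustration, here is a toy sketch of the kind of software fix I
mean, roughly the "cleanmarker" trick JFFS2 uses (the in-memory flash
model and helper names below are made up, only the ordering matters):
a marker is written only after the erase has finished, so a block whose
erase was interrupted by power loss has no marker and is erased again
on mount instead of being trusted as clean 0xff.

/*
 * Toy model of a power-fail-safe block erase.  Not the real MTD API;
 * the struct and helpers are invented just for this example.
 */
#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE  4096
#define CLEANMARKER 0x1985u           /* magic borrowed from JFFS2 */

struct toy_block {
	uint8_t  data[BLOCK_SIZE];
	uint16_t oob_marker;          /* stands in for an OOB cleanmarker */
};

/* Erase the block, then record completion strictly afterwards. */
static void erase_block_safely(struct toy_block *b)
{
	memset(b->data, 0xff, sizeof(b->data));   /* the actual erase */
	b->oob_marker = CLEANMARKER;              /* written only on success */
}

/*
 * Mount-time scan: a block without a cleanmarker may have had its
 * erase interrupted, so erase it again before trusting its 0xff's.
 */
static void scan_block_on_mount(struct toy_block *b)
{
	if (b->oob_marker != CLEANMARKER) {
		printf("block looks weakly erased, re-erasing\n");
		erase_block_safely(b);
	}
}

int main(void)
{
	struct toy_block blk;

	/* Simulate a power cut mid-erase: mostly 0xff, but no marker. */
	memset(blk.data, 0xff, sizeof(blk.data));
	blk.data[100] = 0x7f;         /* a bit that would flip back later */
	blk.oob_marker = 0;

	scan_block_on_mount(&blk);    /* detects the missing marker */
	return 0;
}

This only covers the interrupted-erase case; the interrupted-write case
mentioned above needs more than a marker.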

> SSDs, however, are often crap; power-fail those at your peril. And of
> course there's nothing you can do when they do fail, whereas we accept
> patches for things which are implemented in Linux.

Agreed. If the SSD indicates an unexpected powerdown, it is a problem
and we need to fix it.

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
