Message-ID: <CAHk-=wjL7GG9s9Y2+u2725M+Ru=bUXnzOnXRwoSktY0fVdhhzw@mail.gmail.com>
Date:   Tue, 18 Apr 2023 12:11:51 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     dsterba@...e.cz
Cc:     "Regzbot (on behalf of Thorsten Leemhuis)" 
        <regressions@...mhuis.info>, Rafael Wysocki <rafael@...nel.org>,
        David Sterba <dsterba@...e.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux regressions mailing list <regressions@...ts.linux.dev>
Subject: Re: Linux regressions report for mainline [2023-04-16]

On Tue, Apr 18, 2023 at 11:20 AM David Sterba <dsterba@...e.cz> wrote:
>
> There's also an in-memory cache of already trimmed ranges since the last
> mount, so even running discard repeatedly (either via fstrim or as a mount
> option) will not do extra IO. We try hard not to provoke the firmware bugs.
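
As an aside, purely for illustration (a hypothetical sketch, not btrfs's
actual code or data structures), such a "ranges already trimmed" cache is
conceptually just an interval lookup done before issuing the discard:

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical sketch: remember which byte ranges were already
     * discarded since mount, so a repeated fstrim can skip them
     * instead of re-issuing the discard to the device. */
    struct trimmed_range { uint64_t start, len; };

    struct trim_cache {
            struct trimmed_range r[1024];   /* toy fixed-size table */
            int count;
    };

    /* True if [start, start+len) lies entirely inside one cached
     * range, i.e. the discard for it can be skipped. */
    static bool already_trimmed(const struct trim_cache *c,
                                uint64_t start, uint64_t len)
    {
            for (int i = 0; i < c->count; i++) {
                    if (start >= c->r[i].start &&
                        start + len <= c->r[i].start + c->r[i].len)
                            return true;
            }
            return false;
    }

The real thing presumably also merges ranges and invalidates them when
extents get reused; the point is only that this lookup is far cheaper than
another round trip to possibly-flaky firmware.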

So we've had devices that claim to support discard, and then simply don't.

I have dim memories of people reporting IO simply stopping working
after a discard when it confused the GC logic too much.

And yes, those dim memories are from many years ago when SSDs were
new and fancy, and we had all kinds of crazy stuff going on, including
serious big SSD manufacturers who came to the kernel summit and said
that we needed to do IO in 64kB-aligned hunks because doing GC was too
hard.

Those people have now thankfully gone off and done shingled drives
instead, and we can mostly ignore them (although I do note that btrfs
seems to be gulping down the shingled-drive koolaid too), but I'm
afraid that some of that incompetence still exists in the form of old
drives.

And some of it isn't even that old. See commit 07d2872bf4c8 ("mmc:
core: Add SD card quirk for broken discard") which is from late last
year. I'm not sure what the failure case there was (apart from the
"mk2fs failed", which I _assume_ was mkfs or mke2fs).

The real problem cases tend to be things like random USB memory sticks
etc. I think the SanDisk MMC case is not that different: a lot of odd
small embedded flash controllers that have never been tested except
under Windows or in cameras or whatever.

So discard tends to have two classes of problems:

 (a) performance problems due to being non-queued, or simply because
the flash controller is small and latency can be absolutely *huge*
when it processes trims

 (b) the "it doesn't work at all" problem

and it's really that "it doesn't work" case I worry about.
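
To make the scary case concrete: a toy, destructive smoke test (a
hypothetical sketch, not anything we ship) is just "discard a range,
then see if reads still work at all":

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/fs.h>

    /* Destructive toy test: discard the first 1MiB of the device,
     * then check that a plain read still succeeds.  On sane hardware
     * this is boring; on the broken stuff the read (or the whole
     * device) never comes back. */
    int main(int argc, char **argv)
    {
            uint64_t range[2] = { 0, 1 << 20 };
            char buf[4096];
            int fd;

            if (argc < 2 || (fd = open(argv[1], O_RDWR)) < 0)
                    return perror("open"), 1;
            if (ioctl(fd, BLKDISCARD, range))
                    return perror("BLKDISCARD"), 1;
            if (pread(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf))
                    return perror("pread after discard"), 1;
            printf("survived\n");
            return 0;
    }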

We have quite a few trim-related quirks. Do this:

    git grep HORKAGE.*TRIM

to see just the libata cases. Yes, some of those are "the queued
version doesn't work". Others are just "it's not zero after trim".
Whatever. But some of them are literally "do not use trim at all".
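
Those all end up as horkage flags in the blacklist table in
libata-core.c; paraphrased (model glob, firmware revision, flags), the
relevant entries look something like:

    { "Micron_M500_*",      NULL,   ATA_HORKAGE_NO_NCQ_TRIM |
                                    ATA_HORKAGE_ZERO_AFTER_TRIM },
    { "SuperSSpeed S238*",  NULL,   ATA_HORKAGE_NOTRIM },
    { "M88V29*",            NULL,   ATA_HORKAGE_NOTRIM },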

See commit cda57b1b05cf ("libata: force disable trim for SuperSSpeed
S238"), and tell me that the sentence

  "This device loses blocks, often the partition table area, on trim"

doesn't worry you? Ok, so that's from 2015, so "old drives only".

Or how about c8ea23d5fa59 ("ata: libata-core: Disable TRIM on M88V29")
from last year:

   "While it also advertises TRIM support, I/O errors are reported
    when the discard mount option fstrim is used. TRIM also fails
    when disabling NCQ and not just as an NCQ command"

Again, that's libata - odd crazy hardware. But it's exactly the odd
crazy hardware that worries me. When the failure mode isn't "it's
slow", but "it ATE MY WHOLE DISK", that's a scary scary problem.

Hmm?

I dunno. Maybe you have reason to believe that all of these cases have
been fixed, or that some of them were caused by kernel bugs (us doing
things wrong) that have since been fixed.

But the failure modes just make me worry. From your email, it *seems*
like you think the failures were primarily performance-related.

                Linus
