linux-kernel - Re: [GIT PULL] bcachefs fixes for 6.15-rc4

Message-ID: <CAHk-=wiX-CVhm0S2Ba4+pLO2U=3dY0x56jcunMyOz2TEHAgnYg@mail.gmail.com>
Date: Fri, 25 Apr 2025 13:35:37 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Matthew Wilcox <willy@...radead.org>
Cc: Kent Overstreet <kent.overstreet@...ux.dev>, linux-bcachefs@...r.kernel.org, 
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [GIT PULL] bcachefs fixes for 6.15-rc4

On Fri, 25 Apr 2025 at 12:40, Matthew Wilcox <willy@...radead.org> wrote:
>
> I think this is something that NTFS actually got right.  Each filesystem
> carries with it a 128KiB table that maps each codepoint to its
> case-insensitive equivalent.

I agree that that is indeed a technically correct way to deal with
case sensitivity at least from a filesystem standpoint.

It does have some usability issues - exactly because of that "fixed at
filesystem creation time" - but since in *practice* nobody actually
cares about the odd cases, that isn't much of a real issue.

And the fixed translation table means that it at least gets versioning
right. Hopefully you also filled the table up sanely, without the crazy
cases (ie the nonprinting characters etc), so that it contains only
the completely unambiguous stuff.

That said, I really suspect that in practice, folding even just the
7-bit ASCII subset would have been ok and would have obviated even
that table. And I say that as somebody who grew up in an environment
that used a bigger character set than that.
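
To make the ASCII-only idea concrete, here is a minimal sketch of what
such folding could look like. The names ascii_fold() and
ascii_casefold_eq() are made up for illustration and are not taken
from any filesystem; the assumption is that names are byte strings
with explicit lengths and that every byte outside 'A'..'Z' (including
all UTF-8 multibyte sequences) is compared as-is.

```c
#include <stddef.h>

/*
 * Hypothetical helper: fold only 'A'..'Z' to 'a'..'z'.
 * Every other byte value is returned unchanged, so no table is needed.
 */
static unsigned char ascii_fold(unsigned char c)
{
	return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/*
 * Hypothetical helper: compare two names under ASCII-only folding.
 * Returns 1 if they match, 0 otherwise.
 */
static int ascii_casefold_eq(const char *a, size_t alen,
			     const char *b, size_t blen)
{
	if (alen != blen)
		return 0;
	for (size_t i = 0; i < alen; i++) {
		if (ascii_fold((unsigned char)a[i]) !=
		    ascii_fold((unsigned char)b[i]))
			return 0;
	}
	return 1;
}
```

With this, "README" and "readme" compare equal, while two names that
differ only in the case of a non-ASCII letter stay distinct.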

Of course, the NTFS stuff came about because FAT had code pages for
just the 8-bit cases - and people used them, and that then caused odd
issues when moving data around.

Again - 8-bit tables were entirely sufficient in practice but actually
caused more problems than not doing it at all would have. And then
people go "we switched to 16-bit wide characters, so we need to expand
on the code table too".

Which is obviously exactly how you end up with that 128KiB table.
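
The size falls straight out of the arithmetic: 65536 possible 16-bit
values times 2 bytes per entry is 131072 bytes, ie 128KiB. A rough
sketch of such a per-filesystem table follows. The names fold_table,
fold_table_init() and fold_eq() are hypothetical, and the contents
used here (identity plus ASCII folding) are only a placeholder, not
what any real filesystem stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry per possible 16-bit character value. */
#define FOLD_ENTRIES 65536u

/* Hypothetical table: fold_table[c] is the folded form of c. */
static uint16_t fold_table[FOLD_ENTRIES];

/* 65536 entries * 2 bytes each = 131072 bytes = 128 KiB. */
static_assert(sizeof(fold_table) == 128u * 1024u,
	      "fold table must be exactly 128 KiB");

/* Placeholder contents: identity everywhere, then 'A'..'Z' -> 'a'..'z'. */
static void fold_table_init(void)
{
	for (uint32_t c = 0; c < FOLD_ENTRIES; c++)
		fold_table[c] = (uint16_t)c;
	for (uint16_t c = 'A'; c <= 'Z'; c++)
		fold_table[c] = (uint16_t)(c + ('a' - 'A'));
}

/*
 * Compare two 16-bit strings of explicit length through the table.
 * Returns 1 if they match, 0 otherwise.
 */
static int fold_eq(const uint16_t *a, size_t alen,
		   const uint16_t *b, size_t blen)
{
	if (alen != blen)
		return 0;
	for (size_t i = 0; i < alen; i++) {
		if (fold_table[a[i]] != fold_table[b[i]])
			return 0;
	}
	return 1;
}
```

Because the table is written once and then only read, the comparison
result for any two names stays the same for the life of the
filesystem, whatever the running system's idea of case folding is.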

But you have to ask yourself: do you think that the people who made
the incredibly bad choice to use a fixed 16-bit wide character set -
which caused literally decades of trouble in Windows, and still shows
up today - then made the perfect choice when dealing with case
folding? Yeah, no.

Still, I very much agree it was a better choice than "let's call
random unicode routines we don't really appreciate the complexity of".

            Linus