linux-kernel - Re: [GIT PULL] bcachefs fixes for 6.15-rc4

lists.openwall.net
Open Source and information security mailing list archives

Message-ID: <yarkxhxub75z3vj47cidpe4vfk5b6cdx5mip2ummgyi6v6z4eg@rnfiud3fonxs>
Date: Sun, 27 Apr 2025 23:01:20 -0400
From: Kent Overstreet <kent.overstreet@...ux.dev>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Eric Biggers <ebiggers@...nel.org>, Autumn Ashton <misyl@...ggi.es>, 
	Matthew Wilcox <willy@...radead.org>, Theodore Ts'o <tytso@....edu>, linux-bcachefs@...r.kernel.org, 
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [GIT PULL] bcachefs fixes for 6.15-rc4

On Sun, Apr 27, 2025 at 07:39:46PM -0700, Linus Torvalds wrote:
> On Sun, 27 Apr 2025 at 19:22, Eric Biggers <ebiggers@...nel.org> wrote:
> >
> > I suspect that all that was really needed was case-insensitivity of ASCII a-z.
> 
> Yes. That's my argument. I think anything else ends up being a
> mistake. MAYBE extend it to the first 256 characters in Unicode (aka
> "Latin1").
> 
> Case folding on a-z is the only thing you could really effectively
> rely on in user space even in the DOS times, because different
> codepages would make for different rules for the upper 128 characters
> anyway, and you could be in a situation where you literally couldn't
> copy files from one floppy to another, because two files that had
> distinct names on one floppy would have the *same* name on another
> one.
> 
> Of course, that was mostly a weird corner case that almost nobody ever
> actually saw in practice, because very few people even used anything
> else than the default codepage.
> 
> And the same is afaik still true on NT, although practically speaking
> I suspect it went from "unusual" to "really doesn't happen EVER in
> practice".

I'm having trouble finding anything authoritative, but what I'm seeing
indicates that NTFS does do Unicode casefolding (its own incompatible
variant of it, at that).
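Linus's floppy example above can be made concrete. The sketch below is
an illustration under stated assumptions, not DOS or kernel code: it
decodes single-byte names with two real DOS code pages from Python's
codec library (cp437 and cp850) and uses Python's casefold() as a
stand-in for each code page's own uppercase table. ASCII a-z folds the
same way everywhere, but bytes 0xA0 and 0xB5 are the case pair 'á'/'Á'
only under cp850; under cp437, 0xB5 is the box-drawing character '╡'.
The helper names are hypothetical.

```python
def fold_ascii(name: bytes) -> bytes:
    """Fold only ASCII A-Z to a-z; every other byte passes through unchanged."""
    return bytes(b + 32 if 0x41 <= b <= 0x5A else b for b in name)


def fold_codepage(name: bytes, codepage: str) -> str:
    """Decode with a DOS code page, then fold (casefold() stands in for the
    code page's own uppercase table, which is an assumption of this sketch)."""
    return name.decode(codepage).casefold()


def same_name(a: bytes, b: bytes, codepage: str) -> bool:
    """True if two on-disk names would be treated as the same file."""
    return fold_codepage(a, codepage) == fold_codepage(b, codepage)


if __name__ == "__main__":
    # ASCII-only folding gives the same answer regardless of code page.
    print(fold_ascii(b"README.TXT") == fold_ascii(b"readme.txt"))

    # 0xA0 is 'á' in both code pages; 0xB5 is 'Á' in cp850 but '╡' in cp437.
    a, b = b"\xa0.TXT", b"\xb5.TXT"
    print(same_name(a, b, "cp850"))  # the two names collide
    print(same_name(a, b, "cp437"))  # the two names stay distinct
```

Two files that coexist happily on a cp437 floppy would therefore map to
the same name on a cp850 system.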

> Extending those mistakes to full unicode and mixing in things like
> nonprinting codes and other things have only made things worse.
> 
> And dealing with things like ß and ss and trying to make those compare
> as equal is a *horrible* mistake. People who really need to do that
> (usually for some legalistic local reason) tend to have very specific
> rules for sorting anyway, and they are rules specific to particular
> situations, not something that the filesystem should even try to work
> with.

Well, casefolding is something that's directly exposed to users. So I do
think that if casefolding is going to exist at all, there is a strong
argument for it to be full Unicode casefolding, including mapping ß to ss.

(Can you imagine being a user who gets used to typing in filenames
without worrying about capitalization, only to have that muscle memory
break whenever an accented letter is part of the filename? That sort of
thing is maddening.)
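The gap between an a-z-only rule and full Unicode folding is easy to see
in Python, whose str.casefold() implements Unicode full case folding
(mapping ß to "ss"), while str.lower() leaves ß alone. The ascii_fold
helper is a hypothetical stand-in for an a-z-only rule, not any
filesystem's actual code.

```python
def ascii_fold(name: str) -> str:
    """Hypothetical a-z-only rule: lowercase ASCII letters, leave everything else alone."""
    return "".join(c.lower() if "A" <= c <= "Z" else c for c in name)


def unicode_fold(name: str) -> str:
    """Unicode full case folding, as implemented by str.casefold()."""
    return name.casefold()


pairs = [
    ("README", "readme"),        # plain ASCII: both rules agree
    ("ÄPFEL.txt", "äpfel.txt"),  # accented letter: only Unicode folding matches
    ("STRASSE", "straße"),       # ß vs ss: only full folding matches
]

if __name__ == "__main__":
    for a, b in pairs:
        print(f"{a!r:14} {b!r:14} ascii={ascii_fold(a) == ascii_fold(b)} "
              f"unicode={unicode_fold(a) == unicode_fold(b)}")
```

Under the a-z-only rule, the second and third pairs are exactly the cases
where a user's case-insensitive habits stop working.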

BUT:

I'm becoming more and more convinced that I want more separation between
casefolded lookups and non-casefolded lookups; the potential for changes
in casefolding rules to break case-sensitive lookups is just bad.

If we do a "casefolding version 2" in bcachefs, we'll just have a
separate btree for casefolded dirents, and casefolded directories will
have their dirents indexed twice (once by exact name, once by casefolded
name).
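A toy model of the double-indexing idea, as a rough sketch under stated
assumptions: Python dicts stand in for the two btrees, and the class and
method names are hypothetical, not bcachefs's dirent code. Exact-name
lookups only ever consult the primary index, so swapping in new folding
rules can only change the results of casefolded lookups.

```python
from typing import Callable, Dict, List, Optional


class Directory:
    """Toy directory whose entries are indexed twice: by exact name and by folded name."""

    def __init__(self, fold: Callable[[str], str] = str.casefold) -> None:
        self.fold = fold
        self.by_name: Dict[str, int] = {}          # primary index: exact name -> inode
        self.by_folded: Dict[str, List[str]] = {}  # secondary index: folded name -> exact names

    def create(self, name: str, inode: int) -> None:
        if name in self.by_name:
            raise FileExistsError(name)
        self.by_name[name] = inode
        self.by_folded.setdefault(self.fold(name), []).append(name)

    def lookup(self, name: str) -> Optional[int]:
        """Case-sensitive lookup: never touches the secondary index."""
        return self.by_name.get(name)

    def lookup_casefolded(self, name: str) -> Optional[int]:
        """Casefolded lookup; this toy returns the first-created colliding name."""
        matches = self.by_folded.get(self.fold(name), [])
        return self.by_name[matches[0]] if matches else None

    def reindex(self, new_fold: Callable[[str], str]) -> None:
        """Switch folding rules by rebuilding only the secondary index."""
        self.fold = new_fold
        self.by_folded = {}
        for name in self.by_name:
            self.by_folded.setdefault(new_fold(name), []).append(name)


if __name__ == "__main__":
    d = Directory()
    d.create("Straße.txt", 1)
    print(d.lookup("STRASSE.TXT"))             # None: exact lookup
    print(d.lookup_casefolded("STRASSE.TXT"))  # 1: casefold() maps ß to ss
    d.reindex(str.lower)                       # weaker rules: ß no longer folds to ss
    print(d.lookup_casefolded("STRASSE.TXT"))  # None
    print(d.lookup("Straße.txt"))              # 1: exact lookup is unaffected
```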

That's trivially extensible to multiple versions if (god forbid) we ever
end up needing to support multiple "locales". More importantly, it would
let us support a mode where only certain pids get casefolded lookups, so
that, e.g., casefolding dependencies don't creep into your Makefiles, as
can happen today.
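Extending the same idea, again only as a hypothetical sketch: one
secondary index per folding version, plus a per-process opt-in table
consulted at lookup time, so processes that have not opted in get plain
exact-match lookups. FOLD_VERSIONS, CASEFOLD_PIDS and the class below
are invented for illustration and do not reflect any actual bcachefs or
VFS interface; the two "versions" simply reuse Python's str.lower() and
str.casefold().

```python
from typing import Callable, Dict, List, Optional

# Hypothetical folding versions; a real filesystem would pin exact Unicode tables.
FOLD_VERSIONS: Dict[int, Callable[[str], str]] = {
    1: str.lower,     # simple lowercasing: does not map ß to ss
    2: str.casefold,  # Unicode full case folding: maps ß to ss
}

# Hypothetical per-process opt-in: pid -> folding version.
CASEFOLD_PIDS: Dict[int, int] = {}


class Directory:
    def __init__(self) -> None:
        self.by_name: Dict[str, int] = {}
        # One secondary index per folding version, all updated on create.
        self.folded: Dict[int, Dict[str, List[str]]] = {v: {} for v in FOLD_VERSIONS}

    def create(self, name: str, inode: int) -> None:
        if name in self.by_name:
            raise FileExistsError(name)
        self.by_name[name] = inode
        for version, fold in FOLD_VERSIONS.items():
            self.folded[version].setdefault(fold(name), []).append(name)

    def lookup(self, pid: int, name: str) -> Optional[int]:
        version = CASEFOLD_PIDS.get(pid)
        if version is None:
            # Default: exact match, so builds never silently depend on casefolding.
            return self.by_name.get(name)
        matches = self.folded[version].get(FOLD_VERSIONS[version](name), [])
        return self.by_name[matches[0]] if matches else None


if __name__ == "__main__":
    d = Directory()
    d.create("Straße.txt", 7)
    CASEFOLD_PIDS[200] = 2
    CASEFOLD_PIDS[300] = 1
    print(d.lookup(100, "STRASSE.TXT"))  # None: pid 100 did not opt in
    print(d.lookup(200, "STRASSE.TXT"))  # 7: version 2 maps ß to ss
    print(d.lookup(300, "STRASSE.TXT"))  # None: version 1 does not
```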