Message-ID: <D8CDCA7B-BF0B-4095-BA69-AEEA4C56B7CC@zytor.com>
Date: Wed, 30 Apr 2025 21:51:47 -0700
From: "H. Peter Anvin" <hpa@...or.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
CC: "Theodore Ts'o" <tytso@....edu>,
Kent Overstreet <kent.overstreet@...ux.dev>,
linux-bcachefs@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [GIT PULL] bcachefs fixes for 6.15-rc4
On April 30, 2025 8:12:20 PM PDT, Linus Torvalds <torvalds@...ux-foundation.org> wrote:
>On Wed, 30 Apr 2025 at 19:48, H. Peter Anvin <hpa@...or.com> wrote:
>>
>> It is worth noting that Microsoft has basically declared their
>> "recommended" case folding (upcase) table to be permanently frozen (for
>> new filesystem instances in the case where they use an on-disk
>> translation table created at format time.) As far as I know they have
>> never supported anything other than 1:1 conversion of BMP code points,
>> nor normalization.
>
>So no crazy 'ß' matches 'ss' kind of thing? (And yes, afaik that's
>technically wrong even in German, but afaik at least sorts the same in
>some locales).
>
>Because yes, if MS basically does a 1:1 unicode translation with a
>fixed table, that is not only "simpler", I think it's what we should
>strive for.
>
>Because I think the *only* valid reason for case insensitive
>filesystems is "backwards compatibility", and given that, it's
>_particularly_ stupid to then do anything more complicated and broken
>than the thing you're trying to be compatible with.
>
>I hope to everything holy that nobody ever wants to be compatible with
>the absolute garbage that is the OSX HFS model.
>
>Because the whole "let's actively corrupt names into something that is
>almost, but not exactly, NFD" stuff is just some next-level evil
>stuff.
>
> Linus
>
Yeah, collation order is highly localized, and was never designed with the expectation of being used as a unique lookup key.
It's also completely inconsistent, even between neighboring locales: in Swedish and Finnish, Å sorts before Ä/Æ and Ö/Ø, whereas Danish and Norwegian sort Å after them, even though everyone agrees that Ä/Æ and Ö/Ø are each the same letter and are to be sorted the same. Icelandic doesn't have Å but sorts it as Danish does, after Ä and Ø(!), yet stuffs Þ after Z instead of with or after T. Swedish, and I believe the other Nordic languages, sort Ü as Y, but in German Ä, Ö, and Ü are sorted as A, O, and U.
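To make the Å example concrete, here is a minimal Python sketch. The two order strings are hand-written, simplified assumptions for illustration only (real collation tables such as CLDR carry many more rules), and the helper name sort_key is hypothetical. It assumes precomposed (NFC) input.

```python
# Simplified, hand-written alphabet tails; NOT real locale data.
BASE = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
SWEDISH_ORDER = BASE + "ÅÄÖ"   # Å before Ä and Ö
DANISH_ORDER = BASE + "ÄÖÅ"    # Å after Æ/Ø (written here as Ä/Ö)

# Treat Æ as Ä and Ø as Ö, i.e. the "same letter" in both conventions.
SAME_LETTER = str.maketrans("ÆØ", "ÄÖ")


def sort_key(word, order):
    """Map each letter to its position in the given alphabet order.

    Raises ValueError for characters outside the toy alphabet.
    """
    folded = word.upper().translate(SAME_LETTER)
    return [order.index(ch) for ch in folded]


names = ["Åre", "Ärla", "Örebro", "Zorn"]
swedish = sorted(names, key=lambda w: sort_key(w, SWEDISH_ORDER))
danish = sorted(names, key=lambda w: sort_key(w, DANISH_ORDER))

print(swedish)  # ['Zorn', 'Åre', 'Ärla', 'Örebro']
print(danish)   # ['Zorn', 'Ärla', 'Örebro', 'Åre']
```

The same four names come out in two different orders depending on which table is used, which is why a collation key only makes sense relative to a locale and is a poor fit for an on-disk lookup key.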