Message-ID: <42a3bda8-bbec-4991-a96e-303636d7bbd1@froggi.es>
Date: Mon, 28 Apr 2025 03:56:21 +0100
From: Autumn Ashton <misyl@...ggi.es>
To: Kent Overstreet <kent.overstreet@...ux.dev>
Cc: Eric Biggers <ebiggers@...nel.org>, Matthew Wilcox <willy@...radead.org>,
Theodore Ts'o <tytso@....edu>, Linus Torvalds
<torvalds@...ux-foundation.org>, linux-bcachefs@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [GIT PULL] bcachefs fixes for 6.15-rc4
On 4/28/25 3:16 AM, Kent Overstreet wrote:
> On Mon, Apr 28, 2025 at 03:05:19AM +0100, Autumn Ashton wrote:
>>
>>
>> On 4/28/25 2:43 AM, Kent Overstreet wrote:
>>> On Sun, Apr 27, 2025 at 06:30:59PM -0700, Eric Biggers wrote:
>>>> On Sun, Apr 27, 2025 at 08:55:30PM -0400, Kent Overstreet wrote:
>>>>> The thing is, that's exactly what we're doing. ext4 and bcachefs both
>>>>> refer to a specific revision of the folding rules: for ext4 it's
>>>>> specified in the superblock, for bcachefs it's hardcoded for the moment.
>>>>>
>>>>> I don't think this is the ideal approach, though.
>>>>>
>>>>> That means the folding rules are "whatever you got when you mkfs'd".
>>>>> Think about what that means if you've got a fleet of machines, of
>>>>> different ages, but all updated in sync: that's a really annoying way
>>>>> for gremlins of the "why does this machine act differently" variety to
>>>>> creep in.
>>>>>
>>>>> What I'd prefer is for the unicode folding rules to be transparently and
>>>>> automatically updated when the kernel is updated, so that behaviour
>>>>> stays in sync. That would behave more the way users would expect.
>>>>>
>>>>> But I only gave this real thought just over the past few days, and doing
>>>>> this safely and correctly would require some fairly significant changes
>>>>> to the way casefolding works.
>>>>>
>>>>> We'd have to ensure that lookups via the case sensitive name always
>>>>> work, even if the casefolding table the dirent was created with gives
>>>>> different results than the currently active casefolding table.
>>>>>
>>>>> That would require storing two different "dirents" for each real dirent,
>>>>> one normalized and one un-normalized, because we'd have to do an
>>>>> un-normalized lookup if the normalized lookup fails (and vice versa).
>>>>> Which should be completely fine from a performance POV, assuming we have
>>>>> working negative dentries.
>>>>>
>>>>> But, if the unicode folding rules are stable enough (and one would hope
>>>>> they are), hopefully all this is a non-issue.
>>>>>
>>>>> I'd have to gather more input from users of casefolding on other
>>>>> filesystems before saying what our long term plans (if any) will be.
>>>>
>>>> Wouldn't lookups via the case-sensitive name keep working even if the
>>>> case-insensitivity rules change? It's lookups via a case-insensitive name that
>>>> could start producing different results. Applications can depend on
>>>> case-insensitive lookups being done in a certain way, so changing the
>>>> case-insensitivity rules can be risky.
>>>
>>> No, because right now on a case-insensitive filesystem we _only_ do the
>>> lookup with the normalized name.
>>>
>>>> Regardless, the long-term plan for the case-insensitivity rules should be to
>>>> deprecate the current set of rules, which does Unicode normalization, which is
>>>> way overkill. It should be replaced with a simple version of case-insensitivity
>>>> that matches what FAT does. And *possibly* also a version that matches what
>>>> NTFS does (a u16 upcase_table[65536] indexed by UTF-16 code units), if someone
>>>> really needs that.
>>>>
>>>> As far as I know, that was all that was really needed in the first place.
>>>>
>>>> People misunderstood the problem as being about language support, rather than
>>>> about compatibility with legacy filesystems. And as a result they incorrectly
>>>> decided they should do Unicode normalization, which is way too complex and has
>>>> all sorts of weird properties.
>>>
>>> Believe me, I do see the appeal of that.
>>>
>>> One of the things I should really float with e.g. Valve is the
>>> possibility of providing tooling/auditing to make it easy to fix
>>> userspace code that's doing lookups that only work with casefolding.
>>
>> This is not really about fixing userspace code that expects casefolding, or
>> providing some form of stopgap there.
>>
>> The main need there is Proton/Wine, which is a compat layer for Windows
>> apps, which needs to pretend it's on NTFS and everything there expects
>> casefolding to work.
>>
>> No auditing/tooling required, we know the problem. It is unavoidable.
>
> Does this boil all the way up to e.g. savegames?
Everything, assets, save games.
You can't just patch the games... Doing that for every game on Steam
with every way they load games would be impossible, especially with
modern day obfuscated binaries, and anti-cheat and anti-tamper solutions.
- Autumn ✨
>
> I was imagining predetermined assets, where the name of the file would
> be present in a compiled binary, and it's little more than a search and
> replace. But would only work if it's present as a string literal.
>
>> I agree with calling Unicode normalization odd, though: when I was
>> implementing casefolding for bcachefs, I immediately thought it was a
>> huge hammer to do full normalization for the intended purpose, and not just
>> a big table...
>
> Samba's historically wanted casefolding, and Windows casefolding is
> Unicode (and it's full, not simple - mostly), so I'd expect that was the
> other main driver.
>
> I'm sure there's other odd corners besides just Samba where Windows
> compatibility comes up, people cook up all kinds of strange things.
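
For reference, the "just a big table" approach mentioned above (a u16 upcase_table[65536] indexed by UTF-16 code units) could look roughly like the C sketch below. This is only an illustration under stated assumptions: the table contents and the names upcase_table_init() and names_equal_folded() are made up here, and the table only folds ASCII. A real NTFS volume carries its own table.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical upcase table, one entry per UTF-16 code unit.
 * Assumption: filled with identity mappings plus ASCII a-z -> A-Z only;
 * a real table would cover far more of the BMP.
 */
static uint16_t upcase_table[65536];

static void upcase_table_init(void)
{
	/* Start with every code unit mapping to itself. */
	for (uint32_t i = 0; i < 65536; i++)
		upcase_table[i] = (uint16_t)i;

	/* Fold ASCII lowercase to uppercase. */
	for (uint16_t c = 'a'; c <= 'z'; c++)
		upcase_table[c] = (uint16_t)(c - 'a' + 'A');
}

/*
 * Compare two UTF-16 names code unit by code unit after table lookup.
 * No normalization is done: the names must have the same length, and
 * surrogate halves are simply looked up like any other code unit.
 * Returns 1 if the names match case-insensitively, 0 otherwise.
 */
static int names_equal_folded(const uint16_t *a, size_t alen,
			      const uint16_t *b, size_t blen)
{
	if (alen != blen)
		return 0;

	for (size_t i = 0; i < alen; i++)
		if (upcase_table[a[i]] != upcase_table[b[i]])
			return 0;

	return 1;
}
```

The point of the sketch is that folding is a single array lookup per code unit, with no decomposition or composition step, which is what makes it so much simpler than full Unicode normalization.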