linux-kernel - Re: Unicode conversion issue


Message-ID: <87cyhyuhow.fsf@mailhost.krisman.be>
Date: Wed, 11 Dec 2024 14:45:51 -0500
From: Gabriel Krisman Bertazi <krisman@...e.de>
To: Jaegeuk Kim <jaegeuk@...nel.org>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,  Linux Kernel Mailing
 List <linux-kernel@...r.kernel.org>,  "hanqi@...o.com" <hanqi@...o.com>,
 "Theodore Ts'o" <tytso@....edu>
Subject: Re: Unicode conversion issue

Jaegeuk Kim <jaegeuk@...nel.org> writes:

> On 12/11, Gabriel Krisman Bertazi wrote:
>> Jaegeuk Kim <jaegeuk@...nel.org> writes:
>> 
>> > Hi Linus/Gabriel,
>> >
>> > Once Android applied the below patch [1], some special characters started to be
>> > converted differently resulting in different length, so that f2fs cannot find
>> > the filename correctly which was created when the kernel didn't have [1].
>> >
>> > There is one bug report in [2] that describes more details. In order to avoid
>> > this, could you please consider reverting [1] asap? Or, is there any other
>> > way to keep the conversion while addressing CVE? It's very hard for f2fs to
>> > distinguish two valid converted lengths before/after [1].
>> 
>> I got this report yesterday. I'm looking into it.
>> 
>> It seems commit 5c26d2f1d3f5 ("unicode: Don't special case ignorable
>> code points") has affected more than ignorable code points, because that
>> U+2764 is not marked as Ignorable in the unicode database.
>> 
>> I still think the solution to the original issue is eliminating
>> ignorable code points, and that should be fine.  Let me look at why this
>> block of characters is mishandled.

I was struggling to reproduce it, until I copy-pasted the character
directly from the bugzilla:

The character the user has is ❤️, which is different from just ❤.  This
is a combination of:

U+2764 + U+FE0F  (Heavy Black Heart + Variation Selector-16)

Variation Selector-16 is an ignorable character with zero length,
exactly what we wanted to ignore with that patch.  What I didn't
consider in the original submission was that, unlike other ignorable
code points, this block might be used intentionally in a filename.
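
For anyone trying to reproduce this, a small userspace sketch (Python,
not kernel code) shows that the two visually similar strings are
different code point sequences with different UTF-8 lengths.  The
helper name describe() is only for illustration.

```python
import unicodedata

plain = "\u2764"        # HEAVY BLACK HEART on its own
emoji = "\u2764\ufe0f"  # the same character followed by VARIATION SELECTOR-16

def describe(s):
    """Return (code point, Unicode name) pairs for each character in s."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c)) for c in s]

print(describe(plain))
print(describe(emoji))

# UTF-8 byte lengths differ, so anything derived from the bytes differs too.
print(len(plain.encode("utf-8")), len(emoji.encode("utf-8")))
```

Typing U+2764 by hand gives only the first string, which is why the
problem only shows up when the character is copied from the report.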

> Thank you so much. If it takes some time to find the root cause, may I
> propose the revert first to unblock production? The problem is quite severe
> as users cannot access their files.

We have 3 ways forward.

1) The first is to revert the patch and fix the original issue in a
different way.  That is, we would restore the original database and
treat Ignorable codepoints as folding to themselves only when doing
string comparisons, but not when calculating hashes.  This way, the hash
will be the same, but filenames with Ignorable codepoints will be
handled as byte sequences.

2) We keep the original patch and add support in fsck to update the
hashes in volumes like the above.

3) We regenerate the database to ignore codepoints in the code block
FE00..FE0F.  That would be the simplest solution, but there might be
more cases that need fixing later.
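
A rough userspace illustration of why the lookup key changes and what
option 3 would restore.  This is a sketch only: fold_key() is a
hypothetical helper, Python's casefold() plus NFD stands in for the
kernel's casefolding, and it assumes the pre-patch database dropped
U+FE0F from the folded name.

```python
import unicodedata

# Variation selectors FE00..FE0F (range end is exclusive).
VARIATION_SELECTORS = range(0xFE00, 0xFE10)

def fold_key(name, drop_variation_selectors):
    """Hypothetical lookup key: casefold + NFD, optionally dropping FE00..FE0F."""
    if drop_variation_selectors:
        name = "".join(c for c in name if ord(c) not in VARIATION_SELECTORS)
    return unicodedata.normalize("NFD", name.casefold())

name = "\u2764\ufe0f.txt"

# Assumed pre-patch behaviour (and option 3): the selector is ignored.
old_key = fold_key(name, drop_variation_selectors=True)
# Assumed post-patch behaviour: the selector is kept in the folded name.
new_key = fold_key(name, drop_variation_selectors=False)

print(len(old_key), len(new_key))
print(old_key == new_key)
```

The two keys differ in length, so a hash computed over one will not
match a hash stored for the other.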

At this point, I'm leaning towards 1 or 3.  Both of them can be done
after reverting my original patch, so I'm fine with that.  Thoughts?

> Thank you so much. If it takes some time to find the root cause, may I
> propose the revert first to unblock production? The problem is quite
> severe as users cannot access their files.

I don't oppose this, considering the case at hand.  I'll base the new patch
on top of the revert.

-- 
Gabriel Krisman Bertazi