linux-kernel - Re: [PATCH 6.6 00/28] fix CVE-2024-46701

Message-ID: <976C0DD5-4337-4C7D-92C6-A38C2EC335A4@oracle.com>
Date: Sat, 9 Nov 2024 16:58:38 +0000
From: Chuck Lever III <chuck.lever@...cle.com>
To: Yu Kuai <yukuai1@...weicloud.com>, Al Viro <viro@...iv.linux.org.uk>,
        Christian Brauner <brauner@...nel.org>
CC: Greg KH <gregkh@...uxfoundation.org>,
        linux-stable
	<stable@...r.kernel.org>,
        "harry.wentland@....com" <harry.wentland@....com>,
        "sunpeng.li@....com" <sunpeng.li@....com>,
        "Rodrigo.Siqueira@....com"
	<Rodrigo.Siqueira@....com>,
        "alexander.deucher@....com"
	<alexander.deucher@....com>,
        "christian.koenig@....com"
	<christian.koenig@....com>,
        "Xinhui.Pan@....com" <Xinhui.Pan@....com>,
        "airlied@...il.com" <airlied@...il.com>,
        Daniel Vetter <daniel@...ll.ch>,
        Liam Howlett <liam.howlett@...cle.com>,
        Andrew Morton
	<akpm@...ux-foundation.org>,
        Hugh Dickins <hughd@...gle.com>,
        "Matthew Wilcox
 (Oracle)" <willy@...radead.org>,
        Sasha Levin <sashal@...nel.org>,
        "srinivasan.shanmugam@....com" <srinivasan.shanmugam@....com>,
        "chiahsuan.chung@....com" <chiahsuan.chung@....com>,
        "mingo@...nel.org"
	<mingo@...nel.org>,
        "mgorman@...hsingularity.net"
	<mgorman@...hsingularity.net>,
        "chengming.zhou@...ux.dev"
	<chengming.zhou@...ux.dev>,
        "zhangpeng.00@...edance.com"
	<zhangpeng.00@...edance.com>,
        "amd-gfx@...ts.freedesktop.org"
	<amd-gfx@...ts.freedesktop.org>,
        "dri-devel@...ts.freedesktop.org"
	<dri-devel@...ts.freedesktop.org>,
        Linux Kernel Mailing List
	<linux-kernel@...r.kernel.org>,
        Linux FS Devel
	<linux-fsdevel@...r.kernel.org>,
        "maple-tree@...ts.infradead.org"
	<maple-tree@...ts.infradead.org>,
        linux-mm <linux-mm@...ck.org>,
        "yi.zhang@...wei.com" <yi.zhang@...wei.com>,
        yangerkun
	<yangerkun@...wei.com>, "yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH 6.6 00/28] fix CVE-2024-46701



> On Nov 8, 2024, at 8:30 PM, Yu Kuai <yukuai1@...weicloud.com> wrote:
> 
> Hi,
> 
> On 2024/11/08 21:23, Chuck Lever III wrote:
>>> On Nov 7, 2024, at 8:19 PM, Yu Kuai <yukuai1@...weicloud.com> wrote:
>>> 
>>> Hi,
>>> 
>>> On 2024/11/07 22:41, Chuck Lever wrote:
>>>> On Thu, Nov 07, 2024 at 08:57:23AM +0800, Yu Kuai wrote:
>>>>> Hi,
>>>>> 
>>>>> On 2024/11/06 23:19, Chuck Lever III wrote:
>>>>>> 
>>>>>> 
>>>>>>> On Nov 6, 2024, at 1:16 AM, Greg KH <gregkh@...uxfoundation.org> wrote:
>>>>>>> 
>>>>>>> On Thu, Oct 24, 2024 at 09:19:41PM +0800, Yu Kuai wrote:
>>>>>>>> From: Yu Kuai <yukuai3@...wei.com>
>>>>>>>> 
>>>>>>>> The fix is patch 27; the patches it relies on are from:
>>>>>> 
>>>>>> I assume patch 27 is:
>>>>>> 
>>>>>> libfs: fix infinite directory reads for offset dir
>>>>>> 
>>>>>> https://lore.kernel.org/stable/20241024132225.2271667-12-yukuai1@huaweicloud.com/
>>>>>> 
>>>>>> I don't think the Maple tree patches are a hard
>>>>>> requirement for this fix. And note that libfs did
>>>>>> not use Maple tree originally because I was told
>>>>>> at that time that Maple tree was not yet mature.
>>>>>> 
>>>>>> So, a better approach might be to fit the fix
>>>>>> onto linux-6.6.y while sticking with xarray.
>>>>> 
>>>>> The painful part is that using xarray is not acceptable: the offset
>>>>> is only 32 bits, and if it overflows, readdir will read nothing. That's
>>>>> why maple_tree has to be used.
>>>> A 32-bit range should be entirely adequate for this usage.
>>>>  - The offset allocator wraps when it reaches the maximum, it
>>>>    doesn't overflow unless there are actually billions of extant
>>>>    entries in the directory, which IMO is not likely.
>>> 
>>> Yes, it's not likely, but it's possible, and not hard to trigger in a
>>> test.
>> I question whether such a test reflects any real-world
>> workload.
>> Besides, there are a number of other limits that will impact
>> the ability to create that many entries in one directory.
>> The number of inodes in one tmpfs instance is limited, for
>> instance.
>>> And please note that the offset increases for each new file, and
>>> files can be removed while the offset counter stays where it is.
> 
> Did you see the above explanation? Files can be removed, so you don't have
> to store that many files to make the offset overflow.
>>>>  - The offset values are dense, so the directory can use all 2- or
>>>>    4- billion in the 32-bit integer range before wrapping.
>>> 
>>> Some simple math: if a user creates and removes 1 file each second, it will
>>> take about 130 years to overflow. And if a user creates and removes 1000
>>> files each second, it will take about 1 month to overflow.
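
The estimate quoted above can be checked with a few lines of arithmetic. This is only a sketch: `days_to_exhaust` and `OFFSET_SPACE` are hypothetical names, and it assumes every create consumes one fresh value from a 2^32-value space (the two reserved values, 0 and 1, are ignored as negligible). Under those assumptions it gives roughly 136 years at 1 create per second and roughly 50 days at 1000 creates per second, in line with the rough figures quoted.

```c
#include <assert.h>

/* Number of distinct values in a 32-bit offset space (2^32). */
#define OFFSET_SPACE 4294967296.0

/*
 * Hypothetical helper: days until a monotonically increasing 32-bit
 * offset has consumed every value, given a steady create rate.
 */
double days_to_exhaust(double creates_per_second)
{
	assert(creates_per_second > 0.0);
	/* 86400 seconds per day */
	return OFFSET_SPACE / creates_per_second / 86400.0;
}
```

Dividing the result for a rate of 1.0 by 365.25 converts it to years.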

> The problem is that if next_offset overflows to 0, then after patch
> 27, offset_dir_open() will record the 0, and later offset_readdir() will
> return immediately, even though there can be many files.


Let me revisit this for a moment. The xa_alloc_cyclic() call
in simple_offset_add() has a range limit argument of 2 to U32_MAX.

So I'm not clear how an overflow (or, more precisely, the
reuse of an offset value) would result in a "0" offset being
recorded. The range limit prevents the use of 0 and 1.

A "0" offset value would be a bug, I agree, but I don't see
how that can happen.
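
To make the range-limit argument concrete, here is a toy userspace model of a range-limited cyclic allocator. It is not the kernel's xarray code: every `model_*` name is hypothetical, the id space is shrunk to 0..7 to stand in for 0..U32_MAX, and resetting the search cursor to the minimum after the top of the range is a simplification of mine. How the real xa_alloc_cyclic() updates its @next cursor at the top of the range is not modelled here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_SLOTS 8u	/* toy id space 0..7, stands in for 0..U32_MAX */

struct cyclic_model {
	bool used[MODEL_SLOTS];
	uint32_t min;	/* lowest id that may be handed out */
	uint32_t max;	/* highest id that may be handed out */
	uint32_t next;	/* where the next search starts (always in range) */
	bool wrapped;	/* set once the cursor has rolled past max */
};

void model_init(struct cyclic_model *m, uint32_t min, uint32_t max)
{
	assert(min <= max && max < MODEL_SLOTS);
	for (uint32_t i = 0; i < MODEL_SLOTS; i++)
		m->used[i] = false;
	m->min = min;
	m->max = max;
	m->next = min;
	m->wrapped = false;
}

/*
 * Returns 0 if an id was found without wrapping, 1 if it was found
 * after wrapping, and -EBUSY if every id in [min, max] is in use.
 */
int model_alloc_cyclic(struct cyclic_model *m, uint32_t *id)
{
	uint32_t span = m->max - m->min + 1;
	uint32_t start = m->next;

	for (uint32_t step = 0; step < span; step++) {
		/* Candidates never leave [min, max]. */
		uint32_t cand = m->min + (start - m->min + step) % span;

		if (m->used[cand])
			continue;

		int ret = (m->wrapped || cand < start) ? 1 : 0;

		m->wrapped = false;
		m->used[cand] = true;
		*id = cand;
		if (cand == m->max) {
			m->next = m->min;	/* simplification, see lead-in */
			m->wrapped = true;
		} else {
			m->next = cand + 1;
		}
		return ret;
	}
	return -EBUSY;
}

void model_free(struct cyclic_model *m, uint32_t id)
{
	assert(id < MODEL_SLOTS);
	m->used[id] = false;
}
```

With the model initialised to the range 2..7, freed ids are reused after the cursor wraps, and no call ever returns id 0 or 1, because the candidate computation starts from the minimum of the range.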


>> The question is what happens when there are no more offset
>> values available. xa_alloc_cyclic should fail, and file
>> creation is supposed to fail at that point. If it doesn't,
>> that's a bug that is outside of the use of xarray or Maple.
> 
> Can you show me the code where xa_alloc_cyclic should fail? At least
> according to the comments, it will return 1 if the allocation succeeded
> after wrapping.
> 
> * Context: Any context.  Takes and releases the xa_lock.  May sleep if
> * the @gfp flags permit.
> * Return: 0 if the allocation succeeded without wrapping.  1 if the
> * allocation succeeded after wrapping, -ENOMEM if memory could not be
> * allocated or -EBUSY if there are no free entries in @limit.
> */
> static inline int xa_alloc_cyclic(struct xarray *xa, u32 *id, void *entry,
> struct xa_limit limit, u32 *next, gfp_t gfp)

I recall (dimly) that directory entry offset value re-use
is acceptable and preferred, so I think ignoring a "1"
return value from xa_alloc_cyclic() is OK. If there are
no unused offset values available, it will return -EBUSY,
and file creation will fail.
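
A minimal sketch of that caller-side handling, assuming the caller only distinguishes negative from non-negative return values. `offset_add_result` is a hypothetical name used for illustration, not a kernel function; the return values it accepts are the ones listed in the xa_alloc_cyclic() comment quoted above.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper: map the allocator's return value to the result
 * of the file creation. Both 0 (no wrap) and 1 (allocated after
 * wrapping) count as success; negative errnos are passed through.
 */
int offset_add_result(int alloc_ret)
{
	/* Only the documented return values are expected here. */
	assert(alloc_ret == 0 || alloc_ret == 1 ||
	       alloc_ret == -ENOMEM || alloc_ret == -EBUSY);

	if (alloc_ret < 0)
		return alloc_ret;	/* creation fails, e.g. -EBUSY */
	return 0;			/* wrapped or not, the offset is usable */
}
```

Under this reading, wrapping is invisible to the caller, and only exhaustion of the range or of memory makes creation fail.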

Perhaps Christian or Al can chime in here on whether
directory entry offset value re-use is indeed expected
to be acceptable.

Further, my understanding is that:

https://lore.kernel.org/stable/20241024132225.2271667-12-yukuai1@huaweicloud.com/

fixes a rename issue that results in an infinite loop,
and that's the (only) issue that underlies CVE-2024-46701.

You are suggesting that there are other overflow problems
with the xarray-based simple_offset implementation. If I
can confirm them, then I can get these fixed in v6.6. But
so far, I'm not sure I completely understand these other
failure modes.

Are you suggesting that the above fix /introduces/ the
0 offset problem?

--
Chuck Lever