[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <c276f6d9-38e0-0113-a134-bedd3f16298f@huawei.com>
Date: Sat, 24 Feb 2024 10:46:18 +0800
From: Baokun Li <libaokun1@...wei.com>
To: Jan Kara <jack@...e.cz>
CC: <linux-ext4@...r.kernel.org>, <tytso@....edu>, <adilger.kernel@...ger.ca>,
<ritesh.list@...il.com>, <linux-kernel@...r.kernel.org>,
<yi.zhang@...wei.com>, <yangerkun@...wei.com>, <chengzhihao1@...wei.com>,
<yukuai3@...wei.com>, <stable@...r.kernel.org>, Baokun Li
<libaokun1@...wei.com>
Subject: Re: [PATCH 4/7] ext4: add positive int attr pointer to avoid sysfs
variables overflow
On 2024/2/23 20:05, Jan Kara wrote:
> On Sat 17-02-24 15:41:43, Baokun Li wrote:
>> On 2024/2/14 0:58, Jan Kara wrote:
>>> On Fri 26-01-24 16:57:13, Baokun Li wrote:
>>>> We can easily trigger a BUG_ON by using the following commands:
>>>>
>>>> mount /dev/$disk /tmp/test
>>>> echo 2147483650 > /sys/fs/ext4/$disk/mb_group_prealloc
>>>> echo test > /tmp/test/file && sync
>>>>
>>>> ==================================================================
>>>> kernel BUG at fs/ext4/mballoc.c:2029!
>>>> invalid opcode: 0000 [#1] PREEMPT SMP PTI
>>>> CPU: 3 PID: 320 Comm: kworker/u36:1 Not tainted 6.8.0-rc1 #462
>>>> RIP: 0010:mb_mark_used+0x358/0x370
>>>> [...]
>>>> Call Trace:
>>>> ext4_mb_use_best_found+0x56/0x140
>>>> ext4_mb_complex_scan_group+0x196/0x2f0
>>>> ext4_mb_regular_allocator+0xa92/0xf00
>>>> ext4_mb_new_blocks+0x302/0xbc0
>>>> ext4_ext_map_blocks+0x95a/0xef0
>>>> ext4_map_blocks+0x2b1/0x680
>>>> ext4_do_writepages+0x733/0xbd0
>>>> [...]
>>>> ==================================================================
>>>>
>>>> In ext4_mb_normalize_group_request():
>>>> ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_mb_group_prealloc;
>>>>
>>>> Here fe_len is of type int, but s_mb_group_prealloc is of type unsigned
>>>> int, so setting s_mb_group_prealloc to 2147483650 overflows fe_len to a
>>>> negative number, which ultimately triggers a BUG_ON() in mb_mark_used().
>>>>
>>>> Therefore, we add attr_pointer_pi (aka positive int attr pointer) with a
>>>> value range of 0-INT_MAX to avoid the above problem. In addition to the
>>>> mb_group_prealloc sysfs interface, the following interfaces also have uint
>>>> to int conversions that result in overflows, and are also fixed.
>>>>
>>>> err_ratelimit_burst
>>>> msg_ratelimit_burst
>>>> warning_ratelimit_burst
>>>> err_ratelimit_interval_ms
>>>> msg_ratelimit_interval_ms
>>>> warning_ratelimit_interval_ms
>>>> mb_best_avail_max_trim_order
>>>>
>>>> CC: stable@...r.kernel.org
>>>> Signed-off-by: Baokun Li <libaokun1@...wei.com>
>>> I don't think you need to change s_mb_group_prealloc here and then restrict
>>> it even further in the next patch. I'd just leave it alone here.
>> Yes, we could put the next patch before this one, but using
>> s_mb_group_prealloc as an example makes it easier to understand
>> why the attr_pointer_pi case is added here.There are several other
>> variables that don't have more convincing examples.
> Yes, I think reordering would be good. Because I've read the convertion and
> started wondering: "is this enough?"
Well, I will put the next patch before this one in the next version.
>>> Also I think that limiting mb_best_avail_max_trim_order to 64 instead of
>>> INT_MAX will make us more resilient to surprises in the future :) But I
>>> don't really insist.
>>>
>>> Honza
>> I think it's enough here to make sure that mb_best_avail_max_trim_order
>> is a positive number, since we always make sure that min_order
>> is not less than 0, as follows:
>>
>> order = fls(ac->ac_g_ex.fe_len) - 1;
>> min_order = order - sbi->s_mb_best_avail_max_trim_order;
>> if (min_order < 0)
>> min_order = 0;
>>
>> An oversized mb_best_avail_max_trim_order can be interpreted as
>> always being CR_ANY_FREE. 😄
> Well, s_mb_best_avail_max_trim_order is not about allocation passes but
> about how many times are we willing to shorten the goal extent to half and
> still use the advanced free blocks search.
Yes, this means that in CR1.5, in case the original request is satisfied,
we allow allocation of blocks with an order of (goal_extent_order -
s_mb_best_avail_max_trim_order) to accelerate block allocation.
> And I agree that the mballoc
> code is careful enough that large numbers don't matter there but still why
> allowing storing garbage values? It is nicer to tell sysadmin he did
> something wrong right away.
>
> Honza
Yes, we shouldn't allow storing rubbish values, otherwise it may
mislead admins, I will add an extra type to check it.
Thanks!
--
With Best Regards,
Baokun Li
.
Powered by blists - more mailing lists