linux-kernel - Re: [PATCH v2 1/2] md: Fix overflow in is_mddev

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <be8d9147-4f7f-2fab-da2a-bb4cde46fd12@huaweicloud.com>
Date: Sat, 16 Dec 2023 10:24:20 +0800
From: Yu Kuai <yukuai1@...weicloud.com>
To: Song Liu <song@...nel.org>, linan666@...weicloud.com
Cc: axboe@...nel.dk, linux-raid@...r.kernel.org,
 linux-kernel@...r.kernel.org, linux-block@...r.kernel.org,
 yi.zhang@...wei.com, houtao1@...wei.com, yangerkun@...wei.com,
 "yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH v2 1/2] md: Fix overflow in is_mddev_idle

Hi,

在 2023/12/16 7:12, Song Liu 写道:
> On Thu, Dec 14, 2023 at 5:41 PM <linan666@...weicloud.com> wrote:
>>
>> From: Li Nan <linan122@...wei.com>
>>
>> UBSAN reports this problem:
>>
>>    UBSAN: Undefined behaviour in drivers/md/md.c:8175:15
>>    signed integer overflow:
>>    -2147483291 - 2072033152 cannot be represented in type 'int'
>>    Call trace:
>>     dump_backtrace+0x0/0x310
>>     show_stack+0x28/0x38
>>     dump_stack+0xec/0x15c
>>     ubsan_epilogue+0x18/0x84
>>     handle_overflow+0x14c/0x19c
>>     __ubsan_handle_sub_overflow+0x34/0x44
>>     is_mddev_idle+0x338/0x3d8
>>     md_do_sync+0x1bb8/0x1cf8
>>     md_thread+0x220/0x288
>>     kthread+0x1d8/0x1e0
>>     ret_from_fork+0x10/0x18
>>
>> 'curr_events' will overflow when stat accum or 'sync_io' is greater than
>> INT_MAX.
>>
>> Fix it by changing sync_io, last_events and curr_events to 64bit.
>>
>> Signed-off-by: Li Nan <linan122@...wei.com>
>> ---
>>   drivers/md/md.h        | 4 ++--
>>   include/linux/blkdev.h | 2 +-
>>   drivers/md/md.c        | 7 ++++---
>>   3 files changed, 7 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/md/md.h b/drivers/md/md.h
>> index ade83af123a2..1a4f976951c1 100644
>> --- a/drivers/md/md.h
>> +++ b/drivers/md/md.h
>> @@ -50,7 +50,7 @@ struct md_rdev {
>>
>>          sector_t sectors;               /* Device size (in 512bytes sectors) */
>>          struct mddev *mddev;            /* RAID array if running */
>> -       int last_events;                /* IO event timestamp */
>> +       long long last_events;          /* IO event timestamp */
>>
>>          /*
>>           * If meta_bdev is non-NULL, it means that a separate device is
>> @@ -584,7 +584,7 @@ extern void mddev_unlock(struct mddev *mddev);
>>
>>   static inline void md_sync_acct(struct block_device *bdev, unsigned long nr_sectors)
>>   {
>> -       atomic_add(nr_sectors, &bdev->bd_disk->sync_io);
>> +       atomic64_add(nr_sectors, &bdev->bd_disk->sync_io);
>>   }
>>
>>   static inline void md_sync_acct_bio(struct bio *bio, unsigned long nr_sectors)
>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>> index 3f8a21cd9233..d28b98adf457 100644
>> --- a/include/linux/blkdev.h
>> +++ b/include/linux/blkdev.h
>> @@ -170,7 +170,7 @@ struct gendisk {
>>          struct list_head slave_bdevs;
>>   #endif
>>          struct timer_rand_state *random;
>> -       atomic_t sync_io;               /* RAID */
>> +       atomic64_t sync_io;             /* RAID */
>>          struct disk_events *ev;
> 
> As we are on this, I wonder whether we really need this.
> AFAICT, is_mddev_idle() is the only consumer of sync_io.
> We can probably do the same check in is_mddev_idle()
> without sync_io.

After reviewing some code, what I'm understand is following:

I think 'sync_io' is used to distinguish 'sync io' from raid(this can
from different raid, for example, different partition is used for
different array) and other io(any io, even it's not from raid). And
if there are any other IO, sync speed is limited to min, otherwise it's
limited to max.

If we want to keep this behaviour, I can't think of any other way for
now...

> 
> Thanks,
> Song
> 
> 
>>
>>   #ifdef CONFIG_BLK_DEV_ZONED
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index c94373d64f2c..1d71b2a9af03 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -8496,14 +8496,15 @@ static int is_mddev_idle(struct mddev *mddev, int init)
>>   {
>>          struct md_rdev *rdev;
>>          int idle;
>> -       int curr_events;
>> +       long long curr_events;
>>
>>          idle = 1;
>>          rcu_read_lock();
>>          rdev_for_each_rcu(rdev, mddev) {
>>                  struct gendisk *disk = rdev->bdev->bd_disk;
>> -               curr_events = (int)part_stat_read_accum(disk->part0, sectors) -
>> -                             atomic_read(&disk->sync_io);
>> +               curr_events =
>> +                       (long long)part_stat_read_accum(disk->part0, sectors) -
>> +                             atomic64_read(&disk->sync_io);

By the way, I don't think this overflow is problematic, assume that
sectors accumulate by 'A' and sync_io accumulate by 'B':

(setros + A) - (sync_io + B) -(sectors - sync_io) = (A - B)

Nomatter overflow or truncation happened of not in the abouve addition
and subtraction, the result is correct.

And even if io accounting is disabled, which means sectors and A is
always 0, the result will always be -B that is <= 0, hence
is_mddev_idle() will always return true, and sync_speed will be limited
to max in this case.

Thanks,
Kuai

>>                  /* sync IO will cause sync_io to increase before the disk_stats
>>                   * as sync_io is counted when a request starts, and
>>                   * disk_stats is counted when it completes.
>> --
>> 2.39.2
>>
>>
> .
>