[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <65d045bc-45d2-9e02-f0f7-7a0f51504eb9@suse.de>
Date: Fri, 10 Dec 2021 12:57:31 +0100
From: Hannes Reinecke <hare@...e.de>
To: Li Jinlin <lijinlin3@...wei.com>, song@...nel.org,
philipp.reisner@...bit.com, lars.ellenberg@...bit.com,
axboe@...nel.dk, jack@...e.cz, ming.lei@...hat.com, tj@...nel.org,
mcgrof@...nel.org
Cc: linux-raid@...r.kernel.org, linux-block@...r.kernel.org,
linux-kernel@...r.kernel.org, linfeilong@...wei.com
Subject: Re: [PATCH v3 1/3] md: Fix undefined behaviour in is_mddev_idle
On 12/10/21 1:06 PM, Li Jinlin wrote:
> UBSAN reports this problem:
>
> [ 5984.281385] UBSAN: Undefined behaviour in drivers/md/md.c:8175:15
> [ 5984.281390] signed integer overflow:
> [ 5984.281393] -2147483291 - 2072033152 cannot be represented in type 'int'
> [ 5984.281400] CPU: 25 PID: 1854 Comm: md101_resync Kdump: loaded Not tainted 4.19.90
> [ 5984.281404] Hardware name: Huawei TaiShan 200 (Model 5280)/BC82AMDDA
> [ 5984.281406] Call trace:
> [ 5984.281415] dump_backtrace+0x0/0x310
> [ 5984.281418] show_stack+0x28/0x38
> [ 5984.281425] dump_stack+0xec/0x15c
> [ 5984.281430] ubsan_epilogue+0x18/0x84
> [ 5984.281434] handle_overflow+0x14c/0x19c
> [ 5984.281439] __ubsan_handle_sub_overflow+0x34/0x44
> [ 5984.281445] is_mddev_idle+0x338/0x3d8
> [ 5984.281449] md_do_sync+0x1bb8/0x1cf8
> [ 5984.281452] md_thread+0x220/0x288
> [ 5984.281457] kthread+0x1d8/0x1e0
> [ 5984.281461] ret_from_fork+0x10/0x18
>
> When the stat aacum of the disk is greater than INT_MAX, its
> value becomes negative after casting to 'int', which may lead
> to overflow after subtracting a positive number. In the same
> way, when the value of sync_io is greater than INT_MAX,
> overflow may also occur. These situations will lead to
> undefined behavior.
>
> Otherwise, if the stat accum of the disk is close to INT_MAX
> when creating raid arrays, the initial value of last_events
> would be set close to INT_MAX when mddev initializes IO
> event counters. 'curr_events - rdev->last_events > 64' will
> always false during synchronization. If all the disks of mddev
> are in this case, is_mddev_idle() will always return 1, which
> may cause non-sync IO is very slow.
>
> Fix by using atomic64_t type for sync_io, and using s64 type
> for curr_events/last_events.
>
> Signed-off-by: Li Jinlin <lijinlin3@...wei.com>
> ---
> drivers/md/md.c | 6 +++---
> drivers/md/md.h | 4 ++--
> include/linux/genhd.h | 2 +-
> 3 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 5111ed966947..be73a5ae6864 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -8429,14 +8429,14 @@ static int is_mddev_idle(struct mddev *mddev, int init)
> {
> struct md_rdev *rdev;
> int idle;
> - int curr_events;
> + s64 curr_events;
>
> idle = 1;
> rcu_read_lock();
> rdev_for_each_rcu(rdev, mddev) {
> struct gendisk *disk = rdev->bdev->bd_disk;
> - curr_events = (int)part_stat_read_accum(disk->part0, sectors) -
> - atomic_read(&disk->sync_io);
> + curr_events = (s64)part_stat_read_accum(disk->part0, sectors) -
> + atomic64_read(&disk->sync_io);
> /* sync IO will cause sync_io to increase before the disk_stats
> * as sync_io is counted when a request starts, and
> * disk_stats is counted when it completes.
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index 53ea7a6961de..e00d6730da13 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -50,7 +50,7 @@ struct md_rdev {
>
> sector_t sectors; /* Device size (in 512bytes sectors) */
> struct mddev *mddev; /* RAID array if running */
> - int last_events; /* IO event timestamp */
> + s64 last_events; /* IO event timestamp */
>
> /*
> * If meta_bdev is non-NULL, it means that a separate device is
> @@ -551,7 +551,7 @@ extern void mddev_unlock(struct mddev *mddev);
>
> static inline void md_sync_acct(struct block_device *bdev, unsigned long nr_sectors)
> {
> - atomic_add(nr_sectors, &bdev->bd_disk->sync_io);
> + atomic64_add(nr_sectors, &bdev->bd_disk->sync_io);
> }
>
> static inline void md_sync_acct_bio(struct bio *bio, unsigned long nr_sectors)
> diff --git a/include/linux/genhd.h b/include/linux/genhd.h
> index 74c410263113..efa7884de11b 100644
> --- a/include/linux/genhd.h
> +++ b/include/linux/genhd.h
> @@ -150,7 +150,7 @@ struct gendisk {
> struct list_head slave_bdevs;
> #endif
> struct timer_rand_state *random;
> - atomic_t sync_io; /* RAID */
> + atomic64_t sync_io; /* RAID */
> struct disk_events *ev;
> #ifdef CONFIG_BLK_DEV_INTEGRITY
> struct kobject integrity_kobj;
>
You haven't answered my question.
This patch has exactly the same problem than the original, only shifted
to LONG_MAX instead of INT_MAX.
Have you considered decreasing 'sync_io' in the endio handler, and then
just using the 'sync_io' value to figure out if sync_io is active?
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@...e.de +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer
Powered by blists - more mailing lists