Message-ID: <alpine.DEB.2.22.394.2105200927570.1771368@ramsan.of.borg>
Date: Thu, 20 May 2021 09:43:10 +0200 (CEST)
From: Geert Uytterhoeven <geert@...ux-m68k.org>
To: David Sterba <dsterba@...e.com>
cc: linux-btrfs@...r.kernel.org, linux-kernel@...r.kernel.org,
Arnd Bergmann <arnd@...db.de>
Subject: Re: [PATCH] btrfs: scrub: per-device bandwidth control
Hi David,
On Tue, 18 May 2021, David Sterba wrote:
> Add a sysfs interface to limit IO during scrub. We relied on the ionice
> interface to do that, e.g. the idle class kept the system usable while
> scrub was running. This changed when mq-deadline became widespread, as it
> does not implement the scheduling classes. That was a CFQ thing that got
> deleted. We've got numerous complaints from users about degraded
> performance.
>
> Currently only BFQ supports that but it's not a common scheduler and we
> can't ask everybody to switch to it.
>
> Alternatively the cgroup io limiting can be used but that is also a
> non-trivial setup (v2 required, the controller must be enabled on the
> system). This can still be used if desired.
>
> Other ideas that have been explored: piggy-back on ionice (that is set
> per-process and is accessible) and interpret the class and classdata as
> bandwidth limits, but this does not have enough flexibility as there are
> only 8 allowed and we'd have to map fixed limits to each value. Also
> adjusting the value would need to look up the process that currently runs
> scrub on the given device, and the value is not sticky so would have to
> be adjusted each time scrub runs.
>
> Running out of options, sysfs does not look that bad:
>
> - it's accessible from scripts, or udev rules
> - the name is similar to what MD-RAID has
> (/proc/sys/dev/raid/speed_limit_max or /sys/block/mdX/md/sync_speed_max)
> - the value is sticky at least for filesystem mount time
> - adjusting the value has immediate effect
> - sysfs is available in constrained environments (eg. system rescue)
> - the limit also applies to device replace
>
> Sysfs:
>
> - raw value is in bytes
> - values written to the file accept suffixes like K, M
> - file is in the per-device directory /sys/fs/btrfs/FSID/devinfo/DEVID/scrub_speed_max
> - 0 means use default priority of IO
>
> The scheduler is a simple deadline one and the accuracy is up to nearest
> 128K.
>
> Signed-off-by: David Sterba <dsterba@...e.com>

Thanks for your patch, which is now commit b4a9f4bee31449bc ("btrfs:
scrub: per-device bandwidth control") in linux-next.

noreply@...erman.id.au reported the following failures for e.g.
m68k/defconfig:

ERROR: modpost: "__udivdi3" [fs/btrfs/btrfs.ko] undefined!
ERROR: modpost: "__divdi3" [fs/btrfs/btrfs.ko] undefined!

> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -1988,6 +1993,60 @@ static void scrub_page_put(struct scrub_page *spage)
> }
> }
>
> +/*
> + * Throttling of IO submission, bandwidth-limit based, the timeslice is 1
> + * second. Limit can be set via /sys/fs/UUID/devinfo/devid/scrub_speed_max.
> + */
> +static void scrub_throttle(struct scrub_ctx *sctx)
> +{
> + const int time_slice = 1000;
> + struct scrub_bio *sbio;
> + struct btrfs_device *device;
> + s64 delta;
> + ktime_t now;
> + u32 div;
> + u64 bwlimit;
> +
> + sbio = sctx->bios[sctx->curr];
> + device = sbio->dev;
> + bwlimit = READ_ONCE(device->scrub_speed_max);
> + if (bwlimit == 0)
> + return;
> +
> + /*
> + * Slice is divided into intervals when the IO is submitted, adjust by
> + * bwlimit and maximum of 64 intervals.
> + */
> + div = max_t(u32, 1, (u32)(bwlimit / (16 * 1024 * 1024)));
> + div = min_t(u32, 64, div);
> +
> + /* Start new epoch, set deadline */
> + now = ktime_get();
> + if (sctx->throttle_deadline == 0) {
> + sctx->throttle_deadline = ktime_add_ms(now, time_slice / div);

ERROR: modpost: "__udivdi3" [fs/btrfs/btrfs.ko] undefined!

The 64-bit division "bwlimit / div" further down has to become
div_u64(bwlimit, div).

> + sctx->throttle_sent = 0;
> + }
> +
> + /* Still in the time to send? */
> + if (ktime_before(now, sctx->throttle_deadline)) {
> + /* If current bio is within the limit, send it */
> + sctx->throttle_sent += sbio->bio->bi_iter.bi_size;
> + if (sctx->throttle_sent <= bwlimit / div)
> + return;
> +
> + /* We're over the limit, sleep until the rest of the slice */
> + delta = ktime_ms_delta(sctx->throttle_deadline, now);
> + } else {
> + /* New request after deadline, start new epoch */
> + delta = 0;
> + }
> +
> + if (delta)
> + schedule_timeout_interruptible(delta * HZ / 1000);

ERROR: modpost: "__divdi3" [fs/btrfs/btrfs.ko] undefined!

I'm a bit surprised that gcc emits a call to __divdi3() here instead of
inline code for the division by the constant 1000. So this has to become
div_u64(), too.

> + /* Next call will start the deadline period */
> + sctx->throttle_deadline = 0;
> +}
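
For illustration, a minimal userspace sketch of how the two divisions
flagged above could be routed through div_u64(). This is only a sketch
under assumptions, not the actual fix: div_u64() here is a stand-in for
the kernel helper of the same name, and throttle_div(), throttle_budget(),
throttle_timeout() and the hz parameter are hypothetical names made up for
this example (the kernel uses the HZ constant instead).

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's div_u64(): u64 dividend, u32 divisor. */
static uint64_t div_u64(uint64_t dividend, uint32_t divisor)
{
	assert(divisor != 0);
	return dividend / divisor;
}

/*
 * Number of intervals per 1-second slice: one per 16 MiB of limit,
 * clamped to [1, 64]. The divisor is a power-of-two constant.
 * (Hypothetical helper; clamps before narrowing to 32 bits.)
 */
static uint32_t throttle_div(uint64_t bwlimit)
{
	uint64_t div = bwlimit / (16 * 1024 * 1024);

	if (div < 1)
		div = 1;
	if (div > 64)
		div = 64;
	return (uint32_t)div;
}

/* Byte budget per interval, replacing the open-coded "bwlimit / div". */
static uint64_t throttle_budget(uint64_t bwlimit, uint32_t div)
{
	return div_u64(bwlimit, div);
}

/*
 * Sleep length in jiffies, replacing "delta * HZ / 1000".
 * hz is a parameter only because userspace has no HZ; delta_ms is
 * assumed to be non-negative, as it is on the sleeping path.
 */
static long throttle_timeout(int64_t delta_ms, uint32_t hz)
{
	assert(delta_ms >= 0);
	return (long)div_u64((uint64_t)delta_ms * hz, 1000);
}
```

With a limit of 100 MiB/s this gives 6 intervals per second and a budget
of 17476266 bytes per interval.
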
BTW, any chance you can start adding lore Link: tags to your commits, to
make it easier to find the email thread to reply to when reporting a
regression?
Thanks!
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@...ux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds