[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z77uID36yIvWDQEj@fedora>
Date: Wed, 26 Feb 2025 18:34:08 +0800
From: Ming Lei <ming.lei@...hat.com>
To: Yu Kuai <yukuai1@...weicloud.com>
Cc: tj@...nel.org, josef@...icpanda.com, axboe@...nel.dk, vgoyal@...hat.com,
cgroups@...r.kernel.org, linux-block@...r.kernel.org,
linux-kernel@...r.kernel.org, yi.zhang@...wei.com,
yangerkun@...wei.com, "yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH] blk-throttle: fix lower bps rate by throtl_trim_slice()
On Wed, Feb 26, 2025 at 05:56:03PM +0800, Yu Kuai wrote:
> Hi,
>
> 在 2025/02/26 16:34, Ming Lei 写道:
> > On Wed, Feb 26, 2025 at 09:16:27AM +0800, Yu Kuai wrote:
> > > From: Yu Kuai <yukuai3@...wei.com>
> > >
> > > The bio submission time may be a few jiffies more than the expected
> > > waiting time, due to 'extra_bytes' can't be divided in
> > > tg_within_bps_limit(), and also due to timer wakeup delay. In this
> > > case, adjust slice_start to jiffies will discard the extra wait time,
> > > causing lower rate than expected.
> > >
> > > This problem will cause blktests throtl/001 failure in case of
> > > CONFIG_HZ_100=y, fix it by preserving one finished slice in
> > > throtl_trim_slice() and allowing deviation between [0, 2 slices).
> >
> > I think it only can cover single default slice deviation, since
> > throtl_trim_slice() just keeps dispatch data in the previous single
> > default slice. Or can you add words on how to allow 2 default slices
> > deviation?
> >
> > >
> > > For example, assume bps_limit is 1000bytes, 1 jiffes is 10ms, and
> > > slice is 20ms(2 jiffies), expected rate is 1000 / 1000 * 20 = 20 bytes
> > > per slice.
> > >
> > > If user issues two 21 bytes IO, then wait time will be 30ms for the
> > > first IO:
> > >
> > > bytes_allowed = 20, extra_bytes = 1;
> > > jiffy_wait = 1 + 2 = 3 jiffies
> > >
> > > and consider
> > > extra 1 jiffies by timer, throtl_trim_slice() will be called at:
> > >
> > > jiffies = 40ms
> > > slice_start = 0ms, slice_end= 40ms
> > > bytes_disp = 21
> > >
> > > In this case, before the patch, real rate in the first two slices is
> > > 10.5 bytes per slice, and slice will be updated to:
> > >
> > > jiffies = 40ms
> > > slice_start = 40ms, slice_end = 60ms,
> > > bytes_disp = 0;
> > >
> > > Hence the second IO will have to wait another 30ms;
> > >
> > > With the patch, the real rate in the first slice is 20 bytes per slice,
> > > which is the same as expected, and slice will be updated:
> > >
> > > jiffies=40ms,
> > > slice_start = 20ms, slice_end = 60ms,
> > > bytes_disp = 1;
> > >
> > > And now, there is still 19 bytes allowed in the second slice, and the
> > > second IO will only have to wait 10ms;
> > >
> > > Fixes: e43473b7f223 ("blkio: Core implementation of throttle policy")
> > > Reported-by: Ming Lei <ming.lei@...hat.com>
> > > Closes: https://lore.kernel.org/linux-block/20250222092823.210318-3-yukuai1@huaweicloud.com/
> > > Signed-off-by: Yu Kuai <yukuai3@...wei.com>
> > > ---
> > > block/blk-throttle.c | 13 +++++++++++--
> > > 1 file changed, 11 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/block/blk-throttle.c b/block/blk-throttle.c
> > > index 8d149aff9fd0..cb472cf7b6b6 100644
> > > --- a/block/blk-throttle.c
> > > +++ b/block/blk-throttle.c
> > > @@ -599,14 +599,23 @@ static inline void throtl_trim_slice(struct throtl_grp *tg, bool rw)
> > > * sooner, then we need to reduce slice_end. A high bogus slice_end
> > > * is bad because it does not allow new slice to start.
> > > */
> > > -
> > > throtl_set_slice_end(tg, rw, jiffies + tg->td->throtl_slice);
> > > time_elapsed = rounddown(jiffies - tg->slice_start[rw],
> > > tg->td->throtl_slice);
> > > - if (!time_elapsed)
> > > + /* Don't trim slice until at least 2 slices are used */
> > > + if (time_elapsed < tg->td->throtl_slice * 2)
> > > return;
> > > + /*
> > > + * The bio submission time may be a few jiffies more than the expected
> > > + * waiting time, due to 'extra_bytes' can't be divided in
> > > + * tg_within_bps_limit(), and also due to timer wakeup delay. In this
> > > + * case, adjust slice_start to jiffies will discard the extra wait time,
> > > + * causing lower rate than expected. Therefore, one slice is preserved,
> > > + * allowing deviation that is less than two slices.
> > > + */
> > > + time_elapsed -= tg->td->throtl_slice;
> >
> > Please document that default slice window size is doubled actually in
> > this way.
>
> I said two slices because there is a round down:
>
> >> time_elapsed = rounddown(jiffies - tg->slice_start[rw],
> >> tg->td->throtl_slice);
>
> Hence the deviation is actually between [1 ,2) jiffies, depends on the
> time start to wait and how long the delay is.
>
> If start to wait at slice_start + n * throtl_slice - 1, the deviation is
> *at most 1 slice*
>
> If start to wait at slice_stat + n * throtl_slice, the max deviation is
> *less than 2 slices* (2 slices not included)
>
> Now, I agree allowing deviation at most 1 slice is more appropriate. :)
Actually the in-tree code already covers deviation from the rounddown(),
but turns out it is still not enough, so this patch increases one extra
def slice size. With this thing is documented, feel free to add:
Reviewed-by: Ming Lei <ming.lei@...hat.com>
Thanks,
Ming
Powered by blists - more mailing lists