[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <021e6495-11e5-3b39-2786-d69cc4bf24f7@huaweicloud.com>
Date: Wed, 26 Feb 2025 17:56:03 +0800
From: Yu Kuai <yukuai1@...weicloud.com>
To: Ming Lei <ming.lei@...hat.com>, Yu Kuai <yukuai1@...weicloud.com>
Cc: tj@...nel.org, josef@...icpanda.com, axboe@...nel.dk, vgoyal@...hat.com,
cgroups@...r.kernel.org, linux-block@...r.kernel.org,
linux-kernel@...r.kernel.org, yi.zhang@...wei.com, yangerkun@...wei.com,
"yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH] blk-throttle: fix lower bps rate by throtl_trim_slice()
Hi,
在 2025/02/26 16:34, Ming Lei 写道:
> On Wed, Feb 26, 2025 at 09:16:27AM +0800, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@...wei.com>
>>
>> The bio submission time may be a few jiffies more than the expected
>> waiting time, due to 'extra_bytes' can't be divided in
>> tg_within_bps_limit(), and also due to timer wakeup delay. In this
>> case, adjust slice_start to jiffies will discard the extra wait time,
>> causing lower rate than expected.
>>
>> This problem will cause blktests throtl/001 failure in case of
>> CONFIG_HZ_100=y, fix it by preserving one finished slice in
>> throtl_trim_slice() and allowing deviation between [0, 2 slices).
>
> I think it only can cover single default slice deviation, since
> throtl_trim_slice() just keeps dispatch data in the previous single
> default slice. Or can you add words on how to allow 2 default slices
> deviation?
>
>>
>> For example, assume bps_limit is 1000bytes, 1 jiffes is 10ms, and
>> slice is 20ms(2 jiffies), expected rate is 1000 / 1000 * 20 = 20 bytes
>> per slice.
>>
>> If user issues two 21 bytes IO, then wait time will be 30ms for the
>> first IO:
>>
>> bytes_allowed = 20, extra_bytes = 1;
>> jiffy_wait = 1 + 2 = 3 jiffies
>>
>> and consider
>> extra 1 jiffies by timer, throtl_trim_slice() will be called at:
>>
>> jiffies = 40ms
>> slice_start = 0ms, slice_end= 40ms
>> bytes_disp = 21
>>
>> In this case, before the patch, real rate in the first two slices is
>> 10.5 bytes per slice, and slice will be updated to:
>>
>> jiffies = 40ms
>> slice_start = 40ms, slice_end = 60ms,
>> bytes_disp = 0;
>>
>> Hence the second IO will have to wait another 30ms;
>>
>> With the patch, the real rate in the first slice is 20 bytes per slice,
>> which is the same as expected, and slice will be updated:
>>
>> jiffies=40ms,
>> slice_start = 20ms, slice_end = 60ms,
>> bytes_disp = 1;
>>
>> And now, there is still 19 bytes allowed in the second slice, and the
>> second IO will only have to wait 10ms;
>>
>> Fixes: e43473b7f223 ("blkio: Core implementation of throttle policy")
>> Reported-by: Ming Lei <ming.lei@...hat.com>
>> Closes: https://lore.kernel.org/linux-block/20250222092823.210318-3-yukuai1@huaweicloud.com/
>> Signed-off-by: Yu Kuai <yukuai3@...wei.com>
>> ---
>> block/blk-throttle.c | 13 +++++++++++--
>> 1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/blk-throttle.c b/block/blk-throttle.c
>> index 8d149aff9fd0..cb472cf7b6b6 100644
>> --- a/block/blk-throttle.c
>> +++ b/block/blk-throttle.c
>> @@ -599,14 +599,23 @@ static inline void throtl_trim_slice(struct throtl_grp *tg, bool rw)
>> * sooner, then we need to reduce slice_end. A high bogus slice_end
>> * is bad because it does not allow new slice to start.
>> */
>> -
>> throtl_set_slice_end(tg, rw, jiffies + tg->td->throtl_slice);
>>
>> time_elapsed = rounddown(jiffies - tg->slice_start[rw],
>> tg->td->throtl_slice);
>> - if (!time_elapsed)
>> + /* Don't trim slice until at least 2 slices are used */
>> + if (time_elapsed < tg->td->throtl_slice * 2)
>> return;
>>
>> + /*
>> + * The bio submission time may be a few jiffies more than the expected
>> + * waiting time, due to 'extra_bytes' can't be divided in
>> + * tg_within_bps_limit(), and also due to timer wakeup delay. In this
>> + * case, adjust slice_start to jiffies will discard the extra wait time,
>> + * causing lower rate than expected. Therefore, one slice is preserved,
>> + * allowing deviation that is less than two slices.
>> + */
>> + time_elapsed -= tg->td->throtl_slice;
>
> Please document that default slice window size is doubled actually in
> this way.
I said two slices because there is a round down:
>> time_elapsed = rounddown(jiffies - tg->slice_start[rw],
>> tg->td->throtl_slice);
Hence the deviation is actually between [1 ,2) jiffies, depends on the
time start to wait and how long the delay is.
If start to wait at slice_start + n * throtl_slice - 1, the deviation is
*at most 1 slice*
If start to wait at slice_stat + n * throtl_slice, the max deviation is
*less than 2 slices* (2 slices not included)
Now, I agree allowing deviation at most 1 slice is more appropriate. :)
Thanks,
Kuai
>
>
> Thanks,
> Ming
>
>
> .
>
Powered by blists - more mailing lists