Message-ID: <Z7w0P8ImJiZhRsPD@fedora>
Date: Mon, 24 Feb 2025 16:56:31 +0800
From: Ming Lei <ming.lei@...hat.com>
To: Yu Kuai <yukuai1@...weicloud.com>
Cc: tj@...nel.org, josef@...icpanda.com, axboe@...nel.dk,
	cgroups@...r.kernel.org, linux-block@...r.kernel.org,
	linux-kernel@...r.kernel.org, yi.zhang@...wei.com,
	yangerkun@...wei.com, "yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH 2/2] blk-throttle: fix off-by-one jiffies wait_time

On Mon, Feb 24, 2025 at 03:03:18PM +0800, Yu Kuai wrote:
> Hi, Ming!
> 
> On 2025/02/24 11:28, Ming Lei wrote:
> > throtl_trim_slice() returns immediately if throtl_slice_used()
> > is true.
> > 
> > And throtl_slice_used() checks whether jiffies is in [start, end] via
> > time_in_range(), so as long as `start <= jiffies <= end', it still
> > returns false.
> 
> Yes, I misread the code by thinking throtl_slice_used() would return
> true if the slice is still in use. :(
> 
> 
> > > BTW, throtl_trim_slice() looks problematic:
> > > 
> > > -       if (bytes_trim <= 0 && io_trim <= 0)
> > > +       if (bytes_trim <= 0 || io_trim <= 0 ||
> > > +           tg->bytes_disp[rw] < bytes_trim || tg->io_disp[rw] < io_trim)
> > >                  return;
> > That is exactly what my patch is doing, just taking deviation and
> > timeout into account; also the U64_MAX limit has to be excluded.
> Yes, perhaps you can add some comments on the last two conditions of
> your patch.

Yes, we need to add a comment on the check; how about the following wording?

```

If the actual rate doesn't match the expected rate, do not trim the slice;
otherwise the current rate control info is lost and we no longer have a
chance to compensate for it in the remaining period of this slice.

Add one deviation threshold, since bio size is usually not divisible by the
current rate, and a bio dispatch is all-or-nothing.

Also limit the max slice size to avoid extending the window too much.

```
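
For illustration only, that wording could be dropped in as a normal
kernel-style block comment above the new checks; exact placement inside
throtl_trim_slice() and the conditions themselves are up to the patch,
this just shows the comment formatted:

```
	/*
	 * If the actual rate doesn't match the expected rate, do not trim
	 * the slice; otherwise the current rate control info is lost and we
	 * no longer have a chance to compensate for it in the remaining
	 * period of this slice.
	 *
	 * Allow one deviation threshold, since bio size is usually not
	 * divisible by the current rate and a bio dispatch is
	 * all-or-nothing.
	 *
	 * Also limit the max slice size to avoid extending the window too
	 * much.
	 */
```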


> I think maybe you're concerned about the case where IO is throttled by
> IOs, and bytes_disp should really erase the extra bytes that do not
> reach bps_limit.
> 
> > 
> > If you test the patch, it works just fine. My patch keeps bytes_exp
> > <= 1.5 * disp, so throtl/001 completes in 1.5 sec; if it is changed to
> > 1.25 * disp, the test completes in 1.25 sec.
> > 
> > With this fix, why do we have to play the complicated carryover
> > trick?
> > 
> 
> I understand your code now. carryover_bytes in this case is wrong; as
> long as a new slice is not started and trim slice doesn't erase extra
> bytes by mistake, the throttle time should be accurate over time because
> bytes_disp is accurate.

Yes.

Waiting a bit more or less for a single bio only affects the instantaneous
rate, but the algorithm is adaptive, so it can adjust the following waits
if the slice isn't over.
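
To make the "adaptive" part concrete, here is a rough userspace model (not
the kernel code, and ignoring trim/new-slice logic; HZ, bps_limit and the
bio size are made-up numbers): each bio's wait is computed from bytes_disp
versus the bytes the elapsed slice time allows, so an over-wait on one bio
shrinks the waits computed afterwards.

```
/*
 * Hypothetical model of rate control inside one slice.  Assumed numbers:
 * HZ = 1000, bps_limit = 1 MB/s, bio size = 100001 bytes.
 */
#include <stdio.h>

#define HZ		1000ULL
#define BPS_LIMIT	1000000ULL	/* 1 MB/s, assumed */

/* jiffies to wait before dispatching a bio of size sz at time 'now' */
static unsigned long long bio_wait(unsigned long long bytes_disp,
				   unsigned long long sz,
				   unsigned long long now)
{
	unsigned long long allowed = BPS_LIMIT * now / HZ;

	if (bytes_disp + sz <= allowed)
		return 0;
	/* round the shortfall up to whole jiffies */
	return (bytes_disp + sz - allowed + BPS_LIMIT / HZ - 1) /
	       (BPS_LIMIT / HZ);
}

int main(void)
{
	unsigned long long sz = 100001;	/* slightly over one jiffy budget * 100 */
	unsigned long long bytes_disp = 0, now = 0;

	for (int i = 0; i < 3; i++) {
		unsigned long long wait = bio_wait(bytes_disp, sz, now);

		printf("bio %d at jiffy %llu: wait %llu jiffies\n",
		       i, now, wait);
		now += wait;		/* dispatch after the wait */
		bytes_disp += sz;
	}
	return 0;
}
```

With these numbers the first bio waits 101 jiffies and the following ones
100, so the long-term rate settles back at the configured limit as long as
the slice isn't restarted or mis-trimmed.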

> 
> And the root cause really is throtl_trim_slice().
> 
> However, from code review, I think throtl_start_new_slice() should be
> handled as well, the same as throtl_trim_slice(): if the old bio is
> dispatched with one extra jiffy of wait time, issuing a new bio in the
> same jiffy will have the same problem. For example:

throtl_start_new_slice() is called when nothing is queued and the current
slice is used, so it is probably fine.

throtl_start_new_slice_with_credit() is called when dispatching one
bio, and it is still called when the current slice is used.

> 
> The caller does sync IO, with io size (bps_limit / (HZ / throtl_slice) + 1),

This will cause a wait for every single IO.

But once the IO is throttled because of the wait, throtl_start_new_slice()
won't succeed any more; meanwhile, throtl_trim_slice() will try to fix the
rate control for the extra 1 jiffy.

So the in-tree code with the trim fix works just fine if consecutive IOs
keep coming.

> and the caller will issue a new IO when the old IO is done. Then in this
> case, each IO will start a new slice and wait for throtl_slice + 1
> jiffies. I believe this can be fixed as well so that BIOs after the first
> one only wait for throtl_slice jiffies.

I guess the above case only exists in theory, when you submit the new IO
immediately after the old IO is dispatched. Not sure this case is really
meaningful, because the submission period/latency is added/faked by the
user.
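
Just to put numbers on the quoted case (all assumed: HZ = 1000,
throtl_slice = 100 jiffies, bps_limit = 1 MB/s), a quick back-of-the-envelope
check of the throtl_slice + 1 wait on a fresh slice:

```
/*
 * io size = bps_limit / (HZ / throtl_slice) + 1 = 100001 bytes, i.e. one
 * byte more than a full slice budget, so a fresh slice rounds the wait up
 * to throtl_slice + 1 = 101 jiffies.  Numbers are assumptions for the
 * arithmetic only.
 */
#include <stdio.h>

int main(void)
{
	unsigned long long hz = 1000, throtl_slice = 100;
	unsigned long long bps_limit = 1000000;	/* 1 MB/s */
	unsigned long long io_size = bps_limit / (hz / throtl_slice) + 1;
	/* jiffies needed at bps_limit, rounded up to a whole jiffy */
	unsigned long long wait = (io_size * hz + bps_limit - 1) / bps_limit;

	printf("io_size=%llu bytes -> wait=%llu jiffies\n", io_size, wait);
	return 0;	/* prints wait=101, i.e. throtl_slice + 1 */
}
```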


Thanks,
Ming

