Message-ID: <jr54uomodnzqyw4bu4hcdpllgafkhueyygiiempuudwjy3vir5@d7lv3jsxxqx2>
Date: Wed, 30 Jul 2025 15:03:56 +0200
From: Jan Kara <jack@...e.cz>
To: Yu Kuai <yukuai1@...weicloud.com>
Cc: Jan Kara <jack@...e.cz>, axboe@...nel.dk, akpm@...ux-foundation.org,
yang.yang@...o.com, dlemoal@...nel.org, ming.lei@...hat.com,
linux-block@...r.kernel.org, linux-kernel@...r.kernel.org, yi.zhang@...wei.com,
yangerkun@...wei.com, johnny.chenyi@...wei.com, Omar Sandoval <osandov@...com>,
"yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH v2 1/2] lib/sbitmap: convert shallow_depth from one word
to the whole sbitmap
On Wed 30-07-25 10:03:50, Yu Kuai wrote:
> 在 2025/07/29 18:16, Jan Kara 写道:
> > On Tue 29-07-25 11:19:05, Yu Kuai wrote:
> > > From: Yu Kuai <yukuai3@...wei.com>
> > >
> > > Currently, elevators record an internal 'async_depth' to throttle
> > > asynchronous requests, and they all calculate shallow_depth based on
> > > sb->shift, on the assumption that sb->shift is the number of
> > > available tags in one word.
> > >
> > > However, sb->shift is not the number of available tags in the last
> > > word, see __map_depth:
> > >
> > > 	if (index == sb->map_nr - 1)
> > > 		return sb->depth - (index << sb->shift);
> > >
> > > As a consequence, if the last word is used, more tags can be obtained
> > > than expected. For example, assume nr_requests=256, giving four words;
> > > in the worst case, if the user then sets nr_requests=32, the first
> > > word is also the last word, yet still using the bits per word (64) to
> > > calculate async_depth is wrong.
> > >
> > > On the other hand, due to cgroup QoS, bfq can allow only one request
> > > to be allocated, but setting shallow_depth=1 will still allow one
> > > request per word (i.e. as many requests as there are words) to be
> > > allocated.
> > >
> > > Fix these problems by making shallow_depth apply to the whole sbitmap
> > > instead of to a single word, and change kyber, mq-deadline and bfq to
> > > follow this.
> > >
> > > Signed-off-by: Yu Kuai <yukuai3@...wei.com>
> >
> > I agree these are real problems but AFAIU this implementation of shallow
> > depth has been done for a reason. Omar can chime in here as the original
> > author, or perhaps Jens, but the idea of the current shallow depth
> > implementation is that each sbitmap user, regardless of the shallow depth
> > used, has a chance to allocate from each sbitmap word, which evenly
> > distributes pressure among the available sbitmap words. With the
> > implementation you've chosen there will be higher pressure (and thus
> > contention) on words with low indices.
>
> Yes, this makes sense. However, considering that shallow depth is only
> used by elevators, this higher pressure should be negligible for deadline
> and bfq. As for kyber, this might be a problem.
I agree that for bfq the overhead should be in the noise. For mq-deadline
it might be measurable but I'm not overly concerned.
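For reference, the current per-word logic amounts, roughly, to clamping
each word separately, something like the illustrative line below (the
min_t() usage and the names are only a sketch, not the exact sbitmap
code):

	/* roughly the current behavior: clamp each word to shallow_depth */
	word_limit = min_t(unsigned int, __map_depth(sb, index), shallow_depth);

so shallow_depth=1 still permits one tag from each of the map_nr words,
which is the bfq problem described in the patch.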
> > So I think we would do well to fix the issues with shallow depth for a
> > small number of sbitmap words (because that's where these buggy corner
> > cases may matter in practice), but I believe the logic which constrains
> > the number of used bits from each *word* when shallow_depth is specified
> > should be kept. It might make sense to change the API so that
> > shallow_depth is indeed specified relative to the total size of the
> > bitmap, not to the size of a word (because that's confusing practically
> > everybody I've met and is a constant source of bugs), if it can be made
> > to perform well.
>
> Do you think it would be OK to add a new shallow depth API that uses the
> total size, and convert bfq and deadline to use it?
I think having two APIs would be even more confusing than the current state.
But, as I wrote, I think you can have an API where shallow depth is specified
relative to the total size and then, in sbitmap_queue_get_shallow(), do:
	shallow_per_word = (shallow_depth << sb->shift) / sb->depth;
	rounding_index = shallow_depth - shallow_per_word * sb->depth;
and allow a depth of shallow_per_word + 1 if the current index < rounding_index
and exactly shallow_per_word if the current index >= rounding_index. This will
still evenly distribute the shallow depth among words and I don't think the
additional overhead of these few arithmetic operations will be visible.
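To make this concrete, below is a throwaway userspace sketch of that
distribution (struct sb, word_shallow() and friends are made-up names for
illustration, not the real sbitmap code; the leftover bits are spread over
the lowest-index words, which is one way of reading the rounding_index
idea above):

/* Throwaway userspace sketch, not kernel code. */
#include <stdio.h>

struct sb {
	unsigned int depth;	/* total number of bits */
	unsigned int shift;	/* log2 of bits per word */
	unsigned int map_nr;	/* number of words */
};

/* Usable bits in word 'index', mirroring the __map_depth() quoted above. */
static unsigned int map_depth(const struct sb *sb, unsigned int index)
{
	if (index == sb->map_nr - 1)
		return sb->depth - (index << sb->shift);
	return 1U << sb->shift;
}

/*
 * Per-word limit when shallow_depth is relative to the whole bitmap: each
 * word gets its proportional share, and the leftover goes to the
 * lowest-index words, one bit each.
 */
static unsigned int word_shallow(const struct sb *sb, unsigned int index,
				 unsigned int shallow_depth)
{
	unsigned int per_word = (shallow_depth << sb->shift) / sb->depth;
	unsigned int assigned = per_word * sb->map_nr;
	unsigned int rem = shallow_depth > assigned ? shallow_depth - assigned : 0;
	unsigned int limit = per_word + (index < rem ? 1 : 0);

	/* never allow more than the word actually holds */
	if (limit > map_depth(sb, index))
		limit = map_depth(sb, index);
	return limit;
}

int main(void)
{
	struct sb sb = { .depth = 256, .shift = 6, .map_nr = 4 };
	unsigned int shallow = 101, total = 0, i;

	for (i = 0; i < sb.map_nr; i++) {
		unsigned int w = word_shallow(&sb, i, shallow);

		printf("word %u: allow %u tags\n", i, w);
		total += w;
	}
	printf("total allowed %u of requested %u\n", total, shallow);
	return 0;
}

Compiled and run, this prints 26 tags for word 0 and 25 for the rest, i.e.
the requested 101 in total, so low-index words only absorb the rounding
remainder rather than the whole shallow depth.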
Honza
--
Jan Kara <jack@...e.com>
SUSE Labs, CR