Message-ID: <jr54uomodnzqyw4bu4hcdpllgafkhueyygiiempuudwjy3vir5@d7lv3jsxxqx2>
Date: Wed, 30 Jul 2025 15:03:56 +0200
From: Jan Kara <jack@...e.cz>
To: Yu Kuai <yukuai1@...weicloud.com>
Cc: Jan Kara <jack@...e.cz>, axboe@...nel.dk, akpm@...ux-foundation.org,
yang.yang@...o.com, dlemoal@...nel.org, ming.lei@...hat.com,
linux-block@...r.kernel.org, linux-kernel@...r.kernel.org, yi.zhang@...wei.com,
yangerkun@...wei.com, johnny.chenyi@...wei.com, Omar Sandoval <osandov@...com>,
"yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH v2 1/2] lib/sbitmap: convert shallow_depth from one word
to the whole sbitmap
On Wed 30-07-25 10:03:50, Yu Kuai wrote:
> 在 2025/07/29 18:16, Jan Kara 写道:
> > On Tue 29-07-25 11:19:05, Yu Kuai wrote:
> > > From: Yu Kuai <yukuai3@...wei.com>
> > >
> > > Currently, elevators record an internal 'async_depth' to throttle
> > > asynchronous requests, and they all calculate shallow_depth based on
> > > sb->shift, on the assumption that sb->shift is the number of
> > > available tags in one word.
> > >
> > > However, sb->shift is not the number of available tags in the last
> > > word, see __map_depth:
> > >
> > > 	if (index == sb->map_nr - 1)
> > > 		return sb->depth - (index << sb->shift);
> > >
> > > As a consequence, if the last word is used, more tags can be obtained
> > > than expected. For example, assume nr_requests=256, giving four words;
> > > in the worst case, if the user then sets nr_requests=32, the first
> > > word is also the last word, yet still using the bits per word (64) to
> > > calculate async_depth is wrong.
> > >
> > > On the other hand, due to cgroup QoS, bfq can allow only one request
> > > to be allocated, but setting shallow_depth=1 will still allow one
> > > request per word (i.e. as many requests as there are words) to be
> > > allocated.
> > >
> > > Fix these problems by making shallow_depth apply to the whole sbitmap
> > > instead of to a single word, and change kyber, mq-deadline and bfq to
> > > follow this.
> > >
> > > Signed-off-by: Yu Kuai <yukuai3@...wei.com>
> >
> > I agree these are real problems but AFAIU this implementation of shallow
> > depth has been done for a reason. Omar can chime in here as the original
> > author, or perhaps Jens, but the idea of the current shallow depth
> > implementation is that each sbitmap user, regardless of the shallow depth
> > used, has a chance to allocate from each sbitmap word, which evenly
> > distributes pressure among the available sbitmap words. With the
> > implementation you've chosen there will be higher pressure (and thus
> > contention) on words with low indices.
>
> Yes, this makes sense. However, considering that shallow depth is only
> used by elevators, this higher pressure should be negligible for deadline
> and bfq. As for kyber, this might be a problem.
I agree that for bfq the overhead should be in the noise. For mq-deadline
it might be measurable but I'm not overly concerned.
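For reference, the current per-word logic amounts, roughly, to clamping
each word separately, something like the illustrative line below (the
min_t() usage and the names are only a sketch, not the exact sbitmap
code):

	/* roughly the current behavior: clamp each word to shallow_depth */
	word_limit = min_t(unsigned int, __map_depth(sb, index), shallow_depth);

so shallow_depth=1 still permits one tag from each of the map_nr words,
which is the bfq problem described in the patch.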
> > So I think we would do well to fix the issues with shallow depth for a
> > small number of sbitmap words (because that's where these buggy corner
> > cases may matter in practice), but I believe the logic which constrains
> > the number of used bits from each *word* when shallow_depth is specified
> > should be kept. It might make sense to change the API so that
> > shallow_depth is indeed specified relative to the total size of the
> > bitmap, not to the size of a word (because that's confusing practically
> > everybody I've met and is a constant source of bugs), if it can be made
> > to perform well.
>
> Do you think it would be OK to add a new shallow depth API that uses the
> total size, and convert bfq and deadline to use it?
I think having two APIs would be even more confusing than the current state.
But, as I wrote, I think you can have an API where shallow depth is specified
relative to the total size and then, in sbitmap_queue_get_shallow(), do:
	shallow_per_word = (shallow_depth << sb->shift) / sb->depth;
	rounding_index = shallow_depth - shallow_per_word * sb->depth;
and allow a depth of shallow_per_word + 1 if the current index < rounding_index
and exactly shallow_per_word if the current index >= rounding_index. This will
still evenly distribute the shallow depth among words and I don't think the
additional overhead of these few arithmetic operations will be visible.
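To make this concrete, below is a throwaway userspace sketch of that
distribution (struct sb, word_shallow() and friends are made-up names for
illustration, not the real sbitmap code; the leftover bits are spread over
the lowest-index words, which is one way of reading the rounding_index
idea above):

/* Throwaway userspace sketch, not kernel code. */
#include <stdio.h>

struct sb {
	unsigned int depth;	/* total number of bits */
	unsigned int shift;	/* log2 of bits per word */
	unsigned int map_nr;	/* number of words */
};

/* Usable bits in word 'index', mirroring the __map_depth() quoted above. */
static unsigned int map_depth(const struct sb *sb, unsigned int index)
{
	if (index == sb->map_nr - 1)
		return sb->depth - (index << sb->shift);
	return 1U << sb->shift;
}

/*
 * Per-word limit when shallow_depth is relative to the whole bitmap: each
 * word gets its proportional share, and the leftover goes to the
 * lowest-index words, one bit each.
 */
static unsigned int word_shallow(const struct sb *sb, unsigned int index,
				 unsigned int shallow_depth)
{
	unsigned int per_word = (shallow_depth << sb->shift) / sb->depth;
	unsigned int assigned = per_word * sb->map_nr;
	unsigned int rem = shallow_depth > assigned ? shallow_depth - assigned : 0;
	unsigned int limit = per_word + (index < rem ? 1 : 0);

	/* never allow more than the word actually holds */
	if (limit > map_depth(sb, index))
		limit = map_depth(sb, index);
	return limit;
}

int main(void)
{
	struct sb sb = { .depth = 256, .shift = 6, .map_nr = 4 };
	unsigned int shallow = 101, total = 0, i;

	for (i = 0; i < sb.map_nr; i++) {
		unsigned int w = word_shallow(&sb, i, shallow);

		printf("word %u: allow %u tags\n", i, w);
		total += w;
	}
	printf("total allowed %u of requested %u\n", total, shallow);
	return 0;
}

Compiled and run, this prints 26 tags for word 0 and 25 for the rest, i.e.
the requested 101 in total, so low-index words only absorb the rounding
remainder rather than the whole shallow depth.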
Honza
--
Jan Kara <jack@...e.com>
SUSE Labs, CR