Message-ID: <aKADe9hNz99dQTfy@fedora>
Date: Sat, 16 Aug 2025 12:05:15 +0800
From: Ming Lei <ming.lei@...hat.com>
To: Yu Kuai <yukuai@...nel.org>
Cc: Nilay Shroff <nilay@...ux.ibm.com>, Yu Kuai <yukuai1@...weicloud.com>,
axboe@...nel.dk, hare@...e.de, linux-block@...r.kernel.org,
linux-kernel@...r.kernel.org, yukuai3@...wei.com,
yi.zhang@...wei.com, yangerkun@...wei.com, johnny.chenyi@...wei.com
Subject: Re: [PATCH 08/10] blk-mq: fix blk_mq_tags double free while
nr_requests grown
On Sat, Aug 16, 2025 at 10:57:23AM +0800, Yu Kuai wrote:
> Hi,
>
> > On 2025/8/16 3:30, Nilay Shroff wrote:
> >
> > On 8/15/25 1:32 PM, Yu Kuai wrote:
> > > From: Yu Kuai <yukuai3@...wei.com>
> > >
> > > When the user triggers tag growth via the queue sysfs attribute
> > > nr_requests, hctx->sched_tags is freed directly and replaced with newly
> > > allocated tags; see blk_mq_tag_update_depth().
> > >
> > > The problem is that hctx->sched_tags comes from elevator->et->tags,
> > > while et->tags still points to the freed tags, so a later elevator exit
> > > will try to free the tags again, causing a kernel panic.
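For readers following the pointer ownership, here is a minimal
self-contained userspace model of the dangling reference (the variable
names below are simplified stand-ins for hctx->sched_tags and
elevator->et->tags, not the real kernel structures):

#include <stdlib.h>

struct tags { unsigned int depth; };

int main(void)
{
        struct tags *et_tags;           /* models elevator->et->tags[i] */
        struct tags *sched_tags;        /* models hctx->sched_tags      */

        et_tags = malloc(sizeof(*et_tags));
        sched_tags = et_tags;           /* both refer to one allocation */

        /* nr_requests grows: old tags freed, hctx gets new ones ...    */
        free(sched_tags);
        sched_tags = malloc(sizeof(*sched_tags));

        /*
         * ... but et_tags still holds the freed pointer, so elevator
         * exit would call free(et_tags) a second time: double free and
         * panic.  (Not executed here so the demo runs cleanly.)
         */
        free(sched_tags);
        return 0;
}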
> > >
> > > Fix this problem by using the newly allocated elevator_tags, and convert
> > > blk_mq_update_nr_requests() to void since this helper can no longer fail.
> > >
> > > Meanwhile, there is a long-standing problem that can be fixed as well:
> > >
> > > If blk_mq_tag_update_depth() succeeds for a previous hctx, its bitmap
> > > depth is updated; however, if a following hctx fails, q->nr_requests is
> > > not updated and the previous hctx->sched_tags ends up bigger than
> > > q->nr_requests.
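To spell out that partial-failure case, a simplified model of the old
error path (not the real helpers, just the control flow):

#include <stdio.h>

#define NR_HCTX 4

int main(void)
{
        unsigned int depth[NR_HCTX] = { 64, 64, 64, 64 };
        unsigned int nr_requests = 64;          /* models q->nr_requests */
        unsigned int new_nr = 128;

        for (int i = 0; i < NR_HCTX; i++) {
                if (i == 2) {                   /* pretend allocation fails */
                        printf("hctx %d failed; depths:", i);
                        for (int j = 0; j < NR_HCTX; j++)
                                printf(" %u", depth[j]);
                        printf(", q->nr_requests still %u\n", nr_requests);
                        return 1;               /* old code bailed out here */
                }
                depth[i] = new_nr;              /* this hctx already grew  */
        }
        nr_requests = new_nr;   /* only reached when every hctx succeeds */
        return 0;
}

The first two hctx are left at the new depth while q->nr_requests keeps
the old value, which is the inconsistency described above.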
> > >
> > > Fixes: f5a6604f7a44 ("block: fix lockdep warning caused by lock dependency in elv_iosched_store")
> > > Fixes: e3a2b3f931f5 ("blk-mq: allow changing of queue depth through sysfs")
> > > Signed-off-by: Yu Kuai <yukuai3@...wei.com>
> > > ---
> > > block/blk-mq.c | 19 ++++++-------------
> > > block/blk-mq.h | 4 +++-
> > > block/blk-sysfs.c | 21 ++++++++++++++-------
> > > 3 files changed, 23 insertions(+), 21 deletions(-)
> > >
> > > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > > index 11c8baebb9a0..e9f037a25fe3 100644
> > > --- a/block/blk-mq.c
> > > +++ b/block/blk-mq.c
> > > @@ -4917,12 +4917,12 @@ void blk_mq_free_tag_set(struct blk_mq_tag_set *set)
> > > }
> > > EXPORT_SYMBOL(blk_mq_free_tag_set);
> > > -int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
> > > +void blk_mq_update_nr_requests(struct request_queue *q,
> > > + struct elevator_tags *et, unsigned int nr)
> > > {
> > > struct blk_mq_tag_set *set = q->tag_set;
> > > struct blk_mq_hw_ctx *hctx;
> > > unsigned long i;
> > > - int ret = 0;
> > > blk_mq_quiesce_queue(q);
> > > @@ -4946,24 +4946,17 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
> > > nr - hctx->sched_tags->nr_reserved_tags);
> > > }
> > > } else {
> > > - queue_for_each_hw_ctx(q, hctx, i) {
> > > - if (!hctx->tags)
> > > - continue;
> > > - ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
> > > - nr);
> > > - if (ret)
> > > - goto out;
> > > - }
> > > + blk_mq_free_sched_tags(q->elevator->et, set);
> > I think you also need to ensure that the elevator tags are freed after we
> > unfreeze the queue and release ->elevator_lock, otherwise we may get a
> > lockdep splat for the pcpu_lock dependency on ->freeze_lock and/or
> > ->elevator_lock. Please note that blk_mq_free_sched_tags internally
> > invokes sbitmap_free, which invokes free_percpu, which acquires pcpu_lock.
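To make that ordering concrete, something along these lines (a rough
sketch only, not the actual patch; the helper name is invented and the
freeze/unfreeze form is assumed, while the lock and free calls follow
the description above and the quoted patch):

static void nr_requests_update_sketch(struct request_queue *q,
                                      struct elevator_tags *new_et,
                                      unsigned int nr)
{
        struct elevator_tags *old_et;
        unsigned int memflags;

        mutex_lock(&q->elevator_lock);
        memflags = blk_mq_freeze_queue(q);

        old_et = q->elevator->et;       /* detach, but do not free yet */
        blk_mq_update_nr_requests(q, new_et, nr);

        blk_mq_unfreeze_queue(q, memflags);
        mutex_unlock(&q->elevator_lock);

        /*
         * sbitmap_free() -> free_percpu() -> pcpu_lock happens here,
         * outside ->freeze_lock and ->elevator_lock.
         */
        blk_mq_free_sched_tags(old_et, q->tag_set);
}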
>
> Ok, thanks for the notice. However, as Ming suggested, we might fix this
> problem in the next merge window.
There are two issues involved:
- blk_mq_tags double free, introduced recently
- long-term lock issue in queue_requests_store()
IMO, the former is more serious because a kernel panic can be triggered,
so I suggest getting it into v6.17. The latter looks less serious and has
existed for a long time, but may need a code refactor to get a clean fix.
> I'll send one patch to fix this regression by replacing
> et->tags with the reallocated new sched_tags as well.
Patch 7 in this patchset and patch 8 in your 1st post look enough to
fix this double free issue.
Thanks,
Ming