linux-kernel - Re: [PATCH 2/4] block: fix ordering between checking BLK_MQ_S_STOPPED and adding requests to hctx->dispatch

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAMZfGtWxE9z4GgmpEBXzwsy_HAyOOZ85+2HUyqE-9+n1f2aPJA@mail.gmail.com>
Date: Mon, 19 Aug 2024 11:49:32 +0800
From: Muchun Song <songmuchun@...edance.com>
To: Ming Lei <ming.lei@...hat.com>
Cc: Jens Axboe <axboe@...nel.dk>, "open list:BLOCK LAYER" <linux-block@...r.kernel.org>, 
	LKML <linux-kernel@...r.kernel.org>, muchun.song@...ux.dev
Subject: Re: [PATCH 2/4] block: fix ordering between checking BLK_MQ_S_STOPPED
 and adding requests to hctx->dispatch

On Mon, Aug 19, 2024 at 10:28 AM Ming Lei <ming.lei@...hat.com> wrote:
>
> Hi Muchun,
>
> On Sun, Aug 11, 2024 at 06:19:19PM +0800, Muchun Song wrote:
> > Supposing the following scenario with a virtio_blk driver.
> >
> > CPU0                                                                CPU1
> >
> > blk_mq_try_issue_directly()
> >     __blk_mq_issue_directly()
> >         q->mq_ops->queue_rq()
> >             virtio_queue_rq()
> >                 blk_mq_stop_hw_queue()
> >                                                                     virtblk_done()
> >     blk_mq_request_bypass_insert()                                      blk_mq_start_stopped_hw_queues()
> >         /* Add IO request to dispatch list */   1) store                    blk_mq_start_stopped_hw_queue()
> >                                                                                 clear_bit(BLK_MQ_S_STOPPED)                 3) store
> >     blk_mq_run_hw_queue()                                                       blk_mq_run_hw_queue()
> >         if (!blk_mq_hctx_has_pending())                                             if (!blk_mq_hctx_has_pending())         4) load
> >             return                                                                      return
> >         blk_mq_sched_dispatch_requests()                                            blk_mq_sched_dispatch_requests()
> >             if (blk_mq_hctx_stopped())          2) load                                 if (blk_mq_hctx_stopped())
> >                 return                                                                      return
> >             __blk_mq_sched_dispatch_requests()                                          __blk_mq_sched_dispatch_requests()
> >
> > The full memory barrier should be inserted between 1) and 2), as well as between
> > 3) and 4) to make sure that either CPU0 sees BLK_MQ_S_STOPPED is cleared or CPU1
> > sees dispatch list or setting of bitmap of software queue. Otherwise, either CPU
> > will not re-run the hardware queue causing starvation.
>
> Yeah, it is one kind of race which is triggered when adding request into
> ->dispatch list after returning STS_RESOURCE. We were troubled by lots of
> such kind of race.

Yes. I saw the similar fix for BLK_MQ_S_SCHED_RESTART.

>
> stopping queue is used in very less drivers, and its only purpose should
> be for throttling hw queue in case that low level queue is busy. There seems
> more uses of blk_mq_stop_hw_queues(), but most of them should be replaced
> with blk_mq_quiesce_queue().
>
> IMO, fixing this kind of issue via memory barrier is too tricky to
> maintain cause WRITE/READ dependency is very hard to follow. I'd suggest to
> make memory barrier solution as the last resort, and we can try to figure
> out other easier & more reliable way first.

I do agree it is hard to maintain the dependencies in the future. We should
propose an easy-maintainable solution. But I thought it is a long-term issue
throughout different stable linux distros. Adding a mb is the easy way to fix
the problem (the code footprint is really small), so it will be very
easy for others
to backport those bug fixes to different stable linux distros. Therefore, mb
should be an interim solution. Then, we could improve it based on the solution
you've proposed below. What do you think?

Thanks,
Muchun.

>
> One idea I thought of is to call blk_mq_request_bypass_insert()(or rename
> & export it) before calling blk_mq_stop_hw_queue() in driver, then
> return new status code STS_STOP_DISPATCH for notifying blk-mq to stop
> dispatch simply.
>
>
> thanks,
> Ming
>