Message-ID: <10ea95ec-e259-3511-44c4-58e4d255eb9f@huawei.com>
Date: Thu, 21 Feb 2019 16:53:54 +0800
From: Jason Yan <yanaijie@...wei.com>
To: Christoph Hellwig <hch@...radead.org>,
Bart Van Assche <bvanassche@....org>
CC: <martin.petersen@...cle.com>, <jejb@...ux.vnet.ibm.com>,
Jens Axboe <axboe@...nel.dk>, <linux-scsi@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <hare@...e.com>,
<dan.j.williams@...el.com>, <jthumshirn@...e.de>,
Steffen Maier <maier@...ux.ibm.com>
Subject: Re: [RFC PATCH] scsi: fix oops in scsi_uninit_cmd()
Hi, Christoph
On 2019/2/20 23:18, Christoph Hellwig wrote:
> [fullquote removed, please follow proper mail etiquette]
>
> On Tue, Feb 19, 2019 at 08:56:28AM -0800, Bart Van Assche wrote:
>> regression in the SCSI sd driver due to the switch from the legacy block
>> layer to scsi-mq. The above patch introduces two atomic operations in the
>> hot path and hence would introduce a performance regression. I think this
>> can be avoided by making sure that sd_uninit_command() gets called before
>> the request tag is freed. What changes would be required to make the block
>> layer core call sd_uninit_command() before the request tag is freed? Would
>> introducing prep_rq_fn and unprep_rq_fn callbacks in struct blk_mq_ops and
>> making sure that the SCSI core sets these callback function pointers
>> appropriately be sufficient? Would such a change allow to simplify the NVMe
>> initiator driver? Are there any alternatives to this approach that are more
>> elegant?
>
> Additional indirect calls in the I/O fast path is something I'd rather
> avoid. But I don't fully understand the problem yet - where do
> we release a disk reference from blk_update_request?
When userspace closes the fd after blk_update_request() and before
scsi_mq_uninit_cmd(), a disk reference is released. It is not
blk_update_request() itself that releases it.
close
  ->sd_release
    ->scsi_disk_put
      ->scsi_disk_release
        ->disk->private_data = NULL;
Userspace can close the fd because blk_update_request() has already
completed the last IO, so the userspace application is no longer stuck in
read() or write(). The window is very small, but it can be reproduced every
day in our testcases, so I'm very curious why. One possible explanation is
that we enabled kernel preemption (CONFIG_PREEMPT).
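
To make the ordering concrete, here is a minimal userspace sketch of the
window described above. It is only a model: every name in it (model_disk,
model_request, model_update_request, model_close, model_uninit_cmd,
model_run) is hypothetical and stands in for the kernel step it is named
after; it is not the sd or block layer code, and it runs the steps
sequentially rather than on two CPUs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins, not the kernel structures. */
struct model_disk {
	void *private_data;	/* models disk->private_data */
	int openers;		/* models fds still holding the disk open */
};

struct model_request {
	struct model_disk *disk;
	bool data_completed;	/* set once the blk_update_request() step ran */
};

/* Step 1: the data is completed; a blocked read()/write() may now return. */
static void model_update_request(struct model_request *rq)
{
	rq->data_completed = true;
}

/* Racing step: models close -> ... -> disk->private_data = NULL. */
static void model_close(struct model_disk *disk)
{
	assert(disk->openers > 0);
	if (--disk->openers == 0)
		disk->private_data = NULL;
}

/*
 * Step 2: command teardown. Returns false in the case where the real
 * code would dereference a NULL private_data.
 */
static bool model_uninit_cmd(const struct model_request *rq)
{
	return rq->disk->private_data != NULL;
}

/* Run the sequence; close_in_window puts the close between steps 1 and 2. */
static bool model_run(bool close_in_window)
{
	int driver_state = 0;	/* placeholder for the per-disk driver data */
	struct model_disk disk = { .private_data = &driver_state, .openers = 1 };
	struct model_request rq = { .disk = &disk, .data_completed = false };
	bool ok;

	model_update_request(&rq);
	if (close_in_window)
		model_close(&disk);
	ok = model_uninit_cmd(&rq);
	if (!close_in_window)
		model_close(&disk);
	return ok;
}
```

In this model, model_run(false) returns true (teardown still sees valid
private data), while model_run(true) returns false, which corresponds to
the case that oopses in our tests.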
> And why can't
> we move that release to __blk_mq_end_request?
>
>>
>> Thanks,
>>
>> Bart.
> ---end quoted text---
>
> .
>