linux-kernel - Re: [RFC PATCH] scsi: fix oops in scsi_uninit

Message-ID: <1552521077.45180.119.camel@acm.org>
Date:   Wed, 13 Mar 2019 16:51:17 -0700
From:   Bart Van Assche <bvanassche@....org>
To:     Jason Yan <yanaijie@...wei.com>,
        Christoph Hellwig <hch@...radead.org>
Cc:     martin.petersen@...cle.com, jejb@...ux.vnet.ibm.com,
        Jens Axboe <axboe@...nel.dk>, linux-scsi@...r.kernel.org,
        linux-kernel@...r.kernel.org, hare@...e.com,
        dan.j.williams@...el.com, jthumshirn@...e.de,
        Steffen Maier <maier@...ux.ibm.com>
Subject: Re: [RFC PATCH] scsi: fix oops in scsi_uninit_cmd()

On Thu, 2019-02-21 at 16:53 +0800, Jason Yan wrote:
> On 2019/2/20 23:18, Christoph Hellwig wrote:
> > [fullquote removed, please follow proper mail etiquette]
> > 
> > On Tue, Feb 19, 2019 at 08:56:28AM -0800, Bart Van Assche wrote:
> > > regression in the SCSI sd driver due to the switch from the legacy block
> > > layer to scsi-mq. The above patch introduces two atomic operations in the
> > > hot path and hence would introduce a performance regression. I think this
> > > can be avoided by making sure that sd_uninit_command() gets called before
> > > the request tag is freed. What changes would be required to make the block
> > > layer core call sd_uninit_command() before the request tag is freed? Would
> > > introducing prep_rq_fn and unprep_rq_fn callbacks in struct blk_mq_ops and
> > > making sure that the SCSI core sets these callback function pointers
> > > appropriately be sufficient? Would such a change allow us to simplify the
> > > NVMe initiator driver? Are there any alternatives to this approach that are more
> > > elegant?
> > 
> > Additional indirect calls in the I/O fast path are something I'd rather
> > avoid.  But I don't fully understand the problem yet - where do
> > we release a disk reference from blk_update_request?
> 
> When userspace closes the fd after blk_update_request() and before
> scsi_mq_uninit_cmd(), a disk reference is released. It is not
> blk_update_request() itself that releases it.
> 
> close
>     ->sd_release
>        ->scsi_disk_put
>          ->scsi_disk_release
>            ->disk->private_data = NULL;
> 
> Userspace can close the fd because blk_update_request() has already
> completed the last I/O, so the application is no longer blocked in read()
> or write(). The window is very small, but our test cases reproduce it
> every day, so I'm very curious why. One possible explanation is that we
> enabled kernel preemption (CONFIG_PREEMPT).
> 
> And why can't we move that release to __blk_mq_end_request?
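The race in the quoted report can be sketched as a deterministic, single-threaded user-space model in C. Everything in it is hypothetical: `model_disk`, `model_driver`, `model_put()`, `model_uninit()` and the three ordering helpers are illustrative stand-ins, not kernel types or functions, and the interleaving is forced rather than raced. It shows why a close() that lands between completion and the uninit step leaves the uninit step with a NULL `private_data`, and why running the uninit step before completion becomes visible (an assumed simplification of the ordering discussed in the thread) avoids that. It also models pinning the disk with a per-request reference, which avoids the NULL pointer at the cost of a get/put pair per request; this is only one assumed way to arrive at the "two atomic operations" mentioned above, not a description of the RFC patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins, NOT kernel types: model_disk plays the
 * role of struct gendisk, model_driver the object reached via private_data. */
struct model_driver {
    int uninit_calls;            /* how many times the uninit step ran */
};

struct model_disk {
    void *private_data;          /* driver object while references remain */
    int refcount;                /* outstanding references (atomic in real code) */
};

/* Models close -> sd_release -> scsi_disk_put -> scsi_disk_release:
 * dropping the last reference clears private_data. */
static void model_put(struct model_disk *d)
{
    assert(d->refcount > 0);
    if (--d->refcount == 0)
        d->private_data = NULL;
}

/* Models the uninit step, which reaches the driver through private_data.
 * Returns -1 where the kernel would dereference a NULL pointer. */
static int model_uninit(struct model_disk *d)
{
    struct model_driver *drv = d->private_data;

    if (drv == NULL)
        return -1;
    drv->uninit_calls++;
    return 0;
}

/* Interleaving from the report: completion is already visible to user
 * space, user space closes the fd, and only then does uninit run. */
static int complete_then_uninit(struct model_disk *d)
{
    model_put(d);                /* close() wins the race: last reference */
    return model_uninit(d);      /* private_data is already NULL */
}

/* Reordered: uninit runs before completion becomes visible, so the close()
 * that follows cannot clear private_data first. */
static int uninit_then_complete(struct model_disk *d)
{
    int rc = model_uninit(d);

    model_put(d);                /* close() after user space saw the result */
    return rc;
}

/* Pinned: the request holds its own reference until after uninit. */
static int pinned_complete(struct model_disk *d)
{
    d->refcount++;               /* get at submission */
    model_put(d);                /* close() wins the race: 2 -> 1 */
    int rc = model_uninit(d);    /* private_data is still valid */
    model_put(d);                /* put after uninit: 1 -> 0, clears it */
    return rc;
}
```

In this model `complete_then_uninit()` returns -1 at the point where the kernel would oops, while the other two helpers return 0; the plain `int` reference count stands in for what would have to be an atomic counter in real code.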

Hi Jason,

What is the current status of this issue?

Thanks,

Bart.