linux-kernel - Re: [RFC PATCH] scsi: fix oops in scsi_uninit

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <CACVXFVMWDD+8rrVh9TYnA6awLPQZNscC6OrYmpAizj9PxxuZJw@mail.gmail.com>
Date:   Fri, 15 Mar 2019 15:56:56 +0800
From:   Ming Lei <tom.leiming@...il.com>
To:     Jason Yan <yanaijie@...wei.com>
Cc:     Christoph Hellwig <hch@...radead.org>,
        Bart Van Assche <bvanassche@....org>,
        "Martin K. Petersen" <martin.petersen@...cle.com>,
        James Bottomley <jejb@...ux.vnet.ibm.com>,
        Jens Axboe <axboe@...nel.dk>,
        Linux SCSI List <linux-scsi@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Hannes Reinecke <hare@...e.com>,
        Dan Williams <dan.j.williams@...el.com>,
        Johannes Thumshirn <jthumshirn@...e.de>,
        Steffen Maier <maier@...ux.ibm.com>
Subject: Re: [RFC PATCH] scsi: fix oops in scsi_uninit_cmd()

On Thu, Feb 21, 2019 at 4:55 PM Jason Yan <yanaijie@...wei.com> wrote:
>
> Hi, Christoph
>
> On 2019/2/20 23:18, Christoph Hellwig wrote:
> > [fullquote removed, please follow proper mail etiquette]
> >
> > On Tue, Feb 19, 2019 at 08:56:28AM -0800, Bart Van Assche wrote:
> >> regression in the SCSI sd driver due to the switch from the legacy block
> >> layer to scsi-mq. The above patch introduces two atomic operations in the
> >> hot path and hence would introduce a performance regression. I think this
> >> can be avoided by making sure that sd_uninit_command() gets called before
> >> the request tag is freed. What changes would be required to make the block
> >> layer core call sd_uninit_command() before the request tag is freed? Would
> >> introducing prep_rq_fn and unprep_rq_fn callbacks in struct blk_mq_ops and
> >> making sure that the SCSI core sets these callback function pointers
> >> appropriately be sufficient? Would such a change allow to simplify the NVMe
> >> initiator driver? Are there any alternatives to this approach that are more
> >> elegant?
> >
> > Additional indirect calls in the I/O fast path is something I'd rather
> > avoid.  But I don't fully understand the problem yet - where do
> > we release a disk reference from blk_update_request?
>
> When userspace close the fd after blk_update_request() and before
> scsi_mq_uninit_cmd(), a disk reference will be released. It is not the
> blk_update_request() directly released it.
>
> close
>     ->sd_release
>        ->scsi_disk_put
>          ->scsi_disk_release
>            ->disk->private_data = NULL;
>
> The userspace can close the fd because blk_update_request() returned the
> last IO , the userspace application does not have to stuck on read() or
> write(). The window is very small, but it can be reproduce every day
> in our testcases. So I'm very curious why. One possible explanation is
> that we enabled kernel preempt(CONFIG_PREEMPT).

Another solution is to drain in-flight FS IO in scsi_disk_release(),
and one counter
is needed for tracking in-flight passthrough IO, so we can use sdev->device_busy
- sdev->passthrough_ios to drain inflight FS IO.

>
> And why can't
> > we move that release to __blk_mq_end_request?

I think it is doable, then ending bio needs to be moved out of
blk_update_request(),
such as, add one list of rq->done_bio to track completed bio, then complete all
in free request.

And for partial completion, the completed bio can be done in
blk_update_request(),
since the remained bios will cause fs to hold the disk.


Thanks,
Ming Lei