Date: Fri, 15 Mar 2019 15:56:56 +0800
From: Ming Lei <tom.leiming@...il.com>
To: Jason Yan <yanaijie@...wei.com>
Cc: Christoph Hellwig <hch@...radead.org>, Bart Van Assche <bvanassche@....org>,
	"Martin K. Petersen" <martin.petersen@...cle.com>,
	James Bottomley <jejb@...ux.vnet.ibm.com>, Jens Axboe <axboe@...nel.dk>,
	Linux SCSI List <linux-scsi@...r.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Hannes Reinecke <hare@...e.com>, Dan Williams <dan.j.williams@...el.com>,
	Johannes Thumshirn <jthumshirn@...e.de>, Steffen Maier <maier@...ux.ibm.com>
Subject: Re: [RFC PATCH] scsi: fix oops in scsi_uninit_cmd()

On Thu, Feb 21, 2019 at 4:55 PM Jason Yan <yanaijie@...wei.com> wrote:
>
> Hi, Christoph
>
> On 2019/2/20 23:18, Christoph Hellwig wrote:
> > [fullquote removed, please follow proper mail etiquette]
> >
> > On Tue, Feb 19, 2019 at 08:56:28AM -0800, Bart Van Assche wrote:
> >> regression in the SCSI sd driver due to the switch from the legacy block
> >> layer to scsi-mq. The above patch introduces two atomic operations in the
> >> hot path and hence would introduce a performance regression. I think this
> >> can be avoided by making sure that sd_uninit_command() gets called before
> >> the request tag is freed. What changes would be required to make the block
> >> layer core call sd_uninit_command() before the request tag is freed? Would
> >> introducing prep_rq_fn and unprep_rq_fn callbacks in struct blk_mq_ops and
> >> making sure that the SCSI core sets these callback function pointers
> >> appropriately be sufficient? Would such a change allow to simplify the NVMe
> >> initiator driver? Are there any alternatives to this approach that are more
> >> elegant?
> >
> > Additional indirect calls in the I/O fast path is something I'd rather
> > avoid. But I don't fully understand the problem yet - where do
> > we release a disk reference from blk_update_request?
>
> When userspace closes the fd after blk_update_request() and before
> scsi_mq_uninit_cmd(), a disk reference will be released. It is not
> blk_update_request() itself that releases it:
>
> close
>   ->sd_release
>     ->scsi_disk_put
>       ->scsi_disk_release
>         ->disk->private_data = NULL;
>
> Userspace can close the fd because blk_update_request() has completed the
> last IO, so the application is no longer stuck in read() or write(). The
> window is very small, but it can be reproduced every day in our testcases,
> so I'm very curious why. One possible explanation is that we enabled kernel
> preemption (CONFIG_PREEMPT).

Another solution is to drain in-flight FS IO in scsi_disk_release(). One
counter is needed for tracking in-flight passthrough IO, so that
sdev->device_busy - sdev->passthrough_ios can be used to drain the
in-flight FS IO.

> > And why can't
> > we move that release to __blk_mq_end_request?

I think it is doable, but then ending the bios needs to be moved out of
blk_update_request(): for example, add a list such as rq->done_bio to track
completed bios, then complete them all when the request is freed.

For partial completion, the completed bios can still be ended in
blk_update_request(), since the remaining bios will cause the fs to hold
the disk.

Thanks,
Ming Lei
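
[Editor's note] To make the drain idea above concrete, here is a minimal
userspace model, not kernel code and not a patch. It assumes the hypothetical
sdev->passthrough_ios counter proposed in the mail, mirrors it and device_busy
with plain C11 atomics, and simply polls until device_busy minus
passthrough_ios reaches zero before the release path proceeds. All function
and type names below are invented for illustration, and real kernel code would
sleep on a wait queue rather than poll.

/*
 * Toy userspace model of the drain idea above -- NOT kernel code and not a
 * patch.  The two counters mirror sdev->device_busy and the proposed
 * sdev->passthrough_ios; every name here is invented for illustration.
 * Build with: gcc -pthread drain_model.c -o drain_model
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

struct fake_sdev {
	atomic_int device_busy;		/* all outstanding requests */
	atomic_int passthrough_ios;	/* outstanding passthrough requests */
};

/* Models the proposal: block the release path until no FS IO is in flight. */
static void fake_scsi_disk_release_drain(struct fake_sdev *sdev)
{
	while (atomic_load(&sdev->device_busy) -
	       atomic_load(&sdev->passthrough_ios) > 0)
		usleep(1000);	/* real kernel code would wait, not poll */
	printf("release: no FS IO in flight, safe to tear down the disk\n");
}

/* One "FS request": the bio ends before the command is fully uninitialized. */
static void *fake_fs_request(void *arg)
{
	struct fake_sdev *sdev = arg;

	usleep(5000);	/* ...blk_update_request() would end the last bio here... */
	usleep(5000);	/* the window before scsi_mq_uninit_cmd() runs */
	atomic_fetch_sub(&sdev->device_busy, 1);	/* uninit done, request freed */
	return NULL;
}

int main(void)
{
	struct fake_sdev sdev;
	pthread_t t;

	atomic_init(&sdev.device_busy, 1);	/* one FS request in flight */
	atomic_init(&sdev.passthrough_ios, 0);
	pthread_create(&t, NULL, fake_fs_request, &sdev);

	/* Userspace has closed the fd: release must wait for that request. */
	fake_scsi_disk_release_drain(&sdev);

	pthread_join(t, NULL);
	return 0;
}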