Message-ID: <20170530142346.GA39428@dhcp-216.srv.tuxera.com>
Date:   Tue, 30 May 2017 17:23:46 +0300
From:   Rakesh Pandit <rakesh@...era.com>
To:     Sagi Grimberg <sagi@...mberg.me>
CC:     <linux-nvme@...ts.infradead.org>, <linux-kernel@...r.kernel.org>,
        "Jens Axboe" <axboe@...com>, Keith Busch <keith.busch@...el.com>,
        Christoph Hellwig <hch@....de>,
        Andy Lutomirski <luto@...nel.org>
Subject: Re: [PATCH V2] nvme: fix nvme_remove going to uninterruptible sleep
 for ever

On Tue, May 30, 2017 at 01:18:55PM +0300, Sagi Grimberg wrote:
> 
> >  	/*
> > +	 * Avoid configuration and syncing commands if controller is already
> > +	 * being removed and queues have been killed.
> > +	 */
> > +	if (ctrl->state == NVME_CTRL_DELETING || ctrl->state == NVME_CTRL_DEAD)
> > +		return;
> > +
> 
> Hey Rakesh, Christoph,
> 
> Given that the issue is for sync command submission during controller
> removal, I'm wondering if we should perhaps move this check to
> __nvme_submit_sync_cmd?
> 
> AFAICT user-space can just as easily trigger set_features in the same
> condition which will trigger the hang couldn't it?


Seems possible. But it still seems worth keeping this check, as it
also skips the work done between the start of nvme_configure_apst and
__nvme_submit_sync_cmd. This check seems to solve the more severe
hang, where the task (PID) that started off in nvme_remove eventually
hangs itself in blk_execute_rq.

We can fix the user-space triggered set_features higher up, e.g. in
nvme_ioctl, by adding the same check. Introducing a separate state,
NVME_CTRL_SCHED_RESET (being discussed in another thread), has the
additional advantage of making sure that only one thread goes through
resetting and eventually through removal (if required), which solves
a lot of problems.

It makes sense to push this separately for the above reasons, and we
can fix the user-space trigger of the deadlock once the discussion in
the other thread on introducing the new state has moved forward.
