Message-ID: <b7b26a7a-70be-d805-ee64-67fba0b4efa8@mellanox.com>
Date:   Sun, 9 Dec 2018 14:22:29 +0000
From:   Nitzan Carmi <nitzanc@...lanox.com>
To:     Sagi Grimberg <sagi@...mberg.me>,
        Jaesoo Lee <jalee@...estorage.com>,
        "keith.busch@...el.com" <keith.busch@...el.com>,
        "axboe@...com" <axboe@...com>, "hch@....de" <hch@....de>
CC:     "roland@...estorage.com" <roland@...estorage.com>,
        "psajeepa@...estorage.com" <psajeepa@...estorage.com>,
        "ashishk@...estorage.com" <ashishk@...estorage.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-nvme@...ts.infradead.org" <linux-nvme@...ts.infradead.org>
Subject: Re: [PATCH] nvme-rdma: complete requests from ->timeout

Hi,
We encountered a similar issue.
I think the problem is that error_recovery might not even be queued
if the controller is in the DELETING state (or the CONNECTING state, for
that matter), because we cannot move from those states to RESETTING.
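
To illustrate the point, here is a minimal, self-contained sketch of that
gating. It is not the driver code: the enum, struct and function names below
are hypothetical, and the rule that RESETTING is reachable only from NEW or
LIVE is a simplifying assumption made for the example.

```c
#include <stdbool.h>

/* Hypothetical, simplified controller states (illustrative names only). */
enum ctrl_state {
	CTRL_NEW,
	CTRL_LIVE,
	CTRL_RESETTING,
	CTRL_CONNECTING,
	CTRL_DELETING,
};

/* Hypothetical controller: just a state and a "recovery queued" flag. */
struct ctrl {
	enum ctrl_state state;
	bool err_work_queued;
};

/*
 * Assumption for this sketch: RESETTING can only be entered from NEW or
 * LIVE. From CONNECTING or DELETING the transition is refused and the
 * state is left untouched.
 */
static bool change_state_to_resetting(struct ctrl *c)
{
	switch (c->state) {
	case CTRL_NEW:
	case CTRL_LIVE:
		c->state = CTRL_RESETTING;
		return true;
	default:
		return false;
	}
}

/*
 * Error recovery is queued only if the state change succeeds. When it
 * fails, nothing is queued, so nothing will complete the in-flight
 * requests on this path.
 */
static bool error_recovery(struct ctrl *c)
{
	if (!change_state_to_resetting(c))
		return false;

	c->err_work_queued = true; /* stands in for queueing the work item */
	return true;
}
```

With a controller in CTRL_DELETING or CTRL_CONNECTING, error_recovery()
returns false and err_work_queued stays false, which is the situation in
which a timed-out command would never be completed.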

We prepared some patches which handle completions in case such a scenario
happens (which, in fact, might happen in numerous error flows).

Do these patches solve your problem?
Nitzan.


On 30/11/2018 03:30, Sagi Grimberg wrote:
> 
>> This does not hold at least for NVMe RDMA host driver. An example scenario
>> is when the RDMA connection is gone while the controller is being deleted.
>> In this case, the nvmf_reg_write32() for sending shutdown admin command by
>> the delete_work could be hung forever if the command is not completed by
>> the timeout handler.
> 
> If the queue is gone, this means that the queue has already flushed and
> any commands that were inflight have completed with a flush error
> completion...
> 
> Can you describe the scenario that caused this hang? When did the
> queue become "gone" and when did the shutdown command execute?

View attachment "0001-nvme-Introduce-nvme_is_aen_req-function.patch" of type "text/plain" (2333 bytes)

View attachment "0002-nvme-rdma-Handle-completions-if-error_recovery-fails.patch" of type "text/plain" (7166 bytes)