Message-Id: <5974b41335751a6bc59d5c823fb98202@linux.vnet.ibm.com>
Date:   Wed, 07 Feb 2018 14:19:54 -0600
From:   wenxiong <wenxiong@...ux.vnet.ibm.com>
To:     Ming Lei <ming.lei@...hat.com>
Cc:     axboe@...com, linux-kernel@...r.kernel.org,
        linux-nvme@...ts.infradead.org,
        Keith Busch <keith.busch@...el.com>,
        wenxiong@...inux.vnet.ibm.com, wenxiong@...ibm.com
Subject: Re: [PATCH]nvme-pci: Fixes EEH failure on ppc

On 2018-02-06 19:24, Ming Lei wrote:
> On Tue, Feb 06, 2018 at 02:01:05PM -0600, wenxiong wrote:
>> On 2018-02-06 10:33, Keith Busch wrote:
>> > On Mon, Feb 05, 2018 at 03:49:40PM -0600, wenxiong@...inux.vnet.ibm.com wrote:
>> > > @@ -1189,6 +1183,12 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
>> > >  	struct nvme_command cmd;
>> > >  	u32 csts = readl(dev->bar + NVME_REG_CSTS);
>> > >
>> > > +	/* If PCI error recovery process is happening, we cannot reset or
>> > > +	 * the recovery mechanism will surely fail.
>> > > +	 */
>> > > +	if (pci_channel_offline(to_pci_dev(dev->dev)))
>> > > +		return BLK_EH_HANDLED;
>> > > +
>> >
>> > This patch will tell the block layer to complete the request and consider
>> > it a success, but it doesn't look like the command actually completed at
>> > all. You're going to get data corruption this way, right? Is returning
>> > BLK_EH_HANDLED immediately really the right thing to do here?
>> 
>> Hi Ming,
>> 
>> Can you help check whether it is OK to return BLK_EH_HANDLED in this
>> case?
> 
> Hi Wenxiong,
> 
> It looks like Keith is correct: if BLK_EH_HANDLED is returned, the block
> layer and the NVMe driver will complete this timed-out request even though
> the I/O never actually completed, so the result is either data loss (for a
> write) or a read failure.
> 
> Maybe BLK_EH_RESET_TIMER is fine in this situation.
> 
> Thanks,
> Ming
> 
Hi Ming,

Thanks! I tried BLK_EH_RESET_TIMER, and EEH recovery works fine. I am
going to resubmit the patch.

Thanks,
Wendy
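
For readers following the thread, the change agreed on above can be sketched in plain userspace C. This is only an illustration of the decision, not the resubmitted patch: the mock_* names and the struct are hypothetical stand-ins for nvme_timeout(), pci_channel_offline() and struct pci_dev, and the fall-through return value is a placeholder for the rest of the real handler, which is not modeled here. The enum mirrors the three blk_eh_timer_return values that the block layer defined at the time.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace copy of the three return values the block layer had in 2018. */
enum blk_eh_timer_return {
	BLK_EH_NOT_HANDLED,
	BLK_EH_HANDLED,
	BLK_EH_RESET_TIMER,
};

/* Hypothetical stand-in for struct pci_dev; only the state we need. */
struct mock_pci_dev {
	bool channel_offline;	/* true while PCI error recovery (EEH) runs */
};

/* Hypothetical stand-in for pci_channel_offline(). */
static bool mock_pci_channel_offline(const struct mock_pci_dev *pdev)
{
	return pdev->channel_offline;
}

/*
 * Sketch of the early exit discussed in the thread.
 *
 * While error recovery is in progress the driver must not reset the
 * controller, and it must not claim the request is done either, because
 * the command never completed. Re-arming the timer leaves the request
 * pending so it can be dealt with after recovery finishes.
 */
static enum blk_eh_timer_return
mock_nvme_timeout(const struct mock_pci_dev *pdev)
{
	if (mock_pci_channel_offline(pdev))
		return BLK_EH_RESET_TIMER;

	/* Placeholder: the real handler's abort/reset logic is not modeled. */
	return BLK_EH_NOT_HANDLED;
}

/* Small self-check that exercises both branches of the sketch. */
static void check_sketch(void)
{
	struct mock_pci_dev offline = { .channel_offline = true };
	struct mock_pci_dev online = { .channel_offline = false };

	assert(mock_nvme_timeout(&offline) == BLK_EH_RESET_TIMER);
	assert(mock_nvme_timeout(&online) == BLK_EH_NOT_HANDLED);
}
```

Calling check_sketch() from a test harness runs both cases. The point of the design is the one Keith and Ming raise: BLK_EH_HANDLED would report success for I/O that never happened, whereas BLK_EH_RESET_TIMER simply defers the decision until the channel is back.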