lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAK9iUCPG-H3vWjiEnjAgr_nWEf=0Sn+AvRt68_Sg16Zw8TobRg@mail.gmail.com>
Date:   Mon, 23 Jul 2018 19:20:41 -0700
From:   Sinan Kaya <Okaya@...nel.org>
To:     Alex Williamson <alex.williamson@...hat.com>
Cc:     linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-nvme@...ts.infradead.org
Subject: Re: [PATCH v2 2/2] PCI: NVMe device specific reset quirk

On 7/23/18, Alex Williamson <alex.williamson@...hat.com> wrote:
> On Mon, 23 Jul 2018 17:40:02 -0700
> Sinan Kaya <okaya@...nel.org> wrote:
>
>> On 7/23/2018 5:13 PM, Alex Williamson wrote:
>> > + * The NVMe specification requires that controllers support PCIe FLR,
>> > but
>> > + * but some Samsung SM961/PM961 controllers fail to recover after FLR
>> > (-1
>> > + * config space) unless the device is quiesced prior to FLR.
>>
>> Does disabling the memory bit in PCI config space as part of the FLR
>> reset function help? (like the very first thing)
>
> No, it does not.  I modified this to only clear PCI_COMMAND_MEMORY and
> call pcie_flr(), the Samsung controller dies just as it did previously.
>
>> Can we do that in the pcie_flr() function to cover other endpoint types
>> that might be pushing traffic while code is trying to do a reset?
>
> Do you mean PCI_COMMAND_MASTER rather than PCI_COMMAND_MEMORY?

Yes

>  I tried
> that too, it doesn't work either.  I'm not really sure the theory
> behind clearing memory, clearing busmaster to stop DMA seems like a
> sane thing to do, but doesn't help here.

Let me explain what I guessed. You might be able to fill in the blanks
where I am completely off.

We do vfio initiated flr reset immediately following guest machine
shutdown. The card could be fully enabled and pushing traffic to the
system at this moment.

I don't know if vfio does any device disable or not.

FLR is supposed to reset the endpoint but endpoint doesn't recover per
your report.

Having vendor specific reset routines for PCIE endpoints defeats the
purpose of FLR.

Since the adapter is fully functional, i suggested turning off bus
master and memory enable bits to stop endpoint from sending packets.

But, this is not helping either.

Those sleep statements looked very fragile to be honest.

I was curious if there is something else that we could do for other endpoints.

No objections otherwise.

>
> Alex
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ