Date:   Wed, 15 Feb 2023 11:28:03 +0100
From:   Rick Wertenbroek <rick.wertenbroek@...il.com>
To:     Damien Le Moal <damien.lemoal@...nsource.wdc.com>
Cc:     alberto.dassatti@...g-vd.ch, xxm@...k-chips.com,
        rick.wertenbroek@...g-vd.ch, Rob Herring <robh+dt@...nel.org>,
        Krzysztof Kozlowski <krzysztof.kozlowski+dt@...aro.org>,
        Heiko Stuebner <heiko@...ech.de>,
        Shawn Lin <shawn.lin@...k-chips.com>,
        Lorenzo Pieralisi <lpieralisi@...nel.org>,
        Krzysztof WilczyƄski <kw@...ux.com>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        Jani Nikula <jani.nikula@...el.com>,
        Rodrigo Vivi <rodrigo.vivi@...el.com>,
        Mikko Kovanen <mikko.kovanen@...amobile.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        devicetree@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
        linux-rockchip@...ts.infradead.org, linux-kernel@...r.kernel.org,
        linux-pci@...r.kernel.org
Subject: Re: [PATCH v2 0/9] PCI: rockchip: Fix RK3399 PCIe endpoint controller driver

On Wed, Feb 15, 2023 at 2:51 AM Damien Le Moal
<damien.lemoal@...nsource.wdc.com> wrote:
>
> Note about that: with your series applied, nothing was working for me on
> my Pine64 RockPro64 board (AMD Ryzen host). I got weird/unstable behavior
> and the host IOMMU screaming about IO page faults due to the endpoint
> doing weird PCI accesses. Running the host with the IOMMU on really helps in
> debugging this stuff :)

Thank you for testing. I have also tested with a Ryzen host, with the IOMMU
enabled as well.

>
> With the few fixes to your series that I commented on, things started to
> work better, but were still very unstable. After more debugging, I found
> that the pci-epf-test drivers, on both the host and endpoint sides, have
> nasty problems that lead to reporting failures when things are actually
> working, or to outright bogus operations that trigger errors (e.g. bad DMA
> synchronization triggers IOMMU page fault reports). I have a dozen fix
> patches for these drivers. I will clean them up and post them ASAP.
>
> With the test drivers fixed + the fixes to your series, I have the
> pci_test.sh tests passing 100% of the time, repeatedly (in a loop). All solid.
>

Good to hear that it now works; I'll try your fixes as well.

> However, I am still seeing issues with my ongoing work on an NVMe
> endpoint function driver: everything works when the host BIOS pokes at
> the NVMe "drive" it sees (all good, that is normal), but once the Linux
> nvme driver probe kicks in, IRQs are essentially dead. The nvme driver
> does not see anything strange and allocates IRQs (first one, which ends
> up being INTX, then multiple MSIs, one for each completion queue), but
> on the endpoint side, attempting to raise MSI or INTX IRQs results in
> errors because the rockchip-ep driver sees both INTX and MSI as disabled.
> No clue what is going on. I suspect that a PCI reset may have happened
> and corrupted the core configuration. However, as far as I can tell, the
> EPC/EPF infrastructure does not catch or process PCI resets. That may be
> the issue. I do not see this issue with the epf test driver, I suspect
> because the host BIOS knows nothing about that device and does not touch
> it. This may all depend on the host and BIOS. Not sure. I need to try
> with different hosts. Just FYI :)
>

Interesting that you are working on this. I started patching the RK3399 PCIe
endpoint controller driver for a similar project: I want to run our NVMe
firmware in a Linux PCIe endpoint function.

For the IRQs there are two things that come to mind:
1) The host driver could actually disable them and work in polling mode.
I have seen different versions of the Linux kernel NVMe driver sometimes
choose polling instead of IRQs for the queues.
So maybe it's just the
2) The RK3399 PCIe endpoint controller is documented as being able to
generate only one type of interrupt at a given time: "It is capable of
generating MSI or Legacy interrupt if the PCIe is configured as EP. Notes
that one PCIe component can't generate both types of interrupts. It is
either one or the other." (see TRM 17.5.9 Interrupt Support).
I don't know exactly what the TRM means by saying the controller cannot
use both interrupt types at the same time, but this might be a path worth
exploring.
