Message-ID: <470360a7-0242-9ae5-816f-13608f957bf6@nvidia.com>
Date:   Thu, 29 Apr 2021 14:14:50 -0500
From:   Shanker R Donthineni <sdonthineni@...dia.com>
To:     Alex Williamson <alex.williamson@...hat.com>
CC:     Marc Zyngier <maz@...nel.org>, Will Deacon <will@...nel.org>,
        "Catalin Marinas" <catalin.marinas@....com>,
        Christoffer Dall <christoffer.dall@....com>,
        <linux-arm-kernel@...ts.infradead.org>,
        <kvmarm@...ts.cs.columbia.edu>, <linux-kernel@...r.kernel.org>,
        <kvm@...r.kernel.org>, Vikram Sethi <vsethi@...dia.com>,
        Jason Sequeira <jsequeira@...dia.com>
Subject: Re: [RFC 1/2] vfio/pci: keep the prefetchable attribute of a BAR
 region in VMA

Thanks Alex for quick reply.

On 4/29/21 1:28 PM, Alex Williamson wrote:
> If this were a valid thing to do, it should be done for all
> architectures, not just ARM64.  However, a prefetchable range only
> necessarily allows merged writes, which seems like a subset of the
> semantics implied by a WC attribute, therefore this doesn't seem
> universally valid.
>
> I'm also a bit confused by your problem statement that indicates that
> without WC you're seeing unaligned accesses, does this suggest that
> your driver is actually relying on WC semantics to perform merging to
> achieve alignment?  That seems rather like a driver bug, I'd expect UC
> vs WC is largely a difference in performance, not a means to enforce
> proper driver access patterns.  Per the PCI spec, the bridge itself can
> merge writes to prefetchable areas, presumably regardless of this
> processor attribute, perhaps that's the feature your driver is relying
> on that might be missing here.  Thanks,

The driver uses WC semantics; it maps PCI prefetchable BARs using ioremap_wc(). We don't see any issue on x86: the driver works fine in both the host and the guest kernel. The same driver also works in an ARM64 host kernel, but crashes inside a VM.
The GPU driver uses the architecture-agnostic function ioremap_wc(), like other drivers do. This limitation applies to every driver that uses WC memory and follows the ARM64 Normal-NC access rules.

On ARM64, ioremap_wc() maps to the Normal Non-cacheable memory type, for which reads have no side effects and unaligned accesses are allowed, as specified in the ARM ARM (Architecture Reference Manual). Yet the same driver behaves differently in the host and in the guest on ARM64.

The ARM CPU generates alignment faults before the transaction ever reaches the PCI root complex, switch, or endpoint device, so PCI bridge write merging never gets a chance to help.

We have two concerns here:
   - The performance impact for pass-through devices.
   - The behavior of ioremap_wc() in an ARM64 guest doesn't match its behavior in the host kernel.

 
> Alex
>
