lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 14 Jan 2019 19:35:48 +0000
From:   Will Deacon <will.deacon@....com>
To:     "Koenig, Christian" <Christian.Koenig@....com>
Cc:     Ard Biesheuvel <ard.biesheuvel@...aro.org>,
        Michel Dänzer <michel@...nzer.net>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Carsten Haitzler <Carsten.Haitzler@....com>,
        David Airlie <airlied@...ux.ie>,
        dri-devel <dri-devel@...ts.freedesktop.org>,
        "Huang, Ray" <Ray.Huang@....com>,
        "Zhang, Jerry" <Jerry.Zhang@....com>,
        linux-arm-kernel <linux-arm-kernel@...ts.infradead.org>,
        Bernhard Rosenkränzer 
        <Bernhard.Rosenkranzer@...aro.org>, mpe@...erman.id.au,
        benh@...nel.crashing.org
Subject: Re: [RFC PATCH] drm/ttm: force cached mappings for system RAM on ARM

[+ BenH and MPE]

On Mon, Jan 14, 2019 at 07:21:08PM +0000, Koenig, Christian wrote:
> Am 14.01.19 um 20:13 schrieb Will Deacon:
> > On Mon, Jan 14, 2019 at 07:07:54PM +0000, Koenig, Christian wrote:
> >> Am 14.01.19 um 18:32 schrieb Ard Biesheuvel:
> >>              - The reason remapping the CPU side as cacheable does work (which I
> >>              did test) is because the GPU's uncacheable accesses (which I assume
> >>              are made using the NoSnoop PCIe transaction attribute) are actually
> >>              emitted as cacheable in some cases.
> >>                 . On my AMD Seattle, with or without SMMU (which is stage 2 only), I
> >>              must use cacheable accesses from the CPU side or things are broken.
> >>              This might be a h/w flaw, though.
> >>                 . On systems with stage 1+2 SMMUs, the driver uses stage 1
> >>              translations which always override the memory attributes to cacheable
> >>              for DMA coherent devices. This is what is affecting the Cavium
> >>              ThunderX2 (although it appears the attributes emitted by the RC may be
> >>              incorrect as well.)
> >>
> >>              The latter issue is a shortcoming in the SMMU driver that we have to
> >>              fix, i.e., it should take care not to modify the incoming attributes
> >>              of DMA coherent PCIe devices for NoSnoop to be able to work.
> >>
> >>              So in summary, the mismatch appears to be between the CPU accessing
> >>              the vmap region with non-cacheable attributes and the GPU accessing
> >>              the same memory with cacheable attributes, resulting in a loss of
> >>              coherency and lots of visible corruption.
> >>
> >>          Actually it is the other way around. The CPU thinks some data is in the
> >>          cache and the GPU only updates the system memory version because the
> >>          snoop flag is not set.
> >>
> >>
> >>      That doesn't seem to be what is happening. As far as we can tell from
> >>      our experiments, all inbound transactions are always cacheable, and so
> >>      the only way to make things work is to ensure that the CPU uses the
> >>      same attributes.
> >>
> >>
> >> Ok that doesn't make any sense. If inbound transactions are cacheable or not is
> >> irrelevant when the CPU always uses uncached accesses.
> >>
> >> See on the PCIe side you have the snoop bit in the read/write transactions
> >> which tells the root hub if the device wants to snoop caches or not.
> >>
> >> When the CPU accesses some memory as cached then devices need to snoop the
> >> cache for coherent accesses.
> >>
> >> When the CPU accesses some memory as uncached then devices can disable snooping
> >> to improve performance, but when they don't do this it is mandated by the spec
> >> that this still works.
> > Which spec?
> 
> The PCIe spec. The snoop bit (or rather the NoSnoop) in the transaction 
> is perfectly optional IIRC.

Thanks for the clarification. I suspect the devil is in the details, so I'll
try to dig up the spec.

> > The Arm architecture (and others including Power afaiu) doesn't
> > guarantee coherency when memory is accessed using mismatched cacheability
> > attributes.
> 
> Well what exactly goes wrong on ARM?

Coherency (and any ordering guarantees) can be lost, so the device may see a
stale copy of the memory it is accessing. The architecture requires cache
maintenance to restore coherency between the mismatched aliases.

> As far as I know Power doesn't really supports un-cached memory at all, 
> except for a very very old and odd configuration with AGP.

Hopefully Michael/Ben can elaborate here, but I was under the (possibly
mistaken) impression that mismatched attributes could cause a machine-check
on Power.

> I mean in theory I agree that devices should use matching cacheability 
> attributes, but in practice I know of quite a bunch of devices/engines 
> which fails to do this correctly.

Given that the experiences of Ard and I so far has been that the system
ends up making everything cacheable after the RC, perhaps that's an attempt
by system designers to correct for these devices. Unfortunately, it doesn't
help if the CPU carefully goes ahead and establishes a non-cacheable mapping
for itself!

Will

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ