Message-ID: <CAKv+Gu-5V-UjP_YzZBCEsa+o_G6BRSVw2ZimYGNEfRGf-aRPNg@mail.gmail.com>
Date:   Mon, 21 Jan 2019 17:30:00 +0100
From:   Ard Biesheuvel <ard.biesheuvel@...aro.org>
To:     Christoph Hellwig <hch@...radead.org>
Cc:     linux-arm-kernel <linux-arm-kernel@...ts.infradead.org>,
        dri-devel <dri-devel@...ts.freedesktop.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        amd-gfx@...ts.freedesktop.org,
        Christian Koenig <christian.koenig@....com>,
        Alex Deucher <alexander.deucher@....com>,
        David Zhou <David1.Zhou@....com>,
        Huang Rui <ray.huang@....com>,
        Junwei Zhang <Jerry.Zhang@....com>,
        Michel Daenzer <michel.daenzer@....com>,
        David Airlie <airlied@...ux.ie>,
        Daniel Vetter <daniel@...ll.ch>,
        Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
        Maxime Ripard <maxime.ripard@...tlin.com>,
        Sean Paul <sean@...rly.run>,
        Michael Ellerman <mpe@...erman.id.au>,
        Benjamin Herrenschmidt <benh@...nel.crashing.org>,
        Will Deacon <will.deacon@....com>
Subject: Re: [RFC PATCH] drm: disable WC optimization for cache coherent
 devices on non-x86

On Mon, 21 Jan 2019 at 17:22, Christoph Hellwig <hch@...radead.org> wrote:
>
> On Mon, Jan 21, 2019 at 05:14:37PM +0100, Ard Biesheuvel wrote:
> > > I'll add big fat comments.  But the fact that nothing is exported
> > > there should be a fairly big hint.
> > >
> >
> > I don't follow. How do other header files 'export' things in a way
> > that this header doesn't?
>
> Well, I'll add comments to make it more obvious.
>
> > As far as I can tell, these drivers allocate DMA'able memory [in
> > ttm_tt_populate()] and subsequently create their own CPU mappings for
> > it, assuming that
> > a) the default is cache coherent, so vmap()ing those pages with
> > cacheable attributes works, and
>
> Yikes.  vmapping with different attributes is generally prone to
> create problems on a lot of architectures.
>

Indeed. But if your starting point is the assumption that DMA is
always cache coherent, those vmap() attributes are never different.

> > b) telling the GPU to use NoSnoop attributes makes the accesses it
> > performs coherent with non-cacheable CPU mappings of those physical
> > pages
> >
> > Since the latter is not true for many arm64 systems, I need this patch
> > to get a working system.
>
> Do we know that this actually works anywhere but x86?
>

In theory, it could work on arm64 systems with stage2-only SMMUs and
correctly configured PCIe root complexes (RCs) that set the right AMBA
attributes for inbound transactions that have the NoSnoop attribute set.

Unfortunately, it seems that the current ARM SMMU code will clobber
those AMBA attributes when it uses stage1 mappings, since it forces
the memory attributes to WBWA (write-back, write-allocate) for cache
coherent devices.

So, as I pointed out in the commit log, the main difference between
x86 and other arches is that x86 can easily tolerate a non-functional
NoSnoop.
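To make the coherency argument concrete, here is a deliberately simplified toy model in C. It is not kernel code and does not describe any real interconnect: one CPU cache line sits in front of one DRAM word, a device write either honors the NoSnoop hint and goes straight to DRAM or (assumption) has its attributes forced to write-back write-allocate and lands in the cache, and the CPU reads through either a cacheable or a non-cacheable mapping. All names (`toy_system`, `device_write`, `cpu_read`, `run_scenario`) are hypothetical.

```c
#include <stdbool.h>

/* Toy model only: one CPU cache line in front of one DRAM word. */
struct toy_system {
    bool line_valid; /* CPU cache currently holds a copy of the word */
    int cache_val;   /* value held in the CPU cache */
    int mem_val;     /* value held in DRAM */
};

/* Device (DMA) write of 'val'.
 * nosnoop_honored: the write bypasses the CPU cache and updates DRAM.
 * otherwise: modeled as a WBWA write that allocates into the cache,
 * leaving DRAM stale until a later writeback (not modeled here). */
static void device_write(struct toy_system *s, int val, bool nosnoop_honored)
{
    if (nosnoop_honored) {
        s->mem_val = val;
    } else {
        s->line_valid = true;
        s->cache_val = val;
    }
}

/* CPU read: a cacheable mapping hits the cache line if it is valid,
 * a non-cacheable mapping always reads DRAM. */
static int cpu_read(const struct toy_system *s, bool cacheable_mapping)
{
    if (cacheable_mapping && s->line_valid)
        return s->cache_val;
    return s->mem_val;
}

/* Start with 0 in cache and DRAM, let the device write 42, read it back. */
static int run_scenario(bool line_cached, bool nosnoop_honored,
                        bool cacheable_mapping)
{
    struct toy_system s = { .line_valid = line_cached, .cache_val = 0,
                            .mem_val = 0 };

    device_write(&s, 42, nosnoop_honored);
    return cpu_read(&s, cacheable_mapping);
}
```

Under these assumptions, `run_scenario(false, true, false)` returns 42, i.e. assumption b) holds when NoSnoop is honored and the CPU mapping is non-cacheable. `run_scenario(false, false, false)` returns the stale 0, which mirrors the failure described above when stage1 SMMU mappings force WBWA. `run_scenario(true, true, true)` shows the opposite mismatch: a cacheable mapping reading a stale line after a NoSnoop write.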

> In general I would call the above sequence rather bogus and would
> prefer that we get rid of such antipatterns in the kernel and just use
> dma_alloc_attrs with DMA_ATTR_WRITE_COMBINE if we want write-combine
> semantics.
>

Agreed.

> Until that happens we should just change the driver ifdefs to default
> the hacks to off and only enable them on setups where we 100%
> positively know that they actually work.  And document that fact
> in big fat comments.

Well, as I also mentioned in my commit log, if we default to off
unless CONFIG_X86, we may break working setups on MIPS and Power where
the device is in fact non-cache coherent and relies on this
'optimization' to get things working. The same could be true for
non-coherent ARM systems, hence my approach of disabling this hack
only for cache coherent devices on non-X86.
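The trade-off between the two proposals can be written as a small predicate. This is a userspace sketch with hypothetical names (`toy_arch`, `wc_allowed_rfc`, `wc_allowed_x86_only`), not the actual DRM/TTM code; `dev_coherent` stands in, as an assumption, for however a driver learns whether its device is cache coherent.

```c
#include <stdbool.h>

/* Hypothetical architecture tags for illustration only. */
enum toy_arch { TOY_ARCH_X86, TOY_ARCH_ARM64, TOY_ARCH_POWER, TOY_ARCH_MIPS };

/* Policy of the RFC patch: keep the WC optimization on x86, and on other
 * architectures keep it only for devices that are NOT cache coherent. */
static bool wc_allowed_rfc(enum toy_arch arch, bool dev_coherent)
{
    return arch == TOY_ARCH_X86 || !dev_coherent;
}

/* Alternative: default the hack to off unless CONFIG_X86,
 * regardless of whether the device is coherent. */
static bool wc_allowed_x86_only(enum toy_arch arch, bool dev_coherent)
{
    (void)dev_coherent; /* coherency is ignored by this policy */
    return arch == TOY_ARCH_X86;
}
```

For a non-coherent device on Power (or MIPS), `wc_allowed_rfc()` keeps the optimization enabled while `wc_allowed_x86_only()` turns it off, which is the regression described above; for a coherent device on arm64 both policies disable it.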
