[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <953e5e5f-5d47-d6df-40df-c8c94db5447f@amd.com>
Date:   Thu, 24 Jan 2019 09:45:12 +0000
From:   "Koenig, Christian" <Christian.Koenig@....com>
To:     Ard Biesheuvel <ard.biesheuvel@...aro.org>
CC:     Christoph Hellwig <hch@...radead.org>,
        Alex Deucher <alexdeucher@...il.com>,
        Michel Dänzer <michel@...nzer.net>,
        Maxime Ripard <maxime.ripard@...tlin.com>,
        Will Deacon <will.deacon@....com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        amd-gfx list <amd-gfx@...ts.freedesktop.org>,
        David Airlie <airlied@...ux.ie>,
        "Huang, Ray" <Ray.Huang@....com>,
        dri-devel <dri-devel@...ts.freedesktop.org>,
        Michael Ellerman <mpe@...erman.id.au>,
        "Zhang, Jerry" <Jerry.Zhang@....com>,
        "Deucher, Alexander" <Alexander.Deucher@....com>,
        Sean Paul <sean@...rly.run>,
        linux-arm-kernel <linux-arm-kernel@...ts.infradead.org>
Subject: Re: [RFC PATCH] drm: disable WC optimization for cache coherent
 devices on non-x86
Am 24.01.19 um 10:28 schrieb Ard Biesheuvel:
> On Thu, 24 Jan 2019 at 10:25, Koenig, Christian
> <Christian.Koenig@....com> wrote:
>> Am 24.01.19 um 10:13 schrieb Christoph Hellwig:
>>> On Wed, Jan 23, 2019 at 05:52:50PM +0100, Ard Biesheuvel wrote:
>>>> But my concern is that it seems likely that non-cache coherent
>>>> implementations are relying on this hack as well. There must be a
>>>> reason that this hack is only disabled for PowerPC platforms if they
>>>> are cache coherent, for instance, and I suspect that that reason is
>>>> that the hack is the only thing ensuring that the CPU mapping
>>>> attributes match the device ones used for these buffers (the vmap()ed
>>>> ones), whereas the rings and other consistent data structures are
>>>> using the DMA API as intended, and thus getting uncached attributes in
>>>> the correct way.
>>> Dave, who added that commit is on Cc together with just about everyone
>>> involved in the review chain.  Based on the previous explanation
>>> that idea what we might want an uncached mapping for some non-coherent
>>> architectures for this to work at all makes sense, but then again
>>> the way to create those mappings is entirely architecture specific,
>>> and also need a cache flushing before creating the mapping to work
>>> properly.  So my working theory is that this code never properly
>>> worked on architectures without DMA coherent for PCIe at all, but
>>> I'd love to be corrected by concrete examples including an explanation
>>> of how it actually ends up working.
>> Cache coherency is mandatory for modern GPU operation.
>>
>> Otherwise you can't implement a bunch of the requirements of the
>> userspace APIs.
>>
>> In other words the applications doesn't inform the driver that the GPU
>> or the CPU is accessing data, it just does it and assumes that it works.
>>
> Wonderful!
>
> In that case, do you have any objections to the patch proposed by
> Christoph above?
Yeah, the patch of Christoph actually goes way to far cause we have 
reports that this works on a bunch of other architectures.
E.g. X86 64bit, PowerPC (under some conditions) and some MIPS.
The only problematic here actually seems to be ARM, so you should 
probably just add an "#ifdef .._ARM return false;".
Regards,
Christian.
Powered by blists - more mailing lists
 
