lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <l343rk4s6mrppjl3vxxvnwc52wlovg6bojduiy3qf5zup5ifzx@7qu7jilrsgn5>
Date:   Wed, 7 Jun 2023 14:19:29 +0200
From:   Maxime Ripard <mripard@...nel.org>
To:     Sui Jingfeng <suijingfeng@...ngson.cn>
Cc:     Paul Cercueil <paul@...pouillou.net>,
        Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
        Thomas Zimmermann <tzimmermann@...e.de>,
        David Airlie <airlied@...il.com>,
        Daniel Vetter <daniel@...ll.ch>,
        Laurent Pinchart <laurent.pinchart@...asonboard.com>,
        Kieran Bingham <kieran.bingham+renesas@...asonboard.com>,
        dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org,
        linux-mips@...r.kernel.org, linux-renesas-soc@...r.kernel.org,
        loongson-kernel@...ts.loongnix.cn
Subject: Re: [PATCH] drm: gem: add an option for supporting the dma-coherent
 hardware.

On Wed, Jun 07, 2023 at 06:30:01PM +0800, Sui Jingfeng wrote:
> On 2023/6/7 17:36, Paul Cercueil wrote:
> > Hi Sui,
> > 
> > Le mercredi 07 juin 2023 à 13:30 +0800, Sui Jingfeng a écrit :
> > > The single map_noncoherent member of struct drm_gem_dma_object may
> > > not
> > > sufficient for describing the backing memory of the GEM buffer
> > > object.
> > > 
> > > Especially on dma-coherent systems, the backing memory is both cached
> > > coherent for multi-core CPUs and dma-coherent for peripheral device.
> > > Say architectures like X86-64, LoongArch64, Loongson Mips64, etc.
> > > 
> > > Whether a peripheral device is dma-coherent or not can be
> > > implementation-dependent. The single map_noncoherent option is not
> > > enough
> > > to reflect real hardware anymore. For example, the Loongson LS3A4000
> > > CPU
> > > and LS2K2000/LS2K1000 SoC, peripheral device of such hardware
> > > platform
> > > allways snoop CPU's cache. Doing the allocation with
> > > dma_alloc_coherent
> > > function is preferred. The return buffer is cached, it should not
> > > using
> > > the default write-combine mapping. While with the current implement,
> > > there
> > > no way to tell the drm core to reflect this.
> > > 
> > > This patch adds cached and coherent members to struct
> > > drm_gem_dma_object.
> > > which allow driver implements to inform the core. Introducing new
> > > mappings
> > > while keeping the original default behavior unchanged.
> > Did you try to simply set the "dma-coherent" property to the device's
> > node?
> 
> But this approach can only be applied for the device driver with DT support.
>
> X86-64, Loongson ls3a4000 mips64, Loongson ls3a5000 CPU typically do not
> have DT support.
> 
> They using ACPI to pass parameter from the firmware to Linux kernel.
> 
> You approach will lost the effectiveness on such a case.

Not really, no. All DT support is doing is setting some generic device
parameter based on that property, but the infrastructure is very much
generic and can be used on systems without a DT.

> >  From what I understand if you add that property then Linux will use DMA
> > coherent memory even though you use dma_alloc_noncoherent() and the
> > sync_single_for_cpu() / sync_single_for_device() are then NOPs.
 >
> Please do not mitigate the problems with confusing method.

It's not a confusing method, it's one of the two main API to deal with
DMA buffers. And you might disagree with Paul but there's no need to be
rude about it.

> This approach not only tend to generate confusion but also
> implement-dependent and arch-dependent. It's definitely problematic.
> 
> 
> How does the dma_alloc_coherent/dma_alloc_noncoherent is a ARCH specific
> thing.
> 
> Dependent on how does the arch_dma_ops is implemented.
> 
> 
> The definition of the coherent on different ARCH has different meanings.
> 
> The definition of the wirte-combine on different ARCH has different
> meanings.
> 
> 
> The wirte-combine(uncache acceleration) on mips is non dma-coherent.

Then MIPS breaks the DMA allocation semantics. A buffer allocated with
dma_alloc_wc is supposed to be coherent.

> But on arm, It seem that wirte-combine is coherent. (guaranteed by arch
> implement).
> 
> 
> I also heard using dma_alloc_coherent  to allocation the buffer for the
> non-coherent doesn't hurt, but the reverse is not true.
> 
> 
> But please do not create confusion.
> 
> software composite is faster because better cacheusing rate and
> 
> cache is faster to read.
> 
> It is faster because it is cached, not because it is non-coherent.
> 
> non-coherent is arch thing and/or driver-side thing,
> 
> it is a side effect of  using the cached mapping.

Honestly, it's not clear to me what your point or issue is.

Going back to the description in your commit log, you mention that you
want to support multiple hardware that might or might not be DMA
coherent, and thus you want to allocate a buffer with different
attributes depending on that?

Like, you say that the LS3A4000 has a coherency unit and thus doing the
allocation with dma_alloc_coherent is preferred. Preferred to what? A WC
buffer? Why?

A WC buffer is a coherent buffer that is allowed to cache writes.

It doesn't have to, and worst case scenario you're inexactly the same
case than a dma_alloc_coherent buffer.

> It should left to driver to handle such a side effect. The device
> driver know their device, so its the device driver's responsibility to
> maintain the coherency.

Not really, no. Some driver are used across multiple SoCs and multiple
arch. It doesn't make any sense to encode this in the driver... which is
why it's in the DT in the first place, and abstracted away by the DMA
API. Like, do you really expect the amdgpu driver to know the DMA
attributes it needs to allocate a buffer from when running from a
RaspberryPi?

> On loongson platform, we don't need to call
> drm_fb_dma_sync_non_coherent() function, Its already guaranteed by
> hardware.

And mostly guaranteed by dma_alloc_coherent. And if you wanted to call
it anyway, it would be a nop if the device is declared as coherent
already.

I think you're thinking about this backward. A buffer has mapping
attributes, and a device has hardware properties.

The driver (ie, software) will allocate a buffer with some mapping
attributes, and will assume that they are met in the rest of its code.
How they are met is an implementation detail of the hardware, and for
all the driver cares, it doesn't have to match.

You can allocate a WC buffer to use on a non-coherent device and that's
fine. You can allocate a non-coherent buffer on a coherent device and
that's fine too. The DMA API will make everything work when it needs to,
and if the hardware already provides stronger guarantees, then it will
just skip whatever is redundant.

So you need to write your driver using buffer is the most convenient for
you, and it's really all that matters at the driver level. But for that
to work, you need to flag the coherence-ness of your devices properly,
like Paul suggested.

Maxime

Download attachment "signature.asc" of type "application/pgp-signature" (229 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ