linux-kernel - Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device


Message-ID: <abb7dc61-75e8-3e40-f449-37e7bb835bbf@loongson.cn>
Date:   Sun, 25 Jun 2023 12:04:13 +0800
From:   Sui Jingfeng <suijingfeng@...ngson.cn>
To:     Lucas Stach <l.stach@...gutronix.de>,
        Sui Jingfeng <18949883232@....com>,
        Russell King <linux+etnaviv@...linux.org.uk>,
        Christian Gmeiner <christian.gmeiner@...il.com>,
        David Airlie <airlied@...il.com>,
        Daniel Vetter <daniel@...ll.ch>
Cc:     linux-kernel@...r.kernel.org, dri-devel@...ts.freedesktop.org,
        etnaviv@...ts.freedesktop.org,
        Philipp Zabel <p.zabel@...gutronix.de>,
        Bjorn Helgaas <bhelgaas@...gle.com>
Subject: Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent
 device

Hi,

On 2023/6/22 01:53, Lucas Stach wrote:
> Am Donnerstag, dem 22.06.2023 um 01:31 +0800 schrieb Sui Jingfeng:
>> Hi,
>>
>> On 2023/6/22 00:07, Lucas Stach wrote:
>>> And as the HW guarantees it on your platform, your platform
>>> implementation makes this function effectively a no-op. Skipping the
>>> call to this function is breaking the DMA API abstraction, as now the
>>> driver is second guessing the DMA API implementation. I really see no
>>> reason to do this.
>> It is the same reason you chose the word 'effectively', not 'definitely'.
>>
>> We don't want to waste the CPU's time running the
>> dma_sync_sg_for_cpu() function
>>
>>
>> ```
>>
>> void dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
>>               int nelems, enum dma_data_direction dir)
>> {
>>       const struct dma_map_ops *ops = get_dma_ops(dev);
>>
>>       BUG_ON(!valid_dma_direction(dir));
>>       if (dma_map_direct(dev, ops))
>>           dma_direct_sync_sg_for_cpu(dev, sg, nelems, dir);
>>       else if (ops->sync_sg_for_cpu)
>>           ops->sync_sg_for_cpu(dev, sg, nelems, dir);
>>       debug_dma_sync_sg_for_cpu(dev, sg, nelems, dir);
>> }
>>
>> ```
>>
>>
>>    to running this:
>>
>>
>> ```
>>
>> int etnaviv_gem_cpu_fini(struct drm_gem_object *obj)
>> {
>>       struct drm_device *dev = obj->dev;
>>       struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
>>       struct etnaviv_drm_private *priv = dev->dev_private;
>>
>>       if (!priv->dma_coherent && etnaviv_obj->flags & ETNA_BO_CACHED) {
>>           /* fini without a prep is almost certainly a userspace error */
>>           WARN_ON(etnaviv_obj->last_cpu_prep_op == 0);
>>           dma_sync_sgtable_for_device(dev->dev, etnaviv_obj->sgt,
>>               etnaviv_op_to_dma_dir(etnaviv_obj->last_cpu_prep_op));
>>           etnaviv_obj->last_cpu_prep_op = 0;
>>       }
>>
>>       return 0;
>> }
>>
>> ```
>>
> My judgment as the maintainer of this driver is that the small CPU
> overhead of calling this function is very well worth it, if the
> alternative is breaking the DMA API abstractions.
>
>> But, this is acceptable, because we can kill the GEM_CPU_PREP and
>> GEM_CPU_FINI ioctl entirely
>>
>> at userspace for cached buffer, as this is totally not needed for cached
>> mapping on our platform.
>>
> And that statement isn't true either.

Yes, you are right here, I admit.

I ran into exactly this problem in the past while developing
xf86-video-loongson.

The root cause, I think, is that the CPU doesn't know when the GPU has
finished rendering, or that some data still resides in the GPU's cache.

We have to call etna_bo_cpu_prep(etna_bo, DRM_ETNA_PREP_READ) to make
sure the data fetched by the CPU is the latest.
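
To illustrate the ordering problem, here is a minimal toy model in plain C.
It does not use libdrm or the kernel: toy_bo, toy_gpu_submit, toy_wait_fence,
toy_cpu_prep, toy_cpu_fini and toy_read are hypothetical stand-ins, and the
TOY_PREP_* values are made up for this sketch. It only assumes that a prep
call waits for pending GPU work unless a "no sync" flag is passed, which is
how I understand the CPU_PREP ioctl to behave.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flags for this sketch only, not the libdrm definitions. */
#define TOY_PREP_READ   0x01u
#define TOY_PREP_NOSYNC 0x04u

/* Toy buffer object: models visibility of GPU results, nothing more. */
struct toy_bo {
    uint32_t cpu_visible;  /* what the CPU currently reads from its mapping */
    uint32_t gpu_pending;  /* GPU result that has not become visible yet */
    bool     gpu_busy;     /* true while a render job is still in flight */
    uint32_t last_prep_op; /* mirrors the idea of last_cpu_prep_op */
};

/* Model a GPU job: its result stays invisible until the job retires. */
static void toy_gpu_submit(struct toy_bo *bo, uint32_t value)
{
    bo->gpu_pending = value;
    bo->gpu_busy = true;
}

/* Model the fence wait: retire the job and publish its result. */
static void toy_wait_fence(struct toy_bo *bo)
{
    if (bo->gpu_busy) {
        bo->cpu_visible = bo->gpu_pending;
        bo->gpu_busy = false;
    }
}

/* Prep: wait for the GPU unless the caller asked for unsynchronized access. */
static int toy_cpu_prep(struct toy_bo *bo, uint32_t op)
{
    if (!(op & TOY_PREP_NOSYNC))
        toy_wait_fence(bo);
    bo->last_prep_op = op;
    return 0;
}

/* Fini: a fini without a preceding prep is treated as a caller bug. */
static void toy_cpu_fini(struct toy_bo *bo)
{
    assert(bo->last_prep_op != 0);
    bo->last_prep_op = 0;
}

/* CPU read bracketed by prep/fini, as userspace is expected to do. */
static uint32_t toy_read(struct toy_bo *bo, uint32_t op)
{
    uint32_t value;

    if (toy_cpu_prep(bo, op) != 0)
        return 0;
    value = bo->cpu_visible;
    toy_cpu_fini(bo);
    return value;
}
```

In this model, a read with TOY_PREP_READ | TOY_PREP_NOSYNC issued right after
toy_gpu_submit() still returns the old contents, while a read with plain
TOY_PREP_READ returns what the GPU wrote. Cache coherency of the platform
does not change that, because the missing piece is the wait, not the cache
maintenance.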


I learned this five months ago, earlier this year; see [1] for reference.

I simply forgot about it while debating with you.


[1] 
https://gitlab.freedesktop.org/longxin2019/xf86-video-loongson/-/commit/95f9596eb19223c3109ea1f32c3e086fd1d43bd8



>   The CPU_PREP/FINI ioctls also
> provide fence synchronization between CPU and GPU.

You are correct here.

> There are a few very
> specific cases where skipping those ioctls is acceptable (mostly when
> the userspace driver explicitly wants unsynchronized access), but in
> most cases they are required for correctness.

OK, you are absolutely correct.

> Regards,
> Lucas

-- 
Jingfeng