Message-ID: <abb7dc61-75e8-3e40-f449-37e7bb835bbf@loongson.cn>
Date: Sun, 25 Jun 2023 12:04:13 +0800
From: Sui Jingfeng <suijingfeng@...ngson.cn>
To: Lucas Stach <l.stach@...gutronix.de>,
Sui Jingfeng <18949883232@....com>,
Russell King <linux+etnaviv@...linux.org.uk>,
Christian Gmeiner <christian.gmeiner@...il.com>,
David Airlie <airlied@...il.com>,
Daniel Vetter <daniel@...ll.ch>
Cc: linux-kernel@...r.kernel.org, dri-devel@...ts.freedesktop.org,
etnaviv@...ts.freedesktop.org,
Philipp Zabel <p.zabel@...gutronix.de>,
Bjorn Helgaas <bhelgaas@...gle.com>
Subject: Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent
device
Hi,
On 2023/6/22 01:53, Lucas Stach wrote:
> Am Donnerstag, dem 22.06.2023 um 01:31 +0800 schrieb Sui Jingfeng:
>> Hi,
>>
>> On 2023/6/22 00:07, Lucas Stach wrote:
>>> And as the HW guarantees it on your platform, your platform
>>> implementation makes this function effectively a no-op. Skipping the
>>> call to this function is breaking the DMA API abstraction, as now the
>>> driver is second guessing the DMA API implementation. I really see no
>>> reason to do this.
>> It is for the same reason that you chose the word 'effectively', not 'definitely'.
>>
>> We don't want to waste the CPU's time running the
>> dma_sync_sg_for_cpu() function:
>>
>> ```
>> void dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
>> 			 int nelems, enum dma_data_direction dir)
>> {
>> 	const struct dma_map_ops *ops = get_dma_ops(dev);
>>
>> 	BUG_ON(!valid_dma_direction(dir));
>> 	if (dma_map_direct(dev, ops))
>> 		dma_direct_sync_sg_for_cpu(dev, sg, nelems, dir);
>> 	else if (ops->sync_sg_for_cpu)
>> 		ops->sync_sg_for_cpu(dev, sg, nelems, dir);
>> 	debug_dma_sync_sg_for_cpu(dev, sg, nelems, dir);
>> }
>> ```
>>
>> when running this:
>>
>>
>> ```
>> int etnaviv_gem_cpu_fini(struct drm_gem_object *obj)
>> {
>> 	struct drm_device *dev = obj->dev;
>> 	struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
>> 	struct etnaviv_drm_private *priv = dev->dev_private;
>>
>> 	if (!priv->dma_coherent && etnaviv_obj->flags & ETNA_BO_CACHED) {
>> 		/* fini without a prep is almost certainly a userspace error */
>> 		WARN_ON(etnaviv_obj->last_cpu_prep_op == 0);
>> 		dma_sync_sgtable_for_device(dev->dev, etnaviv_obj->sgt,
>> 			etnaviv_op_to_dma_dir(etnaviv_obj->last_cpu_prep_op));
>> 		etnaviv_obj->last_cpu_prep_op = 0;
>> 	}
>>
>> 	return 0;
>> }
>> ```
>>
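(For reference, the etnaviv_op_to_dma_dir() helper called in the quoted
etnaviv_gem_cpu_fini() above maps the last prep op to a DMA sync
direction. The following is a user-space model of that mapping, with the
kernel's enum and flag definitions stubbed out only so the snippet is
self-contained outside the kernel tree; values match the uapi headers:)

```c
/* Stubbed from include/uapi/drm/etnaviv_drm.h, for illustration only. */
#define ETNA_PREP_READ  0x01
#define ETNA_PREP_WRITE 0x02

/* Stubbed from include/linux/dma-direction.h, for illustration only. */
enum dma_data_direction {
	DMA_BIDIRECTIONAL = 0,
	DMA_TO_DEVICE     = 1,
	DMA_FROM_DEVICE   = 2,
};

/*
 * Mirrors the kernel's etnaviv_op_to_dma_dir(): a CPU read means the
 * device's writes must be synced *from* the device; a CPU write means
 * the CPU's writes must later be synced *to* the device.
 */
static enum dma_data_direction etnaviv_op_to_dma_dir(unsigned int op)
{
	if (op & ETNA_PREP_READ)
		return DMA_FROM_DEVICE;
	else if (op & ETNA_PREP_WRITE)
		return DMA_TO_DEVICE;
	else
		return DMA_BIDIRECTIONAL;
}
```

Note that a combined READ|WRITE op falls into the first branch, and an
op of 0 (the case the WARN_ON above flags) yields DMA_BIDIRECTIONAL.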
> My judgment as the maintainer of this driver is that the small CPU
> overhead of calling this function is very well worth it, if the
> alternative is breaking the DMA API abstractions.
>
>> But this is acceptable, because we can drop the GEM_CPU_PREP and
>> GEM_CPU_FINI ioctls entirely in userspace for cached buffers, as they
>> are totally unneeded for cached mappings on our platform.
>>
> And that statement isn't true either.
Yes, you are right here, I admit.
I have suffered from this problem in the past when developing
xf86-video-loongson. The root cause, I think, is that the CPU doesn't
know when the GPU has finished rendering, or whether some data still
resides in the GPU's cache. We have to call
etna_bo_cpu_prep(etna_bo, DRM_ETNA_PREP_READ) to make sure the data
fetched by the CPU is the latest. I realized this issue five months
ago; see [1] for reference. I simply forgot about it during this
debate with you.
[1]
https://gitlab.freedesktop.org/longxin2019/xf86-video-loongson/-/commit/95f9596eb19223c3109ea1f32c3e086fd1d43bd8
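(To make the stale-read hazard concrete: the sketch below is a toy
model of why a CPU read needs a prior cpu_prep. Every name in it --
toy_bo, toy_gpu_render, toy_cpu_prep_read -- is invented for
illustration; the real user-space calls are etna_bo_cpu_prep() and
etna_bo_cpu_fini() from libdrm's etnaviv API.)

```c
#include <string.h>

struct toy_bo {
	char gpu_cache[16];  /* result still sitting in the GPU's cache */
	char memory[16];     /* what the CPU sees in system memory      */
	int  render_pending; /* GPU finished, but data not yet visible  */
};

/* The GPU finishes a job: the result lands in its cache first. */
static void toy_gpu_render(struct toy_bo *bo, const char *result)
{
	strcpy(bo->gpu_cache, result);
	bo->render_pending = 1;
}

/*
 * cpu_prep(READ): wait for the GPU and make its writes visible
 * before the CPU touches the buffer. In the toy model this is a
 * memcpy; on real hardware it is a fence wait plus cache sync.
 */
static void toy_cpu_prep_read(struct toy_bo *bo)
{
	if (bo->render_pending) {
		memcpy(bo->memory, bo->gpu_cache, sizeof(bo->memory));
		bo->render_pending = 0;
	}
}
```

In this model, reading bo->memory right after toy_gpu_render() still
returns the old contents; only after toy_cpu_prep_read() does the CPU
see the latest rendering result, which is exactly the bug described
above.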
> The CPU_PREP/FINI ioctls also
> provide fence synchronization between CPU and GPU.
You are correct here.
> There are a few very
> specific cases where skipping those ioctls is acceptable (mostly when
> the userspace driver explicitly wants unsynchronized access), but in
> most cases they are required for correctness.
OK, you are absolutely correct.
> Regards,
> Lucas
--
Jingfeng