linux-kernel - Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9f3c3a81-1aca-a36e-8090-d851ea5ce9f7@loongson.cn>
Date:   Fri, 23 Jun 2023 03:26:22 +0800
From:   Sui Jingfeng <suijingfeng@...ngson.cn>
To:     Lucas Stach <l.stach@...gutronix.de>,
        Sui Jingfeng <18949883232@....com>,
        Russell King <linux+etnaviv@...linux.org.uk>,
        Christian Gmeiner <christian.gmeiner@...il.com>,
        David Airlie <airlied@...il.com>,
        Daniel Vetter <daniel@...ll.ch>
Cc:     linux-kernel@...r.kernel.org, dri-devel@...ts.freedesktop.org,
        etnaviv@...ts.freedesktop.org,
        Philipp Zabel <p.zabel@...gutronix.de>,
        Bjorn Helgaas <bhelgaas@...gle.com>
Subject: Re: [PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent
 device

Hi

On 2023/6/21 23:58, Lucas Stach wrote:
> Am Mittwoch, dem 21.06.2023 um 23:30 +0800 schrieb Sui Jingfeng:
>> Hi,
>>
>> On 2023/6/21 18:00, Lucas Stach wrote:
>>>>    		dma_sync_sgtable_for_cpu(dev->dev, etnaviv_obj->sgt,
>>>>    					 etnaviv_op_to_dma_dir(op));
>>>>    		etnaviv_obj->last_cpu_prep_op = op;
>>>> @@ -408,8 +421,9 @@ int etnaviv_gem_cpu_fini(struct drm_gem_object *obj)
>>>>    {
>>>>    	struct drm_device *dev = obj->dev;
>>>>    	struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
>>>> +	struct etnaviv_drm_private *priv = dev->dev_private;
>>>>    
>>>> -	if (etnaviv_obj->flags & ETNA_BO_CACHED) {
>>>> +	if (!priv->dma_coherent && etnaviv_obj->flags & ETNA_BO_CACHED) {
>>>>    		/* fini without a prep is almost certainly a userspace error */
>>>>    		WARN_ON(etnaviv_obj->last_cpu_prep_op == 0);
>>>>    		dma_sync_sgtable_for_device(dev->dev, etnaviv_obj->sgt,
>>>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
>>>> index 3524b5811682..754126992264 100644
>>>> --- a/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
>>>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
>>>> @@ -112,11 +112,16 @@ static const struct etnaviv_gem_ops etnaviv_gem_prime_ops = {
>>>>    struct drm_gem_object *etnaviv_gem_prime_import_sg_table(struct drm_device *dev,
>>>>    	struct dma_buf_attachment *attach, struct sg_table *sgt)
>>>>    {
>>>> +	struct etnaviv_drm_private *priv = dev->dev_private;
>>>>    	struct etnaviv_gem_object *etnaviv_obj;
>>>>    	size_t size = PAGE_ALIGN(attach->dmabuf->size);
>>>> +	u32 cache_flags = ETNA_BO_WC;
>>>>    	int ret, npages;
>>>>    
>>>> -	ret = etnaviv_gem_new_private(dev, size, ETNA_BO_WC,
>>>> +	if (priv->dma_coherent)
>>>> +		cache_flags = ETNA_BO_CACHED;
>>>> +
>>> Drop this change. Instead etnaviv_gem_new_impl() should do the upgrade
>>> from WC to CACHED as necessary by adding something like this:
>> I understand you are a profession person in vivante GPU driver domain.
>>
>> I respect you reviews and instruction.
>>
>> But, I'm really reluctant to agree with this, is there any space to
>> negotiate?
>>
>>> /*
>>>    * Upgrade WC to CACHED when the device is hardware coherent and the
>>>    * platform doesn't allow mixing cached and writecombined mappings to
>>>    * the same memory area.
>>>    */
>>> if ((flags & ETNA_BO_CACHE_MASK) == ETNA_BO_WC &&
>>>       dev_is_dma_coherent(dev) && !drm_arch_can_wc_memory())
>>>           flags = (flags & ~ETNA_BO_CACHE_MASK) & ETNA_BO_CACHED;
>> This is policy, not a mechanism.
>>
>> Using what cache property is a user-space program's choice.
>>
>> While you are override the WC with CACHED mapping. This is not correct
>> in the concept!
>>
> Please explain why you think that this isn't correct.

This is NOT always correct because:

when etnaviv_gem_prime_import_sg_table() is called,

drm/etnaviv is importing buffer from other drivers (for example drm/loongson or drm/ingenic).

drm/etnaviv is the importer, and drm/loongson or drm/ingenic is the exporter.


While drm/etnaviv using the WC mapping by default, which may cause *cache aliasing*,


Because the imported buffer originally belong to the KMS driver side.

The KMS driver may choose cached mapping by default.


For drm/ingenic(jz4770), the BO will be cached, but their hardware can't guarantee coherency.

they have to flush the cache manually when do the frame buffer update(damage update or dirty update).

Because cached mapping is more fast than the WC mapping.




For drm/loongson, shared buffer have to pinned into GTT,

By default, we using the cached mapping for the buffers in the system RAM;

But we are lucky, our the hardware bless us,

the hardware maintain the cache coherency for us.


While drm/etnaviv choose the WC mapping for the imported buffer blindly,

when doing the import,

*drm/etnaviv know nothing about the original buffer's caching property.*

This is the problem.


Currently, I guess drm/etnaviv only works for drm/imx,

because drm/imx choose WC mapping as the cache property by default for 
the dumb buffer.

The cache property mapping match, so it works.

But their no communications between the KMS driver and RO driver.


I think, drm/etanviv will also wrong on the ARM64 platforms where cache 
coherency is maintained by the hardware.

If the KMS driver using cached mapping, while drm/etnaviv using WC mapping,

There still will have problems.

I will go to find the hardware, and do the testing for you.


> If using WC
> mappings cause a potential loss of coherency on your platform, then we
> can not allow the userspace driver to use WC mappings.

I have explained with my broken English,

The point is that loss of coherency may because KMS-RO framework have 
defect,

maybe software side bug, please don't make the conclusion too fast.


Xorg DDX driver (EXA base DDX for an example) fallback to the CPU do the 
software rendering.

It is even faster than the hardware accelerated implement,

especially in the case where the CPU is fast and the GPU is weak.

LS7A1000(GC1000) + LS3A5000@...Ghz X4 core is such a case.

For Discrete on-board VRAM we will using the WC mapping, it will not be 
shared with other driver,

The CPU use it themself, this is definitionally OK.

In practice, it works like a charm.

For buffers in the system RAM and GTT,

We will using cached mapping for faster CPU software rendering.

hard-coded by the driver.


We are benefited so much from the cached mapping.

While when using write-combine caching mapping for a buffer on LoongArch 
and Loongson Mips CPU,

It is OK for CPU write.

But when you want read from such buffer by the CPU, it is just boil down 
to uncached.

For our platform, uncached read serve as a kind of SYNC.

It will flush the data in the CPU's write buffer.

This cause performance drop.


But we still can use the write-combine caching mapping in the 
circumstances where do not need the CPU read.

So allow write-combine is providing one more choice to the user-space,

It is just that use it at your own risk.


> As I would like to keep the option of WC mappings,

Yeah, I know what's reason.

Because on the platform you love (imx6q, imx8q, for an example),

The hardware don't maintain the cache coherency.

So if fallback to software rendering, you will need to flush the cache frequency.

the performance will drop very much, right?


-- 
Jingfeng