Message-ID: <1e0f2174a72011cb1c78eeecbbf82a4ff108bf8a.camel@icenowy.me>
Date: Wed, 18 Dec 2024 07:44:08 +0800
From: Icenowy Zheng <uwu@...nowy.me>
To: Sui Jingfeng <sui.jingfeng@...ux.dev>, Xi Ruoyao <xry111@...111.site>,
WANG Xuerui <kernel@...0n.name>, Huacai Chen <chenhuacai@...nel.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>, "Mike Rapoport (IBM)"
<rppt@...nel.org>, Baoquan He <bhe@...hat.com>, "Matthew Wilcox (Oracle)"
<willy@...radead.org>, David Hildenbrand <david@...hat.com>, Zhen Lei
<thunder.leizhen@...wei.com>, Thomas Gleixner <tglx@...utronix.de>, Zhihong
Dong <donmor3000@...mail.com>, loongarch@...ts.linux.dev,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] loongarch/mm: disable WUC for pgprot_writecombine as
same as ioremap_wc
On Tue, 2024-12-03 at 00:23 +0800, Sui Jingfeng wrote:
> Hi,
>
> On 10/10/23 20:26, Xi Ruoyao wrote:
> > On Tue, 2023-10-10 at 11:02 +0800, Sui Jingfeng wrote:
> >
> > >
> > > On LoongArch, cached and uncached mappings are DMA-coherent, which
> > > is guaranteed by the hardware, while WC mappings are *NOT*
> > > DMA-coherent when a 3D GPU is involved. Therefore, in the
> > > downstream kernel, we disable write-combine (WC) mappings on the
> > > DRM driver side.
> >
> > Why it's only an issue when 3D GPU is involved?
>
> No one is saying that only 3D GPUs suffer from this kind of issue;
> I just meant that the issue is there at least for GPUs.
>
> > What's the difference between 3D GPUs and other devices? Is it
> > possible that other devices (say, neural accelerators) start to
> > perform DMA accesses in a similar way and then suddenly break?
>
> You, the patch contributor, or the maintainer, or whoever, should
> carry out the test, right?
Well, testing PCIe peripherals requires professional tools, so I
assume whoever raises the idea should do the testing, because not
everyone can.
>
> We do not intend to oppose the patch, though.
>
> > > - For buffers in VRAM (device memory), we replace the WC mappings
> > > with uncached mappings.
> > > - For buffers residing in RAM, we replace the WC mappings with
> > > cached mappings.
> > >
> > > This way, we were able to minimize the side effects and meet the
> > > usability requirements of all of the GPU drivers.
> >
> > AFAIK there has been some clear NAK from DRM maintainers towards
> > this "approach". So it's not possible to be applied upstream.
>
> That's you guys' problem: stealing other programmers' patches and
> then submitting them upstream without knowing and/or presenting
> decent hardware details.
>
>
> > > For DMA non-coherent buffers, we should try to implement
> > > arch-specific dma_map_ops that invalidate the CPU cache and flush
> > > the CPU write buffer before the device does DMA, instead of
> > > pretending to be DMA-coherent for all buffers. A kernel cmdline is
> > > not a system-level solution for all GPU drivers and OS releases.
> >
> > IIUC this is a hardware bug of 7A1000 and 7A2000, so the proper
> > location of the workaround is in the bridge chip driver. Or am I
> > misunderstanding something?
> >
>
> You are misunderstanding everything and ranting like a dog.
>
> The write buffers are inside the CPU, and write-combine is related
> to *both* the CPU side and the GPU side. The GPU side could choose a
> no-snooping access mode, while the CPU side has to handle such
> requests properly.
Well, I think the radeon driver unconditionally maps VRAM as WC, and
only maps system memory as WC when the drm_arch_can_wc_memory() test
passes. This is why blocklisting LoongArch in
drm_arch_can_wc_memory() does not solve all the problems; and for the
GPU's accesses to its own memory (VRAM), snooping the CPU does not
sound acceptable.
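
To make the point concrete, here is a rough user-space model of that
mapping policy as I understand it. It is not code from the radeon
driver: pick_cpu_mapping(), the two enums and the arch_can_wc
parameter (standing in for the result of drm_arch_can_wc_memory())
are hypothetical names, and the cached fallback for system memory is
my assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names: a model of the policy, not driver code. */
enum mem_domain { DOMAIN_VRAM, DOMAIN_SYSTEM };
enum cpu_map_attr { MAP_CACHED, MAP_WRITE_COMBINE };

/*
 * arch_can_wc stands in for the result of drm_arch_can_wc_memory().
 * Only the system-memory path consults it, so forcing it to false
 * for an architecture leaves the VRAM mappings write-combined.
 */
static enum cpu_map_attr pick_cpu_mapping(enum mem_domain domain,
                                          bool arch_can_wc)
{
	assert(domain == DOMAIN_VRAM || domain == DOMAIN_SYSTEM);

	/* VRAM: WC regardless of the architecture check. */
	if (domain == DOMAIN_VRAM)
		return MAP_WRITE_COMBINE;

	/* System memory: WC only if allowed, else cached (assumed). */
	return arch_can_wc ? MAP_WRITE_COMBINE : MAP_CACHED;
}
```

With arch_can_wc set to false, only the DOMAIN_SYSTEM result changes;
the DOMAIN_VRAM case still returns MAP_WRITE_COMBINE, which is the gap
I mean.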
>
> What we are arguing is that if this is a hardware bug of the north
> bridge, we should at least still be able to use WC on the CPU side;
> that is, WC on system pages should be usable without any issue. Yet
> the weird commit disables everything.
Well, what's the point of using WC on system pages?
Why don't we just use the normal cached attribute? I think non-cached
memory attributes exist only for communicating with peripherals, and
at least on 3A5000/6000, no meaningful DMA-capable peripheral is
accessible without the bridge chip.
>
>