[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e95d97c98caa525b04cf6383a74db9cadf694afb.camel@icenowy.me>
Date: Tue, 10 Oct 2023 08:50:21 +0800
From: Icenowy Zheng <uwu@...nowy.me>
To: Sui Jingfeng <suijingfeng@...ngson.cn>,
Huacai Chen <chenhuacai@...nel.org>,
WANG Xuerui <kernel@...0n.name>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Weihao Li <liweihao@...ngson.cn>,
"Mike Rapoport (IBM)" <rppt@...nel.org>,
Jun Yi <yijun@...ngson.cn>, Baoquan He <bhe@...hat.com>,
"Matthew Wilcox (Oracle)" <willy@...radead.org>,
David Hildenbrand <david@...hat.com>,
Hongchen Zhang <zhanghongchen@...ngson.cn>,
Binbin Zhou <zhoubinbin@...ngson.cn>,
Zhen Lei <thunder.leizhen@...wei.com>,
Tiezhu Yang <yangtiezhu@...ngson.cn>,
Thomas Gleixner <tglx@...utronix.de>,
Zhihong Dong <donmor3000@...mail.com>,
loongarch@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] loongarch/mm: disable WUC for pgprot_writecombine as
same as ioremap_wc
在 2023-10-09星期一的 22:32 +0800,Sui Jingfeng写道:
> Hi,
>
>
> On 2023/10/9 12:28, Icenowy Zheng wrote:
> > Currently the code disables WUC only disables it for ioremap_wc(),
> > which
> > is only used when mapping writecombine pages like ioremap() (mapped
> > to
> > the kernel space). For VRAM mapped in TTM/GEM, it's mapped with a
> > crafted pgprot with pgprot_writecombine() function, which isn't
> > corrently disabled now.
> >
> > Disable WUC for pgprot_writecombine() (fallback to SUC) too.
> >
> > This improves AMDGPU driver stability on Loongson 3A5000 machines.
> >
> > Signed-off-by: Icenowy Zheng <uwu@...nowy.me>
> > ---
> > Changes since v1:
> > - Removed _WC macros
> > - Mention ioremap_wc in commit message
> >
> > arch/loongarch/include/asm/io.h | 5 ++---
> > arch/loongarch/include/asm/pgtable-bits.h | 4 +++-
> > arch/loongarch/kernel/setup.c | 10 +++++-----
> > 3 files changed, 10 insertions(+), 9 deletions(-)
> >
> > diff --git a/arch/loongarch/include/asm/io.h
> > b/arch/loongarch/include/asm/io.h
> > index 0dcb36b32cb25..290aad87a8847 100644
> > --- a/arch/loongarch/include/asm/io.h
> > +++ b/arch/loongarch/include/asm/io.h
> > @@ -52,10 +52,9 @@ static inline void __iomem
> > *ioremap_prot(phys_addr_t offset, unsigned long size,
> > * @offset: bus address of the memory
> > * @size: size of the resource to map
> > */
> > -extern pgprot_t pgprot_wc;
> > -
> > #define ioremap_wc(offset, size) \
> > - ioremap_prot((offset), (size), pgprot_val(pgprot_wc))
> > + ioremap_prot((offset), (size), pgprot_val( \
> > + wc_enabled ? PAGE_KERNEL_WUC : PAGE_KERNEL_SUC))
> >
> > #define ioremap_cache(offset, size) \
> > ioremap_prot((offset), (size), pgprot_val(PAGE_KERNEL))
> > diff --git a/arch/loongarch/include/asm/pgtable-bits.h
> > b/arch/loongarch/include/asm/pgtable-bits.h
> > index 35348d4c4209a..a3d827701736d 100644
> > --- a/arch/loongarch/include/asm/pgtable-bits.h
> > +++ b/arch/loongarch/include/asm/pgtable-bits.h
> > @@ -92,6 +92,8 @@
> >
> > #ifndef __ASSEMBLY__
> >
> > +extern bool wc_enabled;
> > +
> > #define _PAGE_IOREMAP pgprot_val(PAGE_KERNEL_SUC)
> >
> > #define pgprot_noncached pgprot_noncached
> > @@ -111,7 +113,7 @@ static inline pgprot_t
> > pgprot_writecombine(pgprot_t _prot)
> > {
> > unsigned long prot = pgprot_val(_prot);
> >
> > - prot = (prot & ~_CACHE_MASK) | _CACHE_WUC;
> > + prot = (prot & ~_CACHE_MASK) | (wc_enabled ? _CACHE_WUC :
> > _CACHE_SUC);
> >
> > return __pgprot(prot);
> > }
> > diff --git a/arch/loongarch/kernel/setup.c
> > b/arch/loongarch/kernel/setup.c
> > index 7783f0a3d742c..465c1dbb6f4b4 100644
> > --- a/arch/loongarch/kernel/setup.c
> > +++ b/arch/loongarch/kernel/setup.c
> > @@ -161,19 +161,19 @@ static void __init smbios_parse(void)
> > }
> >
> > #ifdef CONFIG_ARCH_WRITECOMBINE
> > -pgprot_t pgprot_wc = PAGE_KERNEL_WUC;
> > +bool wc_enabled = true;
> > #else
> > -pgprot_t pgprot_wc = PAGE_KERNEL_SUC;
> > +bool wc_enabled;
> > #endif
> >
> > -EXPORT_SYMBOL(pgprot_wc);
> > +EXPORT_SYMBOL(wc_enabled);
> >
> > static int __init setup_writecombine(char *p)
> > {
> > if (!strcmp(p, "on"))
> > - pgprot_wc = PAGE_KERNEL_WUC;
> > + wc_enabled = true;
> > else if (!strcmp(p, "off"))
> > - pgprot_wc = PAGE_KERNEL_SUC;
> > + wc_enabled = false;
> > else
> > pr_warn("Unknown writecombine setting \"%s\".\n",
> > p);
> >
>
>
> Good catch!
>
> But this will make the write combine(WC) mappings completely unusable
> on LoongArch.
Not completely unusable -- we always have the writecombine=on kernel
parameter that overrides it (or CONFIG_ARCH_WRITECOMBINE build-time
option).
People knowing what they're doing can utilize them to gain back the
performace of WUC, however for the default setup I prefer functionality
correctness to (unstable) performance.
> This is nearly equivalent to say that LoongArch don't support write
> combine at all.
> But the write combine(WC) mappings works fine for software based drm
> drivers,
> such as drm/loongson and drm/ast etc. Even include drm/radeon and
> drm/amdgpu with
> pure software rendering setup (by putting Option "Accel" "off" into
> 10-amdgpu.conf
> or 10-radeon.conf) After merge this patch, the performance drop
> dramatically for
> 2D software rendering based display controller drivers.
>
> Well, this patch itself is a good catch, as it is a fix for the
> commit <16c52e503043>
> ("LoongArch: Make WriteCombine configurable for ioremap()"). But I'm
> afraid that
> both of this commit and the <16c52e503043> commit are not a *real*
> fix write combine
Yes, this is a workaround rather than a fix, however I'm a software
developer and this is the best I can do.
If you can, please report this to your hardware developers and try to
get it fixed for 3A7000/7A3000 (or maybe grabbing some chicken bit that
will fix this for existing chips). I will be quite appreciated.
Thanks,
Icenowy
> related issue on LoongArch. It just negative sidestep of the real
> problem.
>
Powered by blists - more mailing lists