Message-ID: <Z90-VOyC5oanCC8z@gmail.com>
Date: Fri, 21 Mar 2025 11:24:20 +0100
From: Ingo Molnar <mingo@...nel.org>
To: Balbir Singh <balbirs@...dia.com>
Cc: Bert Karwatzki <spasswolf@....de>, Alex Deucher <alexdeucher@...il.com>,
Kees Cook <kees@...nel.org>, Bjorn Helgaas <bhelgaas@...gle.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Peter Zijlstra <peterz@...radead.org>,
Andy Lutomirski <luto@...nel.org>, linux-kernel@...r.kernel.org,
amd-gfx@...ts.freedesktop.org
Subject: Re: commit 7ffb791423c7 breaks steam game

* Balbir Singh <balbirs@...dia.com> wrote:

> On 3/20/25 20:01, Ingo Molnar wrote:
> >
> > * Balbir Singh <balbirs@...dia.com> wrote:
> >
> >> On 3/17/25 00:09, Bert Karwatzki wrote:
> >>> This is related to the amdgpu.gttsize. My laptop has the maximum amount
> >>> of memory (64G) and usually gttsize is half of main memory size. I just
> >>> tested with cmdline="nokaslr amdgpu.gttsize=2048" and the problem does
> >>> not occur. So I did some more testing with varying gttsize and got this
> >>> for the built-in GPU
> >>>
> >>> 08:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]
> >>> Cezanne [Radeon Vega Series / Radeon Vega Mobile Series] (rev c5)
> >>>
> >>> (nokaslr is always enabled)
> >>> gttsize input behaviour
> >>> 2048 GOOD
> >>> 2064 GOOD
> >>> 2080 SEMIBAD (i.e. noticeable input lag but not as bad as below)
> >>> 3072 BAD
> >>> 4096 BAD
> >>> 8192 BAD
> >>> 16384 BAD
> >>>
> >>> As the built-in GPU has ~512M of VRAM there seem to be problems when
> >>> gttsize > 4*VRAM, so I tested the discrete GPU with 8G of VRAM
> >>> gttsize input behaviour
> >>> 49152 GOOD
> >>> 64000 GOOD
> >>>
> >>> So for the discrete GPU increasing gttsize does not reproduce the bug.
> >>>
> >>
> >> Very interesting, I am not a GTT expert, but with these experiments do you
> >> find anything interesting in
> >>
> >> /sys/kernel/debug/x86/pat_memtype_list?
> >>
> >> It's weird that you don't see any issues in Xorg (Xfce), just the games.
> >> Maybe we should get help from the amd-gfx experts to further diagnose/debug
> >> the interaction of nokaslr with the game.
> >
> > So basically your commit:
> >
> > 7ffb791423c7 ("x86/kaslr: Reduce KASLR entropy on most x86 systems")
> >
> > inflicts part of the effects of a 'nokaslr' boot command line option,
> > and triggers the regression due to that?
> >
> > Or is there some other cause?
> >
>
> You are right in your assessment of the root cause. Just to reiterate
>
> - nokaslr does not work with the iGPU, specifically for the games
> mentioned
>
> - There is a workaround for the problem, which involves reducing the
> amdgpu.gttsize
>
> - The patch exposes the system to the nokaslr situation (effect) when
> PCI_P2PDMA is enabled

Note that every major x86 distro I checked enables CONFIG_PCI_P2PDMA=y
and also keeps KASLR enabled, so the above qualifiers are immaterial in
terms of user impact: it's a 100% certainty that distro kernels on
these systems will regress under these games, right?
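
To make the combination above concrete, here is a minimal Python sketch of how one might check a given kernel config and boot command line for it. It is an illustration only and not taken from the commit or from this thread: the function names are hypothetical, and treating CONFIG_RANDOMIZE_BASE=y as the "KASLR is built in" indicator is an assumption.

```python
def parse_kconfig(text):
    """Return {option: value} for every 'CONFIG_FOO=value' line.

    Lines such as '# CONFIG_FOO is not set' are comments and are skipped.
    """
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            opts[name] = value
    return opts


def affected_by_commit(config_text, cmdline):
    """True if the kernel has PCI_P2PDMA enabled and KASLR active.

    Assumption: CONFIG_RANDOMIZE_BASE=y means KASLR is built in.
    With 'nokaslr' on the command line KASLR is already off, so the
    commit would not change anything for that boot.
    """
    opts = parse_kconfig(config_text)
    p2pdma = opts.get("CONFIG_PCI_P2PDMA") == "y"
    kaslr_built_in = opts.get("CONFIG_RANDOMIZE_BASE") == "y"
    kaslr_active = kaslr_built_in and "nokaslr" not in cmdline.split()
    return p2pdma and kaslr_active
```

On a running system the config text would typically come from a distro-provided file such as /boot/config-<version> (where the distro ships one) and the command line from /proc/cmdline.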

What is the importance of the original fix? I should have insisted on a
fuller changelog, because it's rather thin on details:

    If the BAR address is beyond this limit, PCI peer to peer DMA
    mappings fail.

How frequently does this happen and what is the impact to users if this
happens?

We might be forced to revert this change if it regresses other systems.

Thanks,

	Ingo