Message-ID: <Z90-VOyC5oanCC8z@gmail.com>
Date: Fri, 21 Mar 2025 11:24:20 +0100
From: Ingo Molnar <mingo@...nel.org>
To: Balbir Singh <balbirs@...dia.com>
Cc: Bert Karwatzki <spasswolf@....de>, Alex Deucher <alexdeucher@...il.com>,
	Kees Cook <kees@...nel.org>, Bjorn Helgaas <bhelgaas@...gle.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Andy Lutomirski <luto@...nel.org>, linux-kernel@...r.kernel.org,
	amd-gfx@...ts.freedesktop.org
Subject: Re: commit 7ffb791423c7 breaks steam game


* Balbir Singh <balbirs@...dia.com> wrote:

> On 3/20/25 20:01, Ingo Molnar wrote:
> > 
> > * Balbir Singh <balbirs@...dia.com> wrote:
> > 
> >> On 3/17/25 00:09, Bert Karwatzki wrote:
> >>> This is related to amdgpu.gttsize. My laptop has the maximum amount 
> >>> of memory (64G) and usually gttsize is half of main memory size. I just 
> >>> tested with cmdline="nokaslr amdgpu.gttsize=2048" and the problem does 
> >>> not occur. So I did some more testing with varying gttsize and got this
> >>> for the built-in GPU:
> >>>
> >>> 08:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]
> >>> Cezanne [Radeon Vega Series / Radeon Vega Mobile Series] (rev c5)
> >>>
> >>> (nokaslr is always enabled)
> >>> gttsize   input behaviour
> >>>  2048     GOOD
> >>>  2064     GOOD
> >>>  2080     SEMIBAD (i.e. noticeable input lag but not as bad as below)
> >>>  3072     BAD
> >>>  4096     BAD
> >>>  8192     BAD
> >>> 16384     BAD
> >>>
> >>> As the built-in GPU has ~512M of VRAM there seem to be problems when
> >>> gttsize > 4*VRAM, so I tested the discrete GPU with 8G of VRAM:
> >>> gttsize   input behaviour
> >>> 49152     GOOD
> >>> 64000     GOOD
> >>>
> >>> So for the discrete GPU increasing gttsize does not reproduce the bug.
> >>>
> >>
> >> Very interesting, I am not a GTT expert, but with these experiments do you
> >> find anything interesting in
> >>
> >> /sys/kernel/debug/x86/pat_memtype_list?
> >>
> >> It's weird that you don't see any issues in Xorg (Xfce), just the games.
> >> Maybe we should get help from the amd-gfx experts to further diagnose/debug
> >> the interaction of nokaslr with the game.
> > 
> > So basically your commit:
> > 
> >   7ffb791423c7 ("x86/kaslr: Reduce KASLR entropy on most x86 systems")
> > 
> > inflicts part of the effects of a 'nokaslr' boot command line option, 
> > and triggers the regression due to that?
> > 
> > Or is there some other cause?
> > 
> 
> You are right in your assessment of the root cause. Just to reiterate:
>
> - nokaslr does not work with the iGPU, specifically for the games 
>   mentioned
>
> - There is a workaround for the problem, which involves reducing the 
>   amdgpu.gttsize
>
> - The patch exposes the system to the nokaslr effect when 
>   PCI_P2PDMA is enabled

Note that every major x86 distro I checked enables CONFIG_PCI_P2PDMA=y 
and also keeps KASLR enabled, so the above qualifiers are immaterial in 
terms of user impact: it's a 100% certainty that distro kernels on 
these systems will regress under these games, right?
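
For anyone who wants to check a given distro kernel for this combination,
here is a minimal sketch. It assumes KASLR corresponds to
CONFIG_RANDOMIZE_BASE=y in the kernel config, and that the config text and
boot command line are passed in as plain strings. The function names are
hypothetical and not from any existing tool:

```python
import re

def enabled_options(config_text):
    """Return the set of CONFIG_* options set to 'y' in kernel config text."""
    enabled = set()
    for line in config_text.splitlines():
        # Lines such as "# CONFIG_FOO is not set" do not match and are skipped.
        m = re.match(r"^(CONFIG_[A-Za-z0-9_]+)=y$", line.strip())
        if m:
            enabled.add(m.group(1))
    return enabled

def matches_described_setup(config_text, cmdline=""):
    """True if PCI_P2PDMA is built in and KASLR is left enabled.

    Assumption: KASLR is considered enabled when CONFIG_RANDOMIZE_BASE=y
    and 'nokaslr' is absent from the boot command line.
    """
    opts = enabled_options(config_text)
    kaslr_on = ("CONFIG_RANDOMIZE_BASE" in opts
                and "nokaslr" not in cmdline.split())
    return "CONFIG_PCI_P2PDMA" in opts and kaslr_on
```

The inputs would typically come from wherever the distro ships its kernel
config and from /proc/cmdline; the exact location of the config file varies
between distros, so it is left to the caller.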

What is the importance of the original fix? I should have insisted on a 
fuller changelog, because it's rather thin on details:

  If the BAR address is beyond this limit, PCI peer to peer DMA
  mappings fail.

How frequently does this happen and what is the impact on users if this 
happens?

We might be forced to revert this change if it regresses other systems.

Thanks,

	Ingo
