Message-ID: <b1d72b95-5b5f-4954-923f-8eebc7909c4d@nvidia.com>
Date: Tue, 25 Mar 2025 09:48:56 +1100
From: Balbir Singh <balbirs@...dia.com>
To: Christian König <christian.koenig@....com>,
 Bert Karwatzki <spasswolf@....de>
Cc: Ingo Molnar <mingo@...nel.org>, Kees Cook <kees@...nel.org>,
 Bjorn Helgaas <bhelgaas@...gle.com>,
 Linus Torvalds <torvalds@...ux-foundation.org>,
 Peter Zijlstra <peterz@...radead.org>, Andy Lutomirski <luto@...nel.org>,
 Alex Deucher <alexander.deucher@....com>, linux-kernel@...r.kernel.org,
 amd-gfx@...ts.freedesktop.org
Subject: Re: commit 7ffb791423c7 breaks steam game

On 3/24/25 23:14, Christian König wrote:
> On 24.03.25 at 12:23, Bert Karwatzki wrote:
>> On Sunday, 23.03.2025 at 17:51 +1100, Balbir Singh wrote:
>>> On 3/22/25 23:23, Bert Karwatzki wrote:
>>>> ...
>>>> So why is use_dma32 enabled with nokaslr? Some more printk()s give this result:
>>>>
>>>> The GPUs:
>>>> built-in:
>>>> 08:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne [Radeon Vega Series / Radeon Vega Mobile Series] (rev c5)
>>>> discrete:
>>>> 03:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 23 [Radeon RX 6600/6600 XT/6600M] (rev c3)
>>>>
>>>> With nokaslr:
>>>> [    1.266517] [    T328] dma_addressing_limited: mask = 0xfffffffffff bus_dma_limit = 0x0 required_mask = 0xfffffffff
>>>> [    1.266519] [    T328] dma_addressing_limited: ops = 0000000000000000 use_dma_iommu(dev) = 0
>>>> [    1.266520] [    T328] dma_direct_all_ram_mapped: returning true
>>>> [    1.266521] [    T328] dma_addressing_limited: returning ret = 0
>>>> [    1.266521] [    T328] amdgpu 0000:03:00.0: amdgpu: amdgpu_ttm_init: calling ttm_device_init() with use_dma32 = 0
>>>> [    1.266525] [    T328] entering ttm_device_init, use_dma32 = 0
>>>> [    1.267115] [    T328] entering ttm_pool_init, use_dma32 = 0
>>>>
>>>> [    3.965669] [    T328] dma_addressing_limited: mask = 0xfffffffffff bus_dma_limit = 0x0 required_mask = 0x3fffffffffff
>>>> [    3.965671] [    T328] dma_addressing_limited: returning true
>>>> [    3.965672] [    T328] amdgpu 0000:08:00.0: amdgpu: amdgpu_ttm_init: calling ttm_device_init() with use_dma32 = 1
>>>> [    3.965674] [    T328] entering ttm_device_init, use_dma32 = 1
>>>> [    3.965747] [    T328] entering ttm_pool_init, use_dma32 = 1
>>>>
>>>> Without nokaslr:
>>>> [    1.300907] [    T351] dma_addressing_limited: mask = 0xfffffffffff bus_dma_limit = 0x0 required_mask = 0xfffffffff
>>>> [    1.300909] [    T351] dma_addressing_limited: ops = 0000000000000000 use_dma_iommu(dev) = 0
>>>> [    1.300910] [    T351] dma_direct_all_ram_mapped: returning true
>>>> [    1.300910] [    T351] dma_addressing_limited: returning ret = 0
>>>> [    1.300911] [    T351] amdgpu 0000:03:00.0: amdgpu: amdgpu_ttm_init: calling ttm_device_init() with use_dma32 = 0
>>>> [    1.300915] [    T351] entering ttm_device_init, use_dma32 = 0
>>>> [    1.301210] [    T351] entering ttm_pool_init, use_dma32 = 0
>>>>
>>>> [    4.000602] [    T351] dma_addressing_limited: mask = 0xfffffffffff bus_dma_limit = 0x0 required_mask = 0xfffffffffff
>>>> [    4.000603] [    T351] dma_addressing_limited: ops = 0000000000000000 use_dma_iommu(dev) = 0
>>>> [    4.000604] [    T351] dma_direct_all_ram_mapped: returning true
>>>> [    4.000605] [    T351] dma_addressing_limited: returning ret = 0
>>>> [    4.000606] [    T351] amdgpu 0000:08:00.0: amdgpu: amdgpu_ttm_init: calling ttm_device_init() with use_dma32 = 0
>>>> [    4.000610] [    T351] entering ttm_device_init, use_dma32 = 0
>>>> [    4.000687] [    T351] entering ttm_pool_init, use_dma32 = 0
>>>>
>>>> So with nokaslr the required mask for the built-in GPU changes from 0xfffffffffff
>>>> to 0x3fffffffffff, which causes dma_addressing_limited() to return true, which in turn
>>>> causes ttm_device_init() to be called with use_dma32 = true.
>>> Thanks, this is really the root cause, from what I understand.
> 
> Yeah, completely agree.
> 
>>>
>>>> It also shows that for the discrete GPU nothing changes, so the bug does not occur
>>>> there.
>>>>
>>>> I also was able to work around the bug by calling ttm_device_init() with use_dma32=false
>>>> from amdgpu_ttm_init()  (drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c) but I'm not sure if this
>>>> has unwanted side effects.
>>>>
>>>> int amdgpu_ttm_init(struct amdgpu_device *adev)
>>>> {
>>>> 	uint64_t gtt_size;
>>>> 	int r;
>>>>
>>>> 	mutex_init(&adev->mman.gtt_window_lock);
>>>>
>>>> 	dma_set_max_seg_size(adev->dev, UINT_MAX);
>>>> 	/* No others user of address space so set it to 0 */
>>>> 	dev_info(adev->dev, "%s: calling ttm_device_init() with use_dma32 = 0 ignoring %d\n", __func__, dma_addressing_limited(adev->dev));
>>>> 	r = ttm_device_init(&adev->mman.bdev, &amdgpu_bo_driver, adev->dev,
>>>> 			       adev_to_drm(adev)->anon_inode->i_mapping,
>>>> 			       adev_to_drm(adev)->vma_offset_manager,
>>>> 			       adev->need_swiotlb,
>>>> 			       false /* use_dma32 */);
>>>> 	if (r) {
>>>> 		DRM_ERROR("failed initializing buffer object driver(%d).\n", r);
>>>> 		return r;
>>>> 	}
>>>>
>>> I think this brings us really close. Instead of forcing use_dma32 to false, I wonder if we need something like
>>>
>>> uint64_t dma_bits = fls64(dma_get_mask(adev->dev));
>>>
>>> and then, in the call to ttm_device_init(), pass the last argument (use_dma32) as dma_bits < 32?
> 
> The handling is completely correct as far as I can see.
> 
>>>
>>>
>>> Thanks,
>>> Balbir Singh
>>>
>> Do these address bits have to shift when using nokaslr or PCI_P2PDMA? I think
>> this shift causes the increase of the required_mask to 0x3fffffffffff.
>>
>> @@ -104,4 +104,4 @@
>>        fe30300000-fe303fffff : 0000:04:00.0
>>      fe30400000-fe30403fff : 0000:04:00.0
>>      fe30404000-fe30404fff : 0000:04:00.0
>> -afe00000000-affffffffff : 0000:03:00.0
>> +3ffe00000000-3fffffffffff : 0000:03:00.0
>>
>> And what memory is this? It's 8G in size so it could be the RAM of the discrete
>> GPU (which is at PCI 0000:03:00.0), but that is already here (part of
>> /proc/iomem):
>>
>> 1010000000-ffffffffff : PCI Bus 0000:00
>>   fc00000000-fe0fffffff : PCI Bus 0000:01
>>     fc00000000-fe0fffffff : PCI Bus 0000:02
>>       fc00000000-fe0fffffff : PCI Bus 0000:03
>>         fc00000000-fdffffffff : 0000:03:00.0  GPU RAM
>>         fe00000000-fe0fffffff : 0000:03:00.0
>>
>> lspci -v reports 8G of memory at 0xfc00000000, so I assumed that is the GPU RAM.
>> 03:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 23
>> [Radeon RX 6600/6600 XT/6600M] (rev c3)
>> 	Subsystem: Micro-Star International Co., Ltd. [MSI] Device 1313
>> 	Flags: bus master, fast devsel, latency 0, IRQ 107, IOMMU group 14
>> 	Memory at fc00000000 (64-bit, prefetchable) [size=8G]
>> 	Memory at fe00000000 (64-bit, prefetchable) [size=256M]
>> 	Memory at fca00000 (32-bit, non-prefetchable) [size=1M]
>> 	Expansion ROM at fcb00000 [disabled] [size=128K]
> 
> Well, when you set nokaslr, that moves the BAR address of the dGPU above the limit the integrated GPU can access on the bus (usually 40 bits).
> 
> Because of this the integrated GPU starts to fall back to system memory below the 4GB limit to make sure that the stuff is always accessible by everyone.

Why does it fall back to GPU_DMA32? Is the rest of system memory (up to 40 bits) not usable?
I did not realize that the iGPU is using the BAR memory of the dGPU.

I guess the issue goes away when amdgpu.gttsize is set to 2GB, because 2GB fits in the DMA32 window.

> 
> Since the memory below 4GB is very limited, we are now constantly swapping things in and out of that area, which basically kills the performance of your Steam game.
> 
> As far as I can see, up to that point the handling is completely intentional and working as expected.
> 
> The only thing which eludes me is why setting nokaslr changes the BAR of the dGPU. Can I get the full dmesg with and without nokaslr?
> 

IIRC, the iGPU does not work correctly but the dGPU does, so is it an iGPU addressing constraint?
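
To put numbers on that question, a small userspace sketch for comparing the widths involved. model_fls64() is my own stand-in for the kernel's fls64(), and model_reachable() is a hypothetical helper, not something from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for fls64(): position of the highest set bit,
 * counted from 1; returns 0 when x is 0. */
static unsigned int model_fls64(uint64_t x)
{
	unsigned int bits = 0;

	while (x) {
		bits++;
		x >>= 1;
	}
	return bits;
}

/* True if every address up to and including 'last' is covered by 'mask'.
 * Assumes mask has the usual 2^n - 1 form, which the assert checks. */
static bool model_reachable(uint64_t last, uint64_t mask)
{
	assert(((mask + 1) & mask) == 0);
	return last <= mask;
}
```

With the values from this thread, the mask 0xfffffffffff is 44 bits wide and 0x3fffffffffff is 46 bits wide, so the relocated range ending at 3fffffffffff is not reachable through the 44-bit mask, while the old range ending at affffffffff was.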

Balbir

