Message-ID: <688f2757-e364-45db-ad54-daa6ff1c4f3c@nvidia.com>
Date: Sun, 23 Mar 2025 17:51:29 +1100
From: Balbir Singh <balbirs@...dia.com>
To: Bert Karwatzki <spasswolf@....de>
Cc: Ingo Molnar <mingo@...nel.org>, Kees Cook <kees@...nel.org>,
Bjorn Helgaas <bhelgaas@...gle.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Peter Zijlstra <peterz@...radead.org>, Andy Lutomirski <luto@...nel.org>,
Christian König <christian.koenig@....com>,
Alex Deucher <alexander.deucher@....com>, linux-kernel@...r.kernel.org,
amd-gfx@...ts.freedesktop.org
Subject: Re: commit 7ffb791423c7 breaks steam game
On 3/22/25 23:23, Bert Karwatzki wrote:
> The problem occurs in this part of ttm_tt_populate(): in the nokaslr case
> the loop is entered and run repeatedly because ttm_dma32_pages_allocated exceeds
> ttm_dma32_pages_limit, which leads to many calls to ttm_global_swapout().
>
> 	if (!strcmp(get_current()->comm, "stellaris"))
> 		printk(KERN_INFO "%s: ttm_pages_allocated=0x%llx ttm_pages_limit=0x%lx ttm_dma32_pages_allocated=0x%llx ttm_dma32_pages_limit=0x%lx\n",
> 		       __func__, ttm_pages_allocated.counter, ttm_pages_limit, ttm_dma32_pages_allocated.counter, ttm_dma32_pages_limit);
> 	while (atomic_long_read(&ttm_pages_allocated) > ttm_pages_limit ||
> 	       atomic_long_read(&ttm_dma32_pages_allocated) >
> 	       ttm_dma32_pages_limit) {
>
> 		if (!strcmp(get_current()->comm, "stellaris"))
> 			printk(KERN_INFO "%s: count=%d ttm_pages_allocated=0x%llx ttm_pages_limit=0x%lx ttm_dma32_pages_allocated=0x%llx ttm_dma32_pages_limit=0x%lx\n",
> 			       __func__, count++, ttm_pages_allocated.counter, ttm_pages_limit, ttm_dma32_pages_allocated.counter, ttm_dma32_pages_limit);
> 		ret = ttm_global_swapout(ctx, GFP_KERNEL);
> 		if (ret == 0)
> 			break;
> 		if (ret < 0)
> 			goto error;
> 	}
>
> Without nokaslr, ttm_dma32_pages_allocated is 0 because use_dma32 == false
> in that case.
>
> So why is use_dma32 enabled with nokaslr? Some more printk()s give this result:
>
> The GPUs:
> built-in:
> 08:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne [Radeon Vega Series / Radeon Vega Mobile Series] (rev c5)
> discrete:
> 03:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 23 [Radeon RX 6600/6600 XT/6600M] (rev c3)
>
> With nokaslr:
> [ 1.266517] [ T328] dma_addressing_limited: mask = 0xfffffffffff bus_dma_limit = 0x0 required_mask = 0xfffffffff
> [ 1.266519] [ T328] dma_addressing_limited: ops = 0000000000000000 use_dma_iommu(dev) = 0
> [ 1.266520] [ T328] dma_direct_all_ram_mapped: returning true
> [ 1.266521] [ T328] dma_addressing_limited: returning ret = 0
> [ 1.266521] [ T328] amdgpu 0000:03:00.0: amdgpu: amdgpu_ttm_init: calling ttm_device_init() with use_dma32 = 0
> [ 1.266525] [ T328] entering ttm_device_init, use_dma32 = 0
> [ 1.267115] [ T328] entering ttm_pool_init, use_dma32 = 0
>
> [ 3.965669] [ T328] dma_addressing_limited: mask = 0xfffffffffff bus_dma_limit = 0x0 required_mask = 0x3fffffffffff
> [ 3.965671] [ T328] dma_addressing_limited: returning true
> [ 3.965672] [ T328] amdgpu 0000:08:00.0: amdgpu: amdgpu_ttm_init: calling ttm_device_init() with use_dma32 = 1
> [ 3.965674] [ T328] entering ttm_device_init, use_dma32 = 1
> [ 3.965747] [ T328] entering ttm_pool_init, use_dma32 = 1
>
> Without nokaslr:
> [ 1.300907] [ T351] dma_addressing_limited: mask = 0xfffffffffff bus_dma_limit = 0x0 required_mask = 0xfffffffff
> [ 1.300909] [ T351] dma_addressing_limited: ops = 0000000000000000 use_dma_iommu(dev) = 0
> [ 1.300910] [ T351] dma_direct_all_ram_mapped: returning true
> [ 1.300910] [ T351] dma_addressing_limited: returning ret = 0
> [ 1.300911] [ T351] amdgpu 0000:03:00.0: amdgpu: amdgpu_ttm_init: calling ttm_device_init() with use_dma32 = 0
> [ 1.300915] [ T351] entering ttm_device_init, use_dma32 = 0
> [ 1.301210] [ T351] entering ttm_pool_init, use_dma32 = 0
>
> [ 4.000602] [ T351] dma_addressing_limited: mask = 0xfffffffffff bus_dma_limit = 0x0 required_mask = 0xfffffffffff
> [ 4.000603] [ T351] dma_addressing_limited: ops = 0000000000000000 use_dma_iommu(dev) = 0
> [ 4.000604] [ T351] dma_direct_all_ram_mapped: returning true
> [ 4.000605] [ T351] dma_addressing_limited: returning ret = 0
> [ 4.000606] [ T351] amdgpu 0000:08:00.0: amdgpu: amdgpu_ttm_init: calling ttm_device_init() with use_dma32 = 0
> [ 4.000610] [ T351] entering ttm_device_init, use_dma32 = 0
> [ 4.000687] [ T351] entering ttm_pool_init, use_dma32 = 0
>
> So with nokaslr the required mask for the built-in GPU changes from 0xfffffffffff
> to 0x3fffffffffff, which makes dma_addressing_limited() return true, which in turn
> causes ttm_device_init() to be called with use_dma32 = true.
Thanks, this is really the root cause, from what I understand.
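To make the comparison in the logs above concrete, here is a minimal userspace sketch of the mask check. It is a simplified model, not the kernel code: it covers only the "device mask smaller than required mask" comparison and ignores the dma_map_ops, IOMMU and dma_direct_all_ram_mapped() paths that also show up in the traces. The names min_not_zero_u64() and mask_is_limited() are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Smaller of a and b, treating 0 as "not set" (hypothetical helper). */
uint64_t min_not_zero_u64(uint64_t a, uint64_t b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}

/*
 * Simplified model of the first test seen in the traces: the device
 * counts as "addressing limited" when its effective DMA mask is smaller
 * than the mask required to reach all of RAM.
 */
bool mask_is_limited(uint64_t dma_mask, uint64_t bus_dma_limit,
		     uint64_t required_mask)
{
	return min_not_zero_u64(dma_mask, bus_dma_limit) < required_mask;
}
```

With the logged values for 0000:08:00.0, mask_is_limited(0xfffffffffff, 0, 0x3fffffffffff) is true (a 44-bit device mask against a 46-bit required mask, the nokaslr case), while mask_is_limited(0xfffffffffff, 0, 0xfffffffffff) is false (the case without nokaslr).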
> It also shows that nothing changes for the discrete GPU, so the bug does not occur
> there.
>
> I also was able to work around the bug by calling ttm_device_init() with use_dma32=false
> from amdgpu_ttm_init() (drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c) but I'm not sure if this
> has unwanted side effects.
>
> int amdgpu_ttm_init(struct amdgpu_device *adev)
> {
> 	uint64_t gtt_size;
> 	int r;
>
> 	mutex_init(&adev->mman.gtt_window_lock);
>
> 	dma_set_max_seg_size(adev->dev, UINT_MAX);
> 	/* No others user of address space so set it to 0 */
> 	dev_info(adev->dev, "%s: calling ttm_device_init() with use_dma32 = 0 ignoring %d\n", __func__, dma_addressing_limited(adev->dev));
> 	r = ttm_device_init(&adev->mman.bdev, &amdgpu_bo_driver, adev->dev,
> 			       adev_to_drm(adev)->anon_inode->i_mapping,
> 			       adev_to_drm(adev)->vma_offset_manager,
> 			       adev->need_swiotlb,
> 			       false /* use_dma32 */);
> 	if (r) {
> 		DRM_ERROR("failed initializing buffer object driver(%d).\n", r);
> 		return r;
> 	}
>
I think this brings us really close. Instead of forcing use_dma32 to false, I wonder if we need something like
uint64_t dma_bits = fls64(dma_get_mask(adev->dev));
and then, in the call to ttm_device_init(), pass the last argument (use_dma32) as dma_bits < 32?
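As a rough illustration of that idea, here is a userspace sketch, assuming fls64() returns the 1-based position of the most significant set bit (0 for an input of 0). The helpers mask_bits() and use_dma32_from_mask() are hypothetical names, and this has not been tested in the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace stand-in for fls64(): 1-based index of the highest set bit,
 * or 0 when no bit is set (hypothetical helper for illustration).
 */
int mask_bits(uint64_t mask)
{
	int bits = 0;

	while (mask) {
		bits++;
		mask >>= 1;
	}
	return bits;
}

/* The proposed rule: derive use_dma32 from the width of the device DMA mask. */
bool use_dma32_from_mask(uint64_t dma_mask)
{
	return mask_bits(dma_mask) < 32;
}
```

For the 44-bit mask in the logs (0xfffffffffff), mask_bits() gives 44 and use_dma32 comes out false regardless of where RAM ends up. Note that a full 32-bit mask (0xffffffff) gives 32, so with "<" as written it would also yield false; whether the comparison should be "<=" for such devices would need checking.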
Thanks,
Balbir Singh