Message-ID: <a9f37e3b-2192-42d2-8d5d-c38c0d3fe509@nvidia.com>
Date: Wed, 26 Mar 2025 12:50:11 +1100
From: Balbir Singh <balbirs@...dia.com>
To: Bert Karwatzki <spasswolf@....de>,
Christian König <christian.koenig@....com>
Cc: Ingo Molnar <mingo@...nel.org>, Kees Cook <kees@...nel.org>,
Bjorn Helgaas <bhelgaas@...gle.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Peter Zijlstra <peterz@...radead.org>, Andy Lutomirski <luto@...nel.org>,
Alex Deucher <alexander.deucher@....com>, linux-kernel@...r.kernel.org,
amd-gfx@...ts.freedesktop.org
Subject: Re: commit 7ffb791423c7 breaks steam game
On 3/26/25 10:43, Balbir Singh wrote:
> On 3/26/25 10:21, Bert Karwatzki wrote:
>> On Wednesday, 26.03.2025 at 09:45 +1100, Balbir Singh wrote:
>>>
>>>
>>> The second region seems to be additional, I suspect that is HMM mapping from kgd2kfd_init_zone_device()
>>>
>>> Balbir Singh
>>>
>> Good guess! I inserted a printk into kgd2kfd_init_zone_device():
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
>> b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
>> index d05d199b5e44..201220e2ac42 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
>> @@ -1049,6 +1049,8 @@ int kgd2kfd_init_zone_device(struct amdgpu_device *adev)
>> pgmap->range.end = res->end;
>> pgmap->type = MEMORY_DEVICE_PRIVATE;
>> }
>> + dev_info(adev->dev, "%s: range.start = 0x%llx ranges.end = 0x%llx\n",
>> + __func__, pgmap->range.start, pgmap->range.end);
>>
>> pgmap->nr_range = 1;
>> pgmap->ops = &svm_migrate_pgmap_ops;
>>
>>
>> and get this in the case without nokaslr:
>>
>> [ T367] amdgpu 0000:03:00.0: kfd_migrate: kgd2kfd_init_zone_device:
>> range.start = 0xafe00000000 ranges.end = 0xaffffffffff
>>
>> and this in the case with nokaslr:
>>
>> [ T365] amdgpu 0000:03:00.0: kfd_migrate: kgd2kfd_init_zone_device:
>> range.start = 0x3ffe00000000 ranges.end = 0x3fffffffffff
>>
>
> So we should ignore the second region then for the purposes of this issue.
>
> I think this now boils down to
>
> Why is the dma_get_required_mask set to all of addressable memory (46 bits)
> when we have nokaslr
>
I think I know the root cause of the required_mask going up, and hence the
use of DMA32:

1. HMM calls add_pages()
2. add_pages() calls update_end_of_memory_vars()
3. This updates max_pfn, which causes required_mask to go up to 46 bits
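
The effect of step 3 can be sketched in userspace C. This is only an
illustration, modeled on the arithmetic in the kernel's
dma_direct_get_required_mask(); it assumes 4 KiB pages and an identity
physical-to-DMA mapping, and every sketch_* name is hypothetical rather
than a kernel symbol.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/*
 * 1-based position of the most significant set bit, 0 for x == 0
 * (same contract as the kernel's fls64()).
 */
static int sketch_fls64(uint64_t x)
{
	int bits = 0;

	while (x) {
		bits++;
		x >>= 1;
	}
	return bits;
}

/* First page frame number past a physical range ending at range_end. */
static uint64_t sketch_end_pfn(uint64_t range_end)
{
	return (range_end >> SKETCH_PAGE_SHIFT) + 1;
}

/*
 * Smallest all-ones mask that covers the start address of the last
 * page below max_pfn. Returns 0 if that address is 0.
 */
static uint64_t sketch_required_mask(uint64_t max_pfn)
{
	uint64_t top = (max_pfn - 1) << SKETCH_PAGE_SHIFT;

	if (!top)
		return 0;
	return (1ULL << (sketch_fls64(top) - 1)) * 2 - 1;
}
```

Assuming max_pfn ends up just past the device-private range, the
range.end value from the nokaslr log (0x3fffffffffff) yields a mask of
0x3fffffffffff, i.e. 46 bits, while the value from the other log
(0xaffffffffff) yields 0xfffffffffff, i.e. 44 bits.
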
Do you have CONFIG_HSA_AMD_SVM enabled? Does turning it off fix the issue?
The actual issue is the update of max_pfn.
Balbir Singh