Message-ID: <83cf7fc7-23e0-46f5-916b-5341a0ab9599@amd.com>
Date: Wed, 23 Apr 2025 15:00:17 +0530
From: Bharata B Rao <bharata@....com>
To: Dave Hansen <dave.hansen@...el.com>, linux-kernel@...r.kernel.org,
linux-mm@...ck.org
Cc: Dave Hansen <dave.hansen@...ux.intel.com>, luto@...nel.org,
peterz@...radead.org, tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
x86@...nel.org, hpa@...or.com, nikunj@....com,
Balbir Singh <balbirs@...dia.com>, kees@...nel.org, alexander.deucher@....com
Subject: Re: AMD GPU driver load hitting BUG_ON in sync_global_pgds_l5()
On 22-Apr-25 8:43 PM, Dave Hansen wrote:
> On 4/21/25 23:34, Bharata B Rao wrote:
>> At the outset, it appears that the selection of vmemmap_base doesn't
>> consider whether there will be enough room to accommodate
>> future hot-plugged pages.
>
> Is this future hotplug area in the memory map at boot?
The KVM guest isn't using the -m maxmem option, if that's what you are
hinting at. This is the guest's e820 map:
BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
BIOS-e820: [mem 0x0000000000100000-0x000000007ffdafff] usable
BIOS-e820: [mem 0x000000007ffdb000-0x000000007fffffff] reserved
BIOS-e820: [mem 0x00000000b0000000-0x00000000bfffffff] reserved
BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
BIOS-e820: [mem 0x0000000100000000-0x000000f4a3ffffff] usable
BIOS-e820: [mem 0x000000fd00000000-0x000000ffffffffff] reserved
kaslr_region: base[0] ff4552df80000000 size_tb 1000
kaslr_region: end[0] fffffffffffff
kaslr_region: base[1] ff69c69640000000 size_tb 3200
kaslr_region: base[2] ffd3140680000000 size_tb 40
So vmemmap_base (base[2] above) is 0xffd3140680000000.
The last_pfn and max_arch_pfn values are reported as:
last_pfn = 0x7ffdb max_arch_pfn = 0x10000000000
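As a quick sanity check (a sketch, assuming 4 KiB base pages, i.e. PAGE_SHIFT = 12), converting these pfns back to physical addresses shows that last_pfn lines up with the end of the usable e820 range below 4 GiB, and that max_arch_pfn corresponds to a 52-bit physical address space:

```python
PAGE_SHIFT = 12                 # assumption: 4 KiB base pages on x86-64

# Values copied from the boot log above
last_pfn = 0x7ffdb
max_arch_pfn = 0x10000000000

# First byte past the last usable low page; this is where the reserved
# e820 range [0x7ffdb000-0x7fffffff] begins
low_ram_end = last_pfn << PAGE_SHIFT

# Physical address limit implied by max_arch_pfn
max_phys = max_arch_pfn << PAGE_SHIFT

print(f"last_pfn     -> {low_ram_end:#x}")
print(f"max_arch_pfn -> {max_phys:#x} (2^{max_phys.bit_length() - 1} bytes)")
```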
Here is some data for the hotplug that happens for the 8 GPUs.
The driver passes the following values for pgmap->range.start,
pgmap->range.end and pgmap->type (1 == MEMORY_DEVICE_PRIVATE) to
dev_memremap_pages():
amdgpu: kgd2kfd_init_zone_device: start fffc010000000 end fffffffffffff type 1
amdgpu: kgd2kfd_init_zone_device: start fff8020000000 end fffc00fffffff type 1
amdgpu: kgd2kfd_init_zone_device: start fff4030000000 end fff801fffffff type 1
amdgpu: kgd2kfd_init_zone_device: start fff0040000000 end fff402fffffff type 1
amdgpu: kgd2kfd_init_zone_device: start ffec050000000 end fff003fffffff type 1
amdgpu: kgd2kfd_init_zone_device: start ffe8060000000 end ffec04fffffff type 1
amdgpu: kgd2kfd_init_zone_device: start ffe4070000000 end ffe805fffffff type 1
amdgpu: kgd2kfd_init_zone_device: start ffe0080000000 end ffe406fffffff type 1
The pfn and the number of pages being added in response to the above:
__add_pages pfn fffc010000 nr_pages 67043328 nid 0
__add_pages pfn fff8020000 nr_pages 67043328 nid 0
__add_pages pfn fff4030000 nr_pages 67043328 nid 0
__add_pages pfn fff0040000 nr_pages 67043328 nid 0
__add_pages pfn ffec050000 nr_pages 67043328 nid 0
__add_pages pfn ffe8060000 nr_pages 67043328 nid 0
__add_pages pfn ffe4070000 nr_pages 67043328 nid 0
__add_pages pfn ffe0080000 nr_pages 67043328 nid 0
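The __add_pages() arguments follow directly from the ranges: pfn is range.start shifted right by PAGE_SHIFT, and nr_pages is the inclusive range length in pages. The sketch below assumes 4 KiB pages; add_pages_args is a hypothetical helper, not a kernel function. It reproduces the log lines above: each range is 0x3ff0000000 bytes (256 GiB minus 256 MiB), hence 67043328 pages per GPU, and the ranges sit back to back, descending from the top of the physical address space.

```python
PAGE_SHIFT = 12   # assumption: 4 KiB pages

# pgmap->range.start / range.end (inclusive) as logged by kgd2kfd_init_zone_device()
ranges = [
    (0xfffc010000000, 0xfffffffffffff),
    (0xfff8020000000, 0xfffc00fffffff),
    (0xfff4030000000, 0xfff801fffffff),
    (0xfff0040000000, 0xfff402fffffff),
    (0xffec050000000, 0xfff003fffffff),
    (0xffe8060000000, 0xffec04fffffff),
    (0xffe4070000000, 0xffe805fffffff),
    (0xffe0080000000, 0xffe406fffffff),
]

def add_pages_args(start, end):
    """Hypothetical helper: (pfn, nr_pages) for the inclusive range [start, end]."""
    return start >> PAGE_SHIFT, (end - start + 1) >> PAGE_SHIFT

for start, end in ranges:
    pfn, nr_pages = add_pages_args(start, end)
    print(f"__add_pages pfn {pfn:x} nr_pages {nr_pages}")
```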
With the above vmemmap_base, the first start/end addresses seen in
sync_global_pgds_l5() for these 8 hotplug cases are:
start ffd3540580400000, end = ffd35405805fffff
start ffd3540480800000, end = ffd35404809fffff
start ffd3540380c00000, end = ffd3540380dfffff
start ffd3540281000000, end = ffd35402811fffff
start ffd3540181400000, end = ffd35401815fffff
start ffd3540081800000, end = ffd35400819fffff
start ffd353ff81c00000, end = ffd353ff81dfffff
start ffd353fe82000000, end = ffd353fe821fffff
This is for the case that succeeds; I showed the same data for the
failing case in my first mail in this thread.
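The successful-case sync_global_pgds_l5() addresses above can be reproduced from vmemmap_base alone: with SPARSEMEM_VMEMMAP the struct page for a pfn sits at vmemmap_base + pfn * sizeof(struct page), and hotplug populates the memmap one memory section at a time. This sketch assumes sizeof(struct page) == 64 and x86-64's 128 MiB sections (SECTION_SIZE_BITS = 27); vmemmap_addr and first_section_memmap are hypothetical helpers. Under those assumptions it prints the eight start/end pairs above and shows that the memmap for the top device pfn ends exactly 64 TiB above vmemmap_base.

```python
PAGE_SHIFT = 12
SECTION_SIZE_BITS = 27        # assumption: x86-64 default, 128 MiB sections
PAGES_PER_SECTION = 1 << (SECTION_SIZE_BITS - PAGE_SHIFT)   # 32768 pages
STRUCT_PAGE_SIZE = 64         # assumption: sizeof(struct page) in this config

vmemmap_base = 0xffd3140680000000   # kaslr_region base[2] from the log above

def vmemmap_addr(pfn):
    """Hypothetical helper: virtual address of the struct page for pfn."""
    return vmemmap_base + pfn * STRUCT_PAGE_SIZE

def first_section_memmap(pfn):
    """Inclusive vmemmap range for the section starting at pfn (pfn must be section-aligned)."""
    return vmemmap_addr(pfn), vmemmap_addr(pfn + PAGES_PER_SECTION) - 1

# Starting pfns passed to __add_pages() for the 8 GPUs (all multiples of 0x8000)
gpu_pfns = [0xfffc010000, 0xfff8020000, 0xfff4030000, 0xfff0040000,
            0xffec050000, 0xffe8060000, 0xffe4070000, 0xffe0080000]

for pfn in gpu_pfns:
    start, end = first_section_memmap(pfn)
    print(f"start {start:x}, end = {end:x}")

# Size of the memmap span from vmemmap_base up to and including the
# struct page of the highest device pfn (0xfffffffffffff >> PAGE_SHIFT)
top = vmemmap_addr((0xfffffffffffff >> PAGE_SHIFT) + 1) - vmemmap_base
print(f"memmap span up to top pfn: {top >> 40} TiB")
```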
When randomization results in a bad vmemmap_base address, hotplugging
the first page of the first GPU hits the BUG_ON.
Regards,
Bharata.