Message-ID: <d87680bab997fdc9fb4e638983132af235d9a03a.camel@web.de>
Date: Wed, 26 Mar 2025 12:14:35 +0100
From: Bert Karwatzki <spasswolf@....de>
To: Balbir Singh <balbirs@...dia.com>, Christian König
	 <christian.koenig@....com>
Cc: Ingo Molnar <mingo@...nel.org>, Kees Cook <kees@...nel.org>, Bjorn
 Helgaas	 <bhelgaas@...gle.com>, Linus Torvalds
 <torvalds@...ux-foundation.org>, Peter Zijlstra <peterz@...radead.org>,
 Andy Lutomirski <luto@...nel.org>, Alex Deucher	
 <alexander.deucher@....com>, linux-kernel@...r.kernel.org, 
	amd-gfx@...ts.freedesktop.org, spasswolf@....de
Subject: Re: commit 7ffb791423c7 breaks steam game

On Wednesday, 26.03.2025 at 21:36 +1100, Balbir Singh wrote:
> On 3/26/25 21:10, Bert Karwatzki wrote:
> > On Wednesday, 26.03.2025 at 12:50 +1100, Balbir Singh wrote:
> > > On 3/26/25 10:43, Balbir Singh wrote:
> > > > On 3/26/25 10:21, Bert Karwatzki wrote:
> > > > > On Wednesday, 26.03.2025 at 09:45 +1100, Balbir Singh wrote:
> > > > > >
> > > > > >
> > > > > > The second region seems to be additional, I suspect that is HMM mapping from kgd2kfd_init_zone_device()
> > > > > >
> > > > > > Balbir Singh
> > > > > >
> > > > > Good guess! I inserted a printk into kgd2kfd_init_zone_device():
> > > > >
> > > > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> > > > > b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> > > > > index d05d199b5e44..201220e2ac42 100644
> > > > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> > > > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> > > > > @@ -1049,6 +1049,8 @@ int kgd2kfd_init_zone_device(struct amdgpu_device *adev)
> > > > >                 pgmap->range.end = res->end;
> > > > >                 pgmap->type = MEMORY_DEVICE_PRIVATE;
> > > > >         }
> > > > > +       dev_info(adev->dev, "%s: range.start = 0x%llx ranges.end = 0x%llx\n",
> > > > > +                       __func__, pgmap->range.start, pgmap->range.end);
> > > > >
> > > > >         pgmap->nr_range = 1;
> > > > >         pgmap->ops = &svm_migrate_pgmap_ops;
> > > > >
> > > > >
> > > > > and get this in the case without nokaslr:
> > > > >
> > > > > [    T367] amdgpu 0000:03:00.0: kfd_migrate: kgd2kfd_init_zone_device:
> > > > > range.start = 0xafe00000000 ranges.end = 0xaffffffffff
> > > > >
> > > > > and this in the case with nokaslr:
> > > > >
> > > > > [    T365] amdgpu 0000:03:00.0: kfd_migrate: kgd2kfd_init_zone_device:
> > > > > range.start = 0x3ffe00000000 ranges.end = 0x3fffffffffff
> > > > >
> > > >
> > > > So we should ignore the second region then for the purposes of this issue.
> > > >
> > > > I think this now boils down to
> > > >
> > > > Why is the dma_get_required_mask set to all of addressable memory (46 bits)
> > > > when we have nokaslr?
> > > >
> > >
> > > I think I know the root cause of the required_mask going up and hence the
> > > use of DMA32
> > >
> > > 1. HMM calls add_pages()
> > > 2. add_pages calls update_end_of_memory_vars()
> > > 3. This updates max_pfn and that causes required_mask to go up to 46 bits
> > >
> > > Do you have CONFIG_HSA_AMD_SVM enabled? Does turning it off fix the issue?
> > >
> > > The actual issue is the update of max_pfn.
> > >
> > > Balbir Singh
> > >
> >
> > Yes, turning off CONFIG_HSA_AMD_SVM fixes the issue. The strange memory
> > resource
> > afe00000000-affffffffff : 0000:03:00.0
> > is gone.
> >
> > If one added a max_phys_addr argument to devm_request_free_mem_region()
> > (which returns the resource address in kgd2kfd_init_zone_device()), one could
> > keep the memory below the 44-bit limit with CONFIG_HSA_AMD_SVM enabled.
> >
>
> Thanks for reporting the result. Does this patch work?
>
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 01ea7c6df303..14f42f8012ab 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -968,8 +968,9 @@ int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
>  	WARN_ON_ONCE(ret);
>
>  	/* update max_pfn, max_low_pfn and high_memory */
> -	update_end_of_memory_vars(start_pfn << PAGE_SHIFT,
> -				  nr_pages << PAGE_SHIFT);
> +	if (!params->pgmap)
> +		update_end_of_memory_vars(start_pfn << PAGE_SHIFT,
> +					  nr_pages << PAGE_SHIFT);
>
>  	return ret;
>  }
>
> It basically prevents max_pfn from moving when the inserted memory is zone_device.
>
> FYI: It's a test patch and will still create issues if the amount of physically
> present memory is very high, because the driver needs to enable use_dma32 in that case.
>
> If you could try this with everything back to the original config, with both kaslr and
> nokaslr, that would be very helpful.
>
> Thanks,
> Balbir Singh

Yes, this fixes the issue with Stellaris and Civilization 6. The memory still
shifts as usual in /proc/iomem:
afe00000000-affffffffff : 0000:03:00.0 without nokaslr
3ffe00000000-3fffffffffff : 0000:03:00.0 with nokaslr
but without the change in max_pfn this has no impact on the required DMA
mask.
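
For anyone following along, here is a rough userspace sketch of the mechanism
as I understand it. It is not kernel code. required_mask_model() is only
modeled on how I believe the direct-mapping required mask is derived from
max_pfn, and add_pages_model() is a toy stand-in for add_pages() plus
update_end_of_memory_vars(). All names ending in _model, the guard flag and
struct mem_model are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* 4 KiB pages, as on x86-64 */

/* Toy state: only the variable that matters for this issue. */
struct mem_model {
	uint64_t max_pfn;
};

/* Number of bits needed to represent x (0 for x == 0). */
static int bits_needed(uint64_t x)
{
	int bits = 0;

	while (x) {
		bits++;
		x >>= 1;
	}
	return bits;
}

/*
 * Assumed model: the required mask is the smallest all-ones mask that
 * covers the address of the highest page frame, (max_pfn - 1) << PAGE_SHIFT.
 */
static uint64_t required_mask_model(uint64_t max_pfn)
{
	assert(max_pfn > 0);

	int bits = bits_needed((max_pfn - 1) << PAGE_SHIFT);

	return bits >= 64 ? ~0ULL : (1ULL << bits) - 1;
}

/*
 * Toy add_pages(): max_pfn only ever grows. With guard set, ranges that
 * belong to ZONE_DEVICE (a pgmap is present) leave max_pfn alone, which
 * is the idea of the test patch.
 */
static void add_pages_model(struct mem_model *m, uint64_t start_pfn,
			    uint64_t nr_pages, bool is_zone_device, bool guard)
{
	uint64_t end_pfn = start_pfn + nr_pages;

	if (guard && is_zone_device)
		return;
	if (end_pfn > m->max_pfn)
		m->max_pfn = end_pfn;
}
```

Feeding in the two ranges from /proc/iomem, a max_pfn that ends at
0xaffffffffff gives a 44-bit mask and one that ends at 0x3fffffffffff gives a
46-bit mask, which matches the numbers seen in this thread. With the guard
enabled the device range never reaches max_pfn, so the mask depends only on
the RAM that is actually present.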

Bert Karwatzki
