Message-ID: <Z-UuHkUPy60e1GWM@gmail.com>
Date: Thu, 27 Mar 2025 11:53:18 +0100
From: Ingo Molnar <mingo@...nel.org>
To: Balbir Singh <balbirs@...dia.com>
Cc: Bert Karwatzki <spasswolf@....de>,
Christian König <christian.koenig@....com>,
Kees Cook <kees@...nel.org>, Bjorn Helgaas <bhelgaas@...gle.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Peter Zijlstra <peterz@...radead.org>,
Andy Lutomirski <luto@...nel.org>,
Alex Deucher <alexander.deucher@....com>,
linux-kernel@...r.kernel.org, amd-gfx@...ts.freedesktop.org
Subject: Re: commit 7ffb791423c7 breaks steam game
* Balbir Singh <balbirs@...dia.com> wrote:
> > Yes, turning off CONFIG_HSA_AMD_SVM fixes the issue, the strange memory
> > resource
> > afe00000000-affffffffff : 0000:03:00.0
> > is gone.
> >
> > If one would add a max_phys_addr argument to devm_request_free_mem_region()
> > (which returns the resource addr in kgd2kfd_init_zone_device()) one could keep
> > the memory below the 44bit limit with CONFIG_HSA_AMD_SVM enabled.
> >
>
> Thanks for reporting the result. Does this patch work?
>
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 01ea7c6df303..14f42f8012ab 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -968,8 +968,9 @@ int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
>  	WARN_ON_ONCE(ret);
>
>  	/* update max_pfn, max_low_pfn and high_memory */
> -	update_end_of_memory_vars(start_pfn << PAGE_SHIFT,
> -				  nr_pages << PAGE_SHIFT);
> +	if (!params->pgmap)
> +		update_end_of_memory_vars(start_pfn << PAGE_SHIFT,
> +					  nr_pages << PAGE_SHIFT);
>
>  	return ret;
>  }
>
> It basically prevents max_pfn from moving when the inserted memory is
> zone_device.
>
> FYI: It's a test patch and will still create issues if the amount of
> present memory (physically) is very high, because the driver needs to
> enable use_dma32 in that case.
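
For readers following along, the effect of the test patch can be modelled
in plain userspace C. This is only a sketch under stated assumptions: the
structs are empty stand-ins for the kernel's struct dev_pagemap and
struct mhp_params, max_pfn is a toy global, and add_pages_model() is a
hypothetical name, not the kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* Stand-ins (assumptions) for the kernel structures of the same name. */
struct dev_pagemap { int unused; };
struct mhp_params { struct dev_pagemap *pgmap; };

/* Toy replacement for the kernel global that tracks the end of memory. */
static uint64_t max_pfn;

/* Raise max_pfn if the byte range [start, start + size) ends above it. */
static void update_end_of_memory_vars(uint64_t start, uint64_t size)
{
	uint64_t end_pfn = (start + size) >> PAGE_SHIFT;

	if (end_pfn > max_pfn)
		max_pfn = end_pfn;
}

/* Hypothetical model of add_pages() with the guard from the test patch. */
static int add_pages_model(uint64_t start_pfn, uint64_t nr_pages,
			   const struct mhp_params *params)
{
	assert(params != NULL);

	/* ZONE_DEVICE ranges carry a pgmap and must not move max_pfn. */
	if (!params->pgmap)
		update_end_of_memory_vars(start_pfn << PAGE_SHIFT,
					  nr_pages << PAGE_SHIFT);
	return 0;
}
```

In this model, adding ordinary RAM still advances max_pfn, while adding
the afe00000000-affffffffff range from the report with a non-NULL pgmap
leaves it where it was.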
So this patch does the trick for Bert, and I'm wondering what the best
fix here would be overall, because it's a tricky situation.

Am I correct in assuming that with enough physical memory this bug
would trigger, with and without nokaslr?

I *think* the best approach going forward would be to add the above
quirk to the x86 memory setup code, but also issue a kernel warning at
that point with all the relevant information included, so that the
driver's use_dma32 bug can at least be indicated?

That might also trigger for other systems, because if this scenario is
so spurious, I doubt it's the only affected driver ...
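
To make the warning idea concrete, here is a rough userspace sketch, not
a proposed patch. Which fields count as "relevant information" is a
guess (node id, physical range, current max_pfn), the message wording is
made up, and zone_device_warning() is a hypothetical helper that formats
into a buffer where kernel code would print instead.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SHIFT 12

/* Toy stand-in for the kernel global; the value is arbitrary. */
static uint64_t max_pfn = 0x108000;

/*
 * Format a warning for a ZONE_DEVICE range that ends above max_pfn.
 * Returns the snprintf() result, or 0 if no warning is needed.
 */
static int zone_device_warning(char *buf, size_t len, int nid,
			       uint64_t start_pfn, uint64_t nr_pages)
{
	uint64_t end_pfn = start_pfn + nr_pages;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	/* Ranges that end at or below max_pfn would not have moved it. */
	if (end_pfn <= max_pfn)
		return 0;

	return snprintf(buf, len,
			"x86/mm: ZONE_DEVICE range [%#llx-%#llx] on node %d "
			"ends above max_pfn %#llx, leaving max_pfn unchanged",
			(unsigned long long)(start_pfn << PAGE_SHIFT),
			(unsigned long long)((end_pfn << PAGE_SHIFT) - 1),
			nid, (unsigned long long)max_pfn);
}
```

Called with the range from Bert's report (start_pfn 0xafe00000,
0x200000 pages), the message names 0xafe00000000-0xaffffffffff, which
would let a reader match it against /proc/iomem.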
Thanks,
Ingo