Message-ID: <0d9ce519-7b87-48e4-a374-33ae0df8f730@pavinjoseph.com>
Date: Mon, 1 Apr 2024 19:00:42 +0530
From: Pavin Joseph <me@...injoseph.com>
To: "Eric W. Biederman" <ebiederm@...ssion.com>,
Ingo Molnar <mingo@...nel.org>
Cc: Steve Wahl <steve.wahl@....com>, Dave Hansen
<dave.hansen@...ux.intel.com>, Andy Lutomirski <luto@...nel.org>,
Peter Zijlstra <peterz@...radead.org>, Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
x86@...nel.org, "H. Peter Anvin" <hpa@...or.com>,
linux-kernel@...r.kernel.org,
Linux regressions mailing list <regressions@...ts.linux.dev>,
stable@...r.kernel.org, Eric Hagberg <ehagberg@...il.com>,
Simon Horman <horms@...ge.net.au>, Dave Young <dyoung@...hat.com>,
Sarah Brofeldt <srhb@....dk>, Russ Anderson <rja@....com>,
Dimitri Sivanich <sivanich@....com>,
Hou Wenlong <houwenlong.hwl@...group.com>,
Andrew Morton <akpm@...ux-foundation.org>, Baoquan He <bhe@...hat.com>,
Yuntao Wang <ytcoode@...il.com>, Bjorn Helgaas <bhelgaas@...gle.com>
Subject: Re: [PATCH v4] x86/mm/ident_map: On UV systems, use gbpages only
where full GB page should be mapped.
Hi Eric,
Here's the output of /proc/iomem:
suse-laptop:~ # cat /proc/iomem
00000000-00000fff : Reserved
00001000-0009221f : System RAM
00092220-0009229f : System RAM
000922a0-0009828f : System RAM
00098290-0009829f : System RAM
000982a0-0009efff : System RAM
0009f000-0009ffff : Reserved
000e0000-000fffff : Reserved
000a0000-000effff : PCI Bus 0000:00
000f0000-000fffff : System ROM
00100000-09bfffff : System RAM
06200000-071fffff : Kernel code
07200000-07e6dfff : Kernel rodata
08000000-082e3eff : Kernel data
08ba8000-08ffffff : Kernel bss
09c00000-09d90fff : Reserved
09d91000-09efffff : System RAM
09f00000-09f0efff : ACPI Non-volatile Storage
09f0f000-bf5a2017 : System RAM
ba000000-be7fffff : Crash kernel
bf5a2018-bf5af857 : System RAM
bf5af858-c3a60fff : System RAM
c3a61000-c3b54fff : Reserved
c3b55000-c443dfff : System RAM
c443e000-c443efff : Reserved
c443f000-c51adfff : System RAM
c51ae000-c51aefff : Reserved
c51af000-c747dfff : System RAM
c747e000-cb67dfff : Reserved
cb669000-cb66cfff : MSFT0101:00
cb669000-cb66cfff : MSFT0101:00
cb66d000-cb670fff : MSFT0101:00
cb66d000-cb670fff : MSFT0101:00
cb67e000-cd77dfff : ACPI Non-volatile Storage
cd77e000-cd7fdfff : ACPI Tables
cd7fe000-ce7fffff : System RAM
ce800000-cfffffff : Reserved
d0000000-f7ffffff : PCI Bus 0000:00
f8000000-fbffffff : PCI ECAM 0000 [bus 00-3f]
f8000000-fbffffff : Reserved
f8000000-fbffffff : pnp 00:00
fc000000-fdffffff : PCI Bus 0000:00
fd000000-fd0fffff : PCI Bus 0000:05
fd000000-fd0007ff : 0000:05:00.1
fd000000-fd0007ff : ahci
fd001000-fd0017ff : 0000:05:00.0
fd001000-fd0017ff : ahci
fd100000-fd4fffff : PCI Bus 0000:04
fd100000-fd1fffff : 0000:04:00.3
fd100000-fd1fffff : xhci-hcd
fd200000-fd2fffff : 0000:04:00.4
fd200000-fd2fffff : xhci-hcd
fd300000-fd3fffff : 0000:04:00.2
fd300000-fd3fffff : ccp
fd400000-fd47ffff : 0000:04:00.0
fd480000-fd4bffff : 0000:04:00.5
fd4c0000-fd4c7fff : 0000:04:00.6
fd4c0000-fd4c7fff : ICH HD audio
fd4c8000-fd4cbfff : 0000:04:00.1
fd4c8000-fd4cbfff : ICH HD audio
fd4cc000-fd4cdfff : 0000:04:00.2
fd4cc000-fd4cdfff : ccp
fd500000-fd5fffff : PCI Bus 0000:03
fd500000-fd503fff : 0000:03:00.0
fd500000-fd503fff : nvme
fd600000-fd6fffff : PCI Bus 0000:02
fd600000-fd60ffff : 0000:02:00.0
fd600000-fd60ffff : rtw88_pci
fd700000-fd7fffff : PCI Bus 0000:01
fd700000-fd703fff : 0000:01:00.0
fd704000-fd704fff : 0000:01:00.0
fd704000-fd704fff : r8169
fde10510-fde1053f : MSFT0101:00
fdf00000-fdf7ffff : amd_iommu
feb00000-feb00007 : SB800 TCO
fec00000-fec003ff : IOAPIC 0
fec01000-fec013ff : IOAPIC 1
fec10000-fec1001f : pnp 00:04
fed00000-fed003ff : HPET 2
fed00000-fed003ff : PNP0103:00
fed00000-fed003ff : pnp 00:04
fed61000-fed613ff : pnp 00:04
fed80000-fed80fff : Reserved
fed80000-fed80fff : pnp 00:04
fed81200-fed812ff : AMDI0030:00
fed81500-fed818ff : AMDI0030:00
fed81500-fed818ff : AMDI0030:00 AMDI0030:00
fedc2000-fedc2fff : AMDI0010:00
fedc2000-fedc2fff : AMDI0010:00 AMDI0010:00
fedc3000-fedc3fff : AMDI0010:01
fedc3000-fedc3fff : AMDI0010:01 AMDI0010:01
fedc4000-fedc4fff : AMDI0010:02
fedc4000-fedc4fff : AMDI0010:02 AMDI0010:02
fee00000-fee00fff : pnp 00:00
ff000000-ffffffff : pnp 00:04
100000000-3af37ffff : System RAM
399000000-3ae4fffff : Crash kernel
3af380000-42fffffff : Reserved
430000000-ffffffffff : PCI Bus 0000:00
460000000-4701fffff : PCI Bus 0000:04
460000000-46fffffff : 0000:04:00.0
470000000-4701fffff : 0000:04:00.0
3fff80000000-3fffffffffff : 0000:04:00.0
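As an aside for readers following the gbpages discussion: whether a region can be covered by a 1 GiB mapping at all depends on it containing a fully aligned GiB. A minimal shell sketch of that check (the helper name is mine, not kernel code; the ranges are System RAM entries copied from the dump above):

```shell
GB=$((1 << 30))

# Report (via exit status) whether a start-end hex range contains at
# least one fully aligned 1 GiB page.
has_full_gb_page() {
    local start end aligned
    start=$((0x${1%-*}))
    end=$((0x${1#*-}))
    # Round the start up to the next GiB boundary.
    aligned=$(( (start + GB - 1) / GB * GB ))
    [ $((aligned + GB - 1)) -le "$end" ]
}

# A few "System RAM" ranges copied from the dump above.
for range in 00100000-09bfffff 09f0f000-bf5a2017 100000000-3af37ffff; do
    if has_full_gb_page "$range"; then
        echo "$range: contains a fully aligned GiB page"
    else
        echo "$range: too small for a full gbpage mapping"
    fi
done
```

On this map, the low range under 1 GiB cannot use a gbpage, while the large ranges above 1 GiB and above 4 GiB can.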
Thanks for creating kexec, by the way; it's invaluable for systems with
slow firmware and bootloaders 🚀
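(For readers unfamiliar with the workflow being praised here: a typical kexec fast reboot stages a new kernel from the running one, skipping firmware and bootloader. A sketch with illustrative placeholder paths, shown as a dry run that prints the commands instead of executing them:)

```shell
# Sketch of a kexec fast-reboot sequence. Requires root and kexec-tools;
# the kernel/initrd paths are illustrative placeholders.
KERNEL=/boot/vmlinuz
INITRD=/boot/initrd

# Stage the new kernel, reusing the running kernel's command line.
LOAD_CMD="kexec -l $KERNEL --initrd=$INITRD --reuse-cmdline"
# Jump into it (alternatively "kexec -e" after syncing filesystems).
BOOT_CMD="systemctl kexec"

# Dry run: print the commands rather than rebooting the machine.
echo "$LOAD_CMD"
echo "$BOOT_CMD"
```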
Pavin.
On 3/31/24 09:25, Eric W. Biederman wrote:
> Ingo Molnar <mingo@...nel.org> writes:
>
>> * Pavin Joseph <me@...injoseph.com> wrote:
>>
>>> On 3/29/24 13:45, Ingo Molnar wrote:
>>>> Just to clarify, we have the following 3 upstream (and soon to be upstream) versions:
>>>>
>>>> v1: pre-d794734c9bbf kernels
>>>> v2: d794734c9bbf x86/mm/ident_map: Use gbpages only where full GB page should be mapped.
>>>> v3: c567f2948f57 Revert "x86/mm/ident_map: Use gbpages only where full GB page should be mapped."
>>>>
>>>> Where v1 and v3 ought to be the same in behavior.
>>>>
>>>> So what does the failure matrix look like on your systems? Is my
>>>> understanding accurate:
>>
>>> Slight correction:
>>>
>>>     | regular boot | regular kexec | nogbpages boot | nogbpages kexec boot
>>> ----|--------------|---------------|----------------|---------------------
>>> v1: | OK           | OK            | OK             | FAIL
>>> v2: | OK           | FAIL          | OK             | FAIL
>>
>> Thanks!
>>
>> So the question is now: does anyone have a theory about in what fashion
>> the kexec nogbpages bootup differs from the regular nogbpages bootup to
>> break on your system?
>>
>> I'd have expected the described root cause of the firmware not properly
>> enumerating all memory areas that need to be mapped to cause trouble on
>> regular, non-kexec nogbpages bootups too. What makes the kexec bootup
>> special to trigger this crash?
>
> My blind hunch would be something in the first 1MiB being different.
> The first 1MiB is where all of the historical stuff is and where
> I have seen historical memory maps be less than perfectly accurate.
>
> Changing what is mapped being the difference between success and failure
> suggests that a page fault is being triggered somewhere dark and hard to
> debug, and that it in turn becomes a triple fault.
>
> Pavin Joseph, is there any chance you can provide your memory map?
> Perhaps just cat /proc/iomem?
>
> If I have something to go on other than works/doesn't work, I can
> probably say something intelligent.
>
> Eric