Message-ID: <08981771-39ac-af66-e2ec-e8f9bf6aed0a@amd.com>
Date: Wed, 26 Mar 2025 17:30:35 -0500
From: Tom Lendacky <thomas.lendacky@....com>
To: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
"Aithal, Srikanth" <sraithal@....com>
Cc: Linux-Next Mailing List <linux-next@...r.kernel.org>,
open list <linux-kernel@...r.kernel.org>,
"Roth, Michael" <Michael.Roth@....com>
Subject: Re: linux-next regression: SNP Guest boot hangs with certain cpu/mem
config combination
On 3/25/25 08:33, Kirill A. Shutemov wrote:
> On Tue, Mar 25, 2025 at 02:40:00PM +0530, Aithal, Srikanth wrote:
>> Hello,
>>
>>
>> Starting linux-next build next-20250312, including recent build 20250324, we
>> are seeing an issue where the SNP guest boot hangs at the "boot smp config"
>> step:
>>
>>
>> [ 2.294722] smp: Bringing up secondary CPUs ...
>> [ 2.295211] smpboot: Parallel CPU startup disabled by the platform
>> [ 2.309687] smpboot: x86: Booting SMP configuration:
>> [ 2.310214] .... node #0, CPUs: #1 #2 #3 #4 #5 #6
>> #7 #8 #9 #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20 #21
>> #22 #23 #24 #25 #26 #27 #28 #29 #30 #31 #32 #33 #34 #35 #36
>> #37 #38 #39 #40 #41 #42 #43 #44 #45 #46 #47 #48 #49 #50 #51
>> #52 #53 #54 #55 #56 #57 #58 #59 #60 #61 #62 #63 #64 #65 #66
>> #67 #68 #69 #70 #71 #72 #73 #74 #75 #76 #77 #78 #79 #80 #81
>> #82 #83 #84 #85 #86 #87 #88 #89 #90 #91 #92 #93 #94 #95 #96
>> #97 #98 #99 #100 #101 #102 #103 #104 #105 #106 #107 #108 #109 #110 #111
>> #112 #113 #114 #115 #116 #117 #118 #119 #120 #121 #122 #123 #124 #125 #126
>> #127 #128 #129 #130 #131 #132 #133 #134 #135 #136 #137 #138 #139 #140 #141
>> #142 #143 #144 #145 #146 #147 #148 #149 #150 #151 #152 #153 #154 #155 #156
>> #157 #158 #159 #160 #161 #162 #163 #164 #165 #166 #167 #168 #169 #170 #171
>> #172 #173 #174 #175 #176 #177 #178 #179 #180 #181 #182 #183 #184 #185 #186
>> #187 #188 #189 #190 #191 #192 #193 #194 #195 #196 #197 #198
>> --> The guest hangs forever at this point.
>>
>>
>> I have observed that certain vCPU and memory combinations work, while others
>> do not. The VM configuration I am using does not have any NUMA nodes.
>>
>> vcpus        Mem      SNP guest boot
>> <=240        19456M   Boots fine
>> >=241,<255   19456M   Hangs
>> 1-255        2048M    Boots fine
>> 1-255        4096M    Boots fine
>> >71          8192M    Hangs
>> >41          6144M    Hangs
>>
>> When I bisected this issue, it pointed to the following commit :
>>
>>
>> *commit 800f1059c99e2b39899bdc67a7593a7bea6375d8*
>> Author: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com>
>> Date: Mon Mar 10 10:28:55 2025 +0200
>>
>> mm/page_alloc: fix memory accept before watermarks gets initialized
>
> Hm. It is puzzling for me. I don't see how this commit can cause the hang.
>
> Could you track down where hang happens?
Let me say that the guest config is key for this. Using that config, I
think you might be able to repro this on TDX. The config does turn off TDX
support, so I'm hoping that turning it on doesn't change anything.

I've been able to track it down slightly... It is happening during the CPU
bringup trace points and it eventually gets to line 2273 in
rb_allocate_cpu_buffer() and never comes back from an alloc_pages_node()
call. That's as far as I've gotten so far. I'm not a mm expert so not sure
if I'll be able to progress much further.

Thanks,
Tom
>