Message-ID: <08981771-39ac-af66-e2ec-e8f9bf6aed0a@amd.com>
Date: Wed, 26 Mar 2025 17:30:35 -0500
From: Tom Lendacky <thomas.lendacky@....com>
To: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
 "Aithal, Srikanth" <sraithal@....com>
Cc: Linux-Next Mailing List <linux-next@...r.kernel.org>,
 open list <linux-kernel@...r.kernel.org>,
 "Roth, Michael" <Michael.Roth@....com>
Subject: Re: linux-next regression: SNP Guest boot hangs with certain cpu/mem
 config combination

On 3/25/25 08:33, Kirill A. Shutemov wrote:
> On Tue, Mar 25, 2025 at 02:40:00PM +0530, Aithal, Srikanth wrote:
>> Hello,
>>
>>
>> Starting linux-next build next-20250312, including recent build 20250324, we
>> are seeing an issue where the SNP guest boot hangs at the "boot smp config"
>> step:
>>
>>
>> [    2.294722] smp: Bringing up secondary CPUs ...
>> [    2.295211] smpboot: Parallel CPU startup disabled by the platform
>> [    2.309687] smpboot: x86: Booting SMP configuration:
>> [    2.310214] .... node  #0, CPUs:          #1   #2   #3   #4 #5   #6  
>> #7   #8   #9  #10  #11  #12  #13  #14  #15  #16  #17 #18  #19  #20  #21 
>> #22  #23  #24  #25  #26  #27  #28  #29  #30 #31  #32  #33  #34  #35  #36 
>> #37  #38  #39  #40  #41  #42  #43 #44  #45  #46  #47  #48  #49  #50  #51 
>> #52  #53  #54  #55  #56 #57  #58  #59  #60  #61  #62  #63  #64  #65  #66 
>> #67  #68  #69 #70  #71  #72  #73  #74  #75  #76  #77  #78  #79  #80  #81 
>> #82 #83  #84  #85  #86  #87  #88  #89  #90  #91  #92  #93  #94  #95 #96 
>> #97  #98  #99 #100 #101 #102 #103 #104 #105 #106 #107 #108 #109 #110 #111
>> #112 #113 #114 #115 #116 #117 #118 #119 #120 #121 #122 #123 #124 #125 #126
>> #127 #128 #129 #130 #131 #132 #133 #134 #135 #136 #137 #138 #139 #140 #141
>> #142 #143 #144 #145 #146 #147 #148 #149 #150 #151 #152 #153 #154 #155 #156
>> #157 #158 #159 #160 #161 #162 #163 #164 #165 #166 #167 #168 #169 #170 #171
>> #172 #173 #174 #175 #176 #177 #178 #179 #180 #181 #182 #183 #184 #185 #186
>> #187 #188 #189 #190 #191 #192 #193 #194 #195 #196 #197 #198
>> --> The guest hangs forever at this point.
>>
>>
>> I have observed that certain vCPU and memory combinations work, while others
>> do not. The VM configuration I am using does not have any NUMA nodes.
>>
>> vcpus         Mem       SNP guest boot
>> <=240         19456M    Boots fine
>> >=241,<255    19456M    Hangs
>> 1-255         2048M     Boots fine
>> 1-255         4096M     Boots fine
>> >71           8192M     Hangs
>> >41           6144M     Hangs
>>
>> When I bisected this issue, it pointed to the following commit:
>>
>>
>> commit 800f1059c99e2b39899bdc67a7593a7bea6375d8
>> Author: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com>
>> Date:   Mon Mar 10 10:28:55 2025 +0200
>>
>>     mm/page_alloc: fix memory accept before watermarks gets initialized
> 
> Hm. It is puzzling for me. I don't see how this commit can cause the hang.
> 
> Could you track down where hang happens?

Let me say that the guest config is key for this. Using that config, I
think you might be able to repro this on TDX. The config does turn off TDX
support, so I'm hoping that turning it on doesn't change anything.

I've been able to track it down slightly... It is happening during the CPU
bringup trace points and it eventually gets to line 2273 in
rb_allocate_cpu_buffer() and never comes back from an alloc_pages_node()
call. That's as far as I've gotten so far. I'm not a mm expert so not sure
if I'll be able to progress much further.

Thanks,
Tom

> 
