Message-ID: <f25ba003-7644-46ed-a1bc-760231534a1d@amd.com>
Date: Fri, 28 Mar 2025 14:49:27 +0530
From: "Aithal, Srikanth" <sraithal@....com>
To: "Kirill A. Shutemov" <kirill@...temov.name>,
Steven Rostedt <rostedt@...dmis.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Tom Lendacky <thomas.lendacky@....com>, Jason Baron <jbaron@...mai.com>,
Peter Zijlstra <peterz@...radead.org>, Josh Poimboeuf <jpoimboe@...nel.org>,
Ard Biesheuvel <ardb@...nel.org>,
Linux-Next Mailing List <linux-next@...r.kernel.org>,
open list <linux-kernel@...r.kernel.org>,
"Roth, Michael" <Michael.Roth@....com>
Subject: Re: linux-next regression: SNP Guest boot hangs with certain cpu/mem
config combination
On 3/28/2025 2:39 PM, Kirill A. Shutemov wrote:
> On Fri, Mar 28, 2025 at 10:28:19AM +0200, Kirill A. Shutemov wrote:
>> On Thu, Mar 27, 2025 at 07:39:22PM +0200, Kirill A. Shutemov wrote:
>>> On Thu, Mar 27, 2025 at 11:02:24AM -0400, Steven Rostedt wrote:
>>>> On Thu, 27 Mar 2025 16:43:43 +0200
>>>> "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com> wrote:
>>>>
>>>>>>> The only option I see so far is to drop the static branch from this path.
>>>>>>>
>>>>>>> But I am not sure if it is the only case where we use a static branch
>>>>>>> from CPU hotplug callbacks.
>>>>>>>
>>>>>>> Any other ideas?
>>>>>>
>>>>>>
>>>>>> Hmmm, I didn't take too close a look here, but there is the
>>>>>> static_key_slow_dec_cpuslocked() variant; would that work here? Is the
>>>>>> issue that the caller may or may not hold the cpu_hotplug lock?
>>>>>
>>>>> Yes. This is the generic page alloc path, and it can be called both with
>>>>> and without the lock.
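
For context, here is a minimal sketch of the two locking variants being
discussed (my reading of kernel/jump_label.c; the key name below is
hypothetical, not one from the affected path):

	#include <linux/cpu.h>
	#include <linux/jump_label.h>

	static DEFINE_STATIC_KEY_TRUE(some_key);	/* hypothetical key */

	/* plain variant: takes cpus_read_lock() internally */
	static void drop_key(void)
	{
		static_key_slow_dec(&some_key.key);
	}

	/* _cpuslocked variant: caller must already hold the CPU hotplug lock */
	static void drop_key_cpuslocked(void)
	{
		lockdep_assert_cpus_held();
		static_key_slow_dec_cpuslocked(&some_key.key);
	}

Neither variant helps when the same path can be entered both with and
without the lock held, which is the problem here.
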
>>>>
>>>> Note, it's not the static branch itself that is the issue, it's
>>>> enabling/disabling the static branch that is. Changing a static branch
>>>> takes a bit of work, as it modifies the kernel text.
>>>>
>>>> Is it possible to delay the update via a workqueue?
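
A minimal sketch of that deferral, assuming a hypothetical key and work
item (this is not the actual patch that was posted):

	#include <linux/jump_label.h>
	#include <linux/workqueue.h>

	static DEFINE_STATIC_KEY_TRUE(some_key);	/* hypothetical key */

	static void some_key_disable_workfn(struct work_struct *work)
	{
		/* runs later in process context, free to patch kernel text */
		static_branch_disable(&some_key);
	}
	static DECLARE_WORK(some_key_disable_work, some_key_disable_workfn);

	/* the page-alloc / hotplug-callback path only queues the work: */
	schedule_work(&some_key_disable_work);
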
>>>
>>> Ah. Good point. Should work. I'll give it a try.
>>
>> The patch below fixes the problem for me.
>
> Ah. No, it won't work. We can get there before workqueues are initialized:
> mm_core_init() is called before workqueue_init_early().
>
> We cannot queue a work item. :/
>
> Steven, any other ideas?
>
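One possible direction for the early-boot case, purely a sketch on my side
rather than something from this thread: system_wq is only allocated in
workqueue_init_early(), so the caller could fall back to patching directly
while it is still NULL, which should be safe that early since only the
boot CPU is up and no hotplug callbacks can run yet (some_key and
some_key_disable_work are the hypothetical names from the sketch above):

	if (system_wq)
		schedule_work(&some_key_disable_work);
	else
		static_branch_disable(&some_key);	/* early boot, single CPU */
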
I have booted the guest with different memory and CPU combinations and
have not seen any failures with the fix so far. Are there any other
scenarios that could trigger the above case? Please let me know.