Message-ID: <f25ba003-7644-46ed-a1bc-760231534a1d@amd.com>
Date: Fri, 28 Mar 2025 14:49:27 +0530
From: "Aithal, Srikanth" <sraithal@....com>
To: "Kirill A. Shutemov" <kirill@...temov.name>,
Steven Rostedt <rostedt@...dmis.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Tom Lendacky <thomas.lendacky@....com>, Jason Baron <jbaron@...mai.com>,
Peter Zijlstra <peterz@...radead.org>, Josh Poimboeuf <jpoimboe@...nel.org>,
Ard Biesheuvel <ardb@...nel.org>,
Linux-Next Mailing List <linux-next@...r.kernel.org>,
open list <linux-kernel@...r.kernel.org>,
"Roth, Michael" <Michael.Roth@....com>
Subject: Re: linux-next regression: SNP Guest boot hangs with certain cpu/mem
config combination
On 3/28/2025 2:39 PM, Kirill A. Shutemov wrote:
> On Fri, Mar 28, 2025 at 10:28:19AM +0200, Kirill A. Shutemov wrote:
>> On Thu, Mar 27, 2025 at 07:39:22PM +0200, Kirill A. Shutemov wrote:
>>> On Thu, Mar 27, 2025 at 11:02:24AM -0400, Steven Rostedt wrote:
>>>> On Thu, 27 Mar 2025 16:43:43 +0200
>>>> "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com> wrote:
>>>>
>>>>>>> The only option I see so far is to drop the static branch from this path.
>>>>>>>
>>>>>>> But I am not sure if it is the only case where we use a static branch
>>>>>>> from CPU hotplug callbacks.
>>>>>>>
>>>>>>> Any other ideas?
>>>>>>
>>>>>>
>>>>>> Hmmm, I didn't take too close a look here, but there is the
>>>>>> static_key_slow_dec_cpuslocked() variant; would that work here? Is the
>>>>>> issue that the caller may or may not hold the cpu_hotplug lock?
>>>>>
>>>>> Yes. This is the generic page alloc path, and it can be called both with
>>>>> and without the lock.
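
For context, here is a minimal sketch of the two locking variants being
discussed (my reading of kernel/jump_label.c; the key name below is
hypothetical, not one from the affected path):

	#include <linux/cpu.h>
	#include <linux/jump_label.h>

	static DEFINE_STATIC_KEY_TRUE(some_key);	/* hypothetical key */

	/* plain variant: takes cpus_read_lock() internally */
	static void drop_key(void)
	{
		static_key_slow_dec(&some_key.key);
	}

	/* _cpuslocked variant: caller must already hold the CPU hotplug lock */
	static void drop_key_cpuslocked(void)
	{
		lockdep_assert_cpus_held();
		static_key_slow_dec_cpuslocked(&some_key.key);
	}

Neither variant helps when the same path can be entered both with and
without the lock held, which is the problem here.
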
>>>>
>>>> Note, it's not the static branch itself that is the issue, it's
>>>> enabling/disabling the static branch that is. Changing a static branch
>>>> takes a bit of work, as it modifies the kernel text.
>>>>
>>>> Is it possible to delay the update via a workqueue?
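
A minimal sketch of that deferral, assuming a hypothetical key and work
item (this is not the actual patch that was posted):

	#include <linux/jump_label.h>
	#include <linux/workqueue.h>

	static DEFINE_STATIC_KEY_TRUE(some_key);	/* hypothetical key */

	static void some_key_disable_workfn(struct work_struct *work)
	{
		/* runs later in process context, free to patch kernel text */
		static_branch_disable(&some_key);
	}
	static DECLARE_WORK(some_key_disable_work, some_key_disable_workfn);

	/* the page-alloc / hotplug-callback path only queues the work: */
	schedule_work(&some_key_disable_work);
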
>>>
>>> Ah. Good point. Should work. I'll give it a try.
>>
>> The patch below fixes the problem for me.
>
> Ah. No, it won't work. We can get there before workqueues are initialized:
> mm_core_init() is called before workqueue_init_early().
>
> We cannot queue a work item. :/
>
> Steven, any other ideas?
>
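One possible direction for the early-boot case, purely a sketch on my side
rather than something from this thread: system_wq is only allocated in
workqueue_init_early(), so the caller could fall back to patching directly
while it is still NULL, which should be safe that early since only the
boot CPU is up and no hotplug callbacks can run yet (some_key and
some_key_disable_work are the hypothetical names from the sketch above):

	if (system_wq)
		schedule_work(&some_key_disable_work);
	else
		static_branch_disable(&some_key);	/* early boot, single CPU */
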
I have booted the guest with different memory and CPU combinations and
have not seen any failures with the fix so far. Are there any other
scenarios that could trigger the above case? Please let me know.