linux-kernel - Re: [PATCH v3 0/9] Parallel CPU bringup for x86_64


Message-ID: <74d2302f-88fc-c75c-6d2d-4aece1a515bb@molgen.mpg.de>
Date:   Mon, 14 Feb 2022 14:45:49 +0100
From:   Paul Menzel <pmenzel@...gen.mpg.de>
To:     David Woodhouse <dwmw2@...radead.org>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
        "H . Peter Anvin" <hpa@...or.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        "Paul E . McKenney" <paulmck@...nel.org>,
        linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
        rcu@...r.kernel.org, mimoja@...oja.de, hewenliang4@...wei.com,
        hushiyuan@...wei.com, luolongjun@...wei.com, hejingxian@...wei.com
Subject: Re: [PATCH v3 0/9] Parallel CPU bringup for x86_64

Dear David,


On 29.12.21 at 14:54, David Woodhouse wrote:
> On Wed, 2021-12-29 at 14:18 +0100, Paul Menzel wrote:
>>> Or the one in
>>> https://lore.kernel.org/lkml/d4cde50b4aab24612823714dfcbe69bc4bb63b60.camel@infradead.org
>>>
>>> which makes it do nothing except prepare all the CPUs before bringing
>>> them up one at a time?
>>
>> I applied it on top of the other one, and it made no difference either.
> 
> It's possible I missed something else in the prepare stage that doesn't
> cope with all CPUs being prepared first.
> 
> My next attempt might be to change the loop in bringup_nonboot_cpus()
> to bring all the CPUs not to the CPUHP_BP_PARALLEL_DYN state(s) but
> instead just bring them to somewhere like CPUHP_RCUTREE_PREP, which is
> somewhere in the middle between CPUHP_OFFLINE and CPUHP_BRINGUP_CPU.
> 
> Then a binary chop search: if that one boots, try maybe
> CPUHP_TOPOLOGY_PREPARE. And if not, try CPUHP_PROFILE_PREPARE. Etc.
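The binary chop suggested above could look roughly like the following. This is only a minimal, hypothetical sketch in C, not code from the patch series: the stage list is a simplified, illustrative subset of hotplug states in their assumed bringup order, the names find_first_bad_stage() and demo_boots_ok() are made up, and it assumes failure is monotonic (if stopping at a stage breaks boot, stopping at any later stage does too). In practice each call to the boot predicate would be one manual test boot.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified, illustrative subset of hotplug states, in the assumed
 * order a CPU passes through them during bringup (not the full enum). */
static const char *const stages[] = {
	"CPUHP_OFFLINE",
	"CPUHP_PROFILE_PREPARE",
	"CPUHP_RCUTREE_PREP",
	"CPUHP_TOPOLOGY_PREPARE",
	"CPUHP_BP_PARALLEL_DYN",
};
#define NR_STAGES (sizeof(stages) / sizeof(stages[0]))

/* Hypothetical predicate: true if the machine still boots when the
 * parallel loop only takes all CPUs as far as stages[idx]. */
typedef bool (*boot_test_fn)(size_t idx);

/* Binary chop, assuming stages[0] boots, stages[NR_STAGES - 1] is
 * known to fail, and failure is monotonic in the stage index.
 * Returns the index of the first stage at which boot breaks. */
static size_t find_first_bad_stage(boot_test_fn boots_ok)
{
	size_t good = 0;            /* highest index known to boot */
	size_t bad = NR_STAGES - 1; /* lowest index known to fail */

	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (boots_ok(mid))
			good = mid; /* boots: probe a later stage */
		else
			bad = mid;  /* fails: probe an earlier stage */
	}
	return bad;
}

/* Stand-in for real test boots, for illustration only: pretends every
 * stage from demo_first_bad onwards breaks boot. */
static size_t demo_first_bad = 3;

static bool demo_boots_ok(size_t idx)
{
	return idx < demo_first_bad;
}
```

With these five stages the first probe is CPUHP_RCUTREE_PREP; if that boots, the next probe is CPUHP_TOPOLOGY_PREPARE, otherwise CPUHP_PROFILE_PREPARE, which is the order suggested above.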
> 
>>> My current theory (not that I've spent that much time thinking about it
>>> in the last week) is that there's something about the existing CPU
>>> bringup, possibly a CPU bug or something special about the AMD CPUs,
>>> which is triggered by just making it a little bit *faster*, which is
>>> why bringing them up from kexec (especially in qemu) can cause it too?
>>
>> Would having the serial console enabled make a difference?
>
> Yes. I couldn't make this fail in my EC2 m6a instance (for clean boots;
> I have never managed to kexec it) until I turned off the serial console
> to make things go faster.
> 
>>> Tom seemed to find that it was in load_TR_desc(), so if you could try
>>> this hack on a machine that doesn't magically wink out of existence on
>>> a triplefault before even flushing its serial output, that would be
>>> much appreciated...
> 
>> Unfortunately, no more messages were printed on the serial console.
> 
> I suppose we need to litter those outputs somewhere earlier in the
> trampoline then, perhaps it *isn't* getting to load_TR_desc() in your
> case?
> 
> Will be back online properly next week and can actually provide some of
> the above suggestions in patch form if you're willing to keep testing.

Sorry for replying so late. I saw your v4 patches and tried commit
5e3524d21d2a from your branch `parallel-5.17-part1`. Unfortunately,
the boot problem still persists on the AMD Ryzen 3 2200G system I
tested with. Please tell me where I should report these results (here
or in reply to the posted v4 patches).

Also, do you have (physical) access to a system with an AMD CPU? If not, 
maybe we can get you one, so it’s more convenient for you to test.


Kind regards,

Paul