Message-ID: <745f219e-1593-4fbd-fa7f-1719ef6f444d@siemens.com>
Date:   Tue, 27 Jul 2021 10:46:06 +0200
From:   Jan Kiszka <jan.kiszka@...mens.com>
To:     Henning Schild <henning.schild@...mens.com>,
        Thomas Gleixner <tglx@...utronix.de>
Cc:     Peter Zijlstra <peterz@...radead.org>, x86@...nel.org,
        linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
        Borislav Petkov <bp@...en8.de>,
        Guenter Roeck <linux@...ck-us.net>, xenomai@...omai.org
Subject: Re: sched: Unexpected reschedule of offline CPU#2!

[Henning, don't top-post ;)]

On 27.07.21 10:00, Henning Schild via Xenomai wrote:
> Was this ever resolved, and if so, can someone please point me to the
> patches? I started digging a bit but could not yet find how that
> continued.
> 
> I am seeing a similar, or maybe the same, problem on 4.19.192 with the
> ipipe patch from the Xenomai project applied.
> 

Before blaming the usual suspects, I have a general question about
ordering in mainline; see below.

> regards,
> Henning
> 
> Am Sat, 17 Aug 2019 22:21:48 +0200
> schrieb Thomas Gleixner <tglx@...utronix.de>:
> 
>> On Fri, 16 Aug 2019, Guenter Roeck wrote:
>>> On Fri, Aug 16, 2019 at 12:22:22PM +0200, Thomas Gleixner wrote:  
>>>> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>>>> index 75fea0d48c0e..625627b1457c 100644
>>>> --- a/arch/x86/kernel/process.c
>>>> +++ b/arch/x86/kernel/process.c
>>>> @@ -601,6 +601,7 @@ void stop_this_cpu(void *dummy)
>>>>  	/*
>>>>  	 * Remove this CPU:
>>>>  	 */
>>>> +	set_cpu_active(smp_processor_id(), false);
>>>>  	set_cpu_online(smp_processor_id(), false);
>>>>  	disable_local_APIC();
>>>>  	mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
>>>>   
>>> No luck. The problem is still seen with this patch applied on top of
>>> the mainline kernel (commit a69e90512d9def6).  
>>
>> Yeah, was a bit too naive ....
>>
>> We actually need to do the full cpuhotplug dance for a regular
>> reboot. In the panic case, there is nothing we can do about it. I'll
>> have a look tomorrow.
>>

What is supposed to prevent the following race in mainline:

CPU 0                   CPU 1                      CPU 2

native_stop_other_cpus                             <INTERRUPT>
  send_IPI_allbutself                              ...
                        <INTERRUPT>
                        sysvec_reboot
                          stop_this_cpu
                            set_cpu_online(false)
                                                   native_smp_send_reschedule(1)
                                                     if (cpu_is_offline(1)) ...
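
The interleaving above can be replayed with a small userspace model. This
is only a sketch, not kernel code: model_online[], model_send_reschedule(),
model_stop_this_cpu() and the replay_*() functions are hypothetical
stand-ins for cpu_online_mask, native_smp_send_reschedule() and
stop_this_cpu(), and the steps run in a single thread in the order of the
diagram rather than on real CPUs. It assumes, as the Subject line suggests,
that the offline check in the reschedule path is what emits the warning.

```c
/*
 * Userspace model of the reboot/reschedule interleaving. All model_* names
 * are hypothetical stand-ins, not kernel APIs; no real IPIs are involved.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MODEL_NR_CPUS 3

static bool model_online[MODEL_NR_CPUS];  /* stand-in for cpu_online_mask */
static int reschedule_ipis_sent;
static int offline_warnings;

static void reset_model(void)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		model_online[cpu] = true;
	reschedule_ipis_sent = 0;
	offline_warnings = 0;
}

static bool model_cpu_is_offline(int cpu)
{
	return !model_online[cpu];
}

/* Roughly the shape of native_smp_send_reschedule(): check, then "send". */
static void model_send_reschedule(int cpu)
{
	if (model_cpu_is_offline(cpu)) {
		printf("sched: Unexpected reschedule of offline CPU#%d!\n", cpu);
		offline_warnings++;
		return;
	}
	reschedule_ipis_sent++;
}

/* CPU 1 in sysvec_reboot -> stop_this_cpu(): only the mask update. */
static void model_stop_this_cpu(int cpu)
{
	model_online[cpu] = false;
}

/*
 * Order from the diagram: CPU 0 has sent the reboot IPI, CPU 1 handles it
 * first, while CPU 2 is still inside an earlier interrupt and has not yet
 * reached its own reboot handler. Returns the number of warnings emitted.
 */
int replay_diagram(void)
{
	reset_model();
	model_stop_this_cpu(1);           /* CPU 1: set_cpu_online(false) */
	assert(model_cpu_is_offline(1));  /* ...before CPU 2 checks it */
	model_send_reschedule(1);         /* CPU 2: reschedule CPU 1 */
	return offline_warnings;
}

/* Same steps with CPU 2 first: the IPI goes out and nothing is printed. */
int replay_cpu2_first(void)
{
	reset_model();
	model_send_reschedule(1);
	model_stop_this_cpu(1);
	return offline_warnings;
}
```

In the model, replay_diagram() prints the warning once and sends no IPI,
while replay_cpu2_first() sends the IPI silently. The only difference is
whether CPU 1 clears its online bit before CPU 2 checks it, which is the
ordering the question is about.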

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux
