lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <6d46d5b1-a5e8-da72-6a7a-fdf013512875@oracle.com>
Date:   Mon, 10 Jan 2022 09:43:59 -0800
From:   Dongli Zhang <dongli.zhang@...cle.com>
To:     x86@...nel.org
Cc:     tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
        dave.hansen@...ux.intel.com, hpa@...or.com, peterz@...radead.org,
        rafael.j.wysocki@...el.com, vkuznets@...hat.com,
        alison.schofield@...el.com, tim.c.chen@...ux.intel.com,
        boris.ostrovsky@...cle.com, joe.jin@...cle.com,
        longpeng2@...wei.com, bigeasy@...utronix.de,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH RESEND 1/1] x86/smpboot: check cpu_initialized_mask first
 after returning from schedule()

May I have feedback for this patch? This may mitigate a CPU hotplug issue which
is only recoverable across OS reboot, unless the below patch is available.

https://lore.kernel.org/all/20211206152034.2150770-1-bigeasy@linutronix.de/

I see there is a patchset that may rework this part. That patch set does not
change the logic here.

https://lore.kernel.org/all/20211215145633.5238-1-dwmw2@infradead.org/

Thank you very much!

Dongli Zhang

On 12/23/21 1:03 PM, Dongli Zhang wrote:
> To online a new CPU, the master CPU signals the secondary and waits for at
> most 10 seconds until cpu_initialized_mask is set by the secondary CPU.
> There is a case that the master CPU fails the online when it takes 10+
> seconds for schedule() to return (although the cpu_initialized_mask is
> already set by the secondary CPU).
> 
> 1. The master CPU signals the secondary CPU in do_boot_cpu().
> 
> 2. As the cpu_initialized_mask is still not set, the master CPU reschedules
> via schedule().
> 
> 3. Suppose it takes 10+ seconds until schedule() returns due to performance
> issue. The secondary CPU sets the cpu_initialized_mask during those 10+
> seconds.
> 
> 4. Once the schedule() at the master CPU returns, although the
> cpu_initialized_mask is set, the time_before(jiffies, timeout) fails. As a
> result, the master CPU regards this as failure.
> 
> [   51.983296] smpboot: do_boot_cpu failed(-1) to wakeup CPU#4
> 
> 5. Since the secondary CPU is stuck at state CPU_UP_PREPARE, any further
> online operation will fail by cpu_check_up_prepare(), unless the below
> patch set is applied.
> 
> https://lore.kernel.org/all/20211206152034.2150770-1-bigeasy@linutronix.de/
> 
> This issue is resolved by always first checking whether the secondary CPU
> has set cpu_initialized_mask.
> 
> Cc: Longpeng(Mike) <longpeng2@...wei.com>
> Cc: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
> Cc: Joe Jin <joe.jin@...cle.com>
> Signed-off-by: Dongli Zhang <dongli.zhang@...cle.com>
> ---
> To resend by Cc linux-kernel@...r.kernel.org as well.
> 
>  arch/x86/kernel/smpboot.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 617012f4619f..faad4fcf67eb 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1145,7 +1145,7 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle,
>  		 */
>  		boot_error = -1;
>  		timeout = jiffies + 10*HZ;
> -		while (time_before(jiffies, timeout)) {
> +		while (true) {
>  			if (cpumask_test_cpu(cpu, cpu_initialized_mask)) {
>  				/*
>  				 * Tell AP to proceed with initialization
> @@ -1154,6 +1154,10 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle,
>  				boot_error = 0;
>  				break;
>  			}
> +
> +			if (time_after_eq(jiffies, timeout))
> +				break;
> +
>  			schedule();
>  		}
>  	}
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ