linux-kernel - Re: [PATCH v5] cpu/hotplug: Do not bail-out in DYING/STARTING sections

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <42237005-5a6e-9462-bfdd-f1ba0c725c72@leemhuis.info>
Date:   Tue, 20 Sep 2022 11:59:06 +0200
From:   Thorsten Leemhuis <regressions@...mhuis.info>
To:     tglx@...utronix.de
Cc:     linux-kernel@...r.kernel.org, vschneid@...hat.com,
        kernel-team@...roid.com, Derek Dolney <z23@...teo.net>,
        Vincent Donnefort <vdonnefort@...gle.com>, peterz@...radead.org
Subject: Re: [PATCH v5] cpu/hotplug: Do not bail-out in DYING/STARTING
 sections

On 17.08.22 11:46, Thorsten Leemhuis wrote:
> 
> Hi, this is your Linux kernel regression tracker.
> 
> On 25.07.22 11:59, Vincent Donnefort wrote:
>> The DYING/STARTING callbacks are not expected to fail. However, as reported
>> by Derek, drivers such as tboot are still free to return errors within
>> those sections, which halts the hot(un)plug and leaves the CPU in an
>> unrecoverable state.
>>
>> No rollback being possible there, let's only log the failures and proceed
>> with the following steps. This restores the hotplug behaviour prior to
>> commit 453e41085183 ("cpu/hotplug: Add cpuhp_invoke_callback_range()")
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=215867
>> Fixes: 453e41085183 ("cpu/hotplug: Add cpuhp_invoke_callback_range()")
>> Reported-by: Derek Dolney <z23@...teo.net>
>> Signed-off-by: Vincent Donnefort <vdonnefort@...gle.com>
>> Tested-by: Derek Dolney <z23@...teo.net>
> 
> What's the status here? Did that patch to fixing a regression fall
> through the cracks? It looks like nothing happened for 3 weeks now,
> that's why I wondered, but maybe I missed something.

Hmm, Vincent seems to be MIA, at least I see no recent messages from him
on lore. Odd. But well, it's still a fix for a regression and it's up to
v5 already; Valentin already added his Reviewed-by, too. Would be a
shame to waste this.

Thomas, could you maybe take a look at the patch?  Maybe we're lucky and
the patch is already good to go...

Ciao, Thorsten

#regzbot poke

> P.S.: As the Linux kernel's regression tracker I deal with a lot of
> reports and sometimes miss something important when writing mails like
> this. If that's the case here, don't hesitate to tell me in a public
> reply, it's in everyone's interest to set the public record straight.
> 
> 
>> v4 -> v5:
>>    - Remove WARN, only log broken states with pr_warn.
>> v3 -> v4:
>>    - Sorry ... wrong commit description style ...
>> v2 -> v3:
>>    - Tested-by tag.
>>    - Refine commit description.
>>    - Bugzilla link.
>> v1 -> v2:
>>    - Commit message rewording.
>>    - More details in the warnings.
>>    - Some variable renaming
>>
>> diff --git a/kernel/cpu.c b/kernel/cpu.c
>> index bbad5e375d3b..621e5af42d57 100644
>> --- a/kernel/cpu.c
>> +++ b/kernel/cpu.c
>> @@ -663,21 +663,51 @@ static bool cpuhp_next_state(bool bringup,
>>  	return true;
>>  }
>>  
>> -static int cpuhp_invoke_callback_range(bool bringup,
>> -				       unsigned int cpu,
>> -				       struct cpuhp_cpu_state *st,
>> -				       enum cpuhp_state target)
>> +static int __cpuhp_invoke_callback_range(bool bringup,
>> +					 unsigned int cpu,
>> +					 struct cpuhp_cpu_state *st,
>> +					 enum cpuhp_state target,
>> +					 bool nofail)
>>  {
>>  	enum cpuhp_state state;
>> -	int err = 0;
>> +	int ret = 0;
>>  
>>  	while (cpuhp_next_state(bringup, &state, st, target)) {
>> +		int err;
>> +
>>  		err = cpuhp_invoke_callback(cpu, state, bringup, NULL, NULL);
>> -		if (err)
>> +		if (!err)
>> +			continue;
>> +
>> +		if (nofail) {
>> +			pr_warn("CPU %u %s state %s (%d) failed (%d)\n",
>> +				cpu, bringup ? "UP" : "DOWN",
>> +				cpuhp_get_step(st->state)->name,
>> +				st->state, err);
>> +			ret = -1;
>> +		} else {
>> +			ret = err;
>>  			break;
>> +		}
>>  	}
>>  
>> -	return err;
>> +	return ret;
>> +}
>> +
>> +static inline int cpuhp_invoke_callback_range(bool bringup,
>> +					      unsigned int cpu,
>> +					      struct cpuhp_cpu_state *st,
>> +					      enum cpuhp_state target)
>> +{
>> +	return __cpuhp_invoke_callback_range(bringup, cpu, st, target, false);
>> +}
>> +
>> +static inline void cpuhp_invoke_callback_range_nofail(bool bringup,
>> +						      unsigned int cpu,
>> +						      struct cpuhp_cpu_state *st,
>> +						      enum cpuhp_state target)
>> +{
>> +	__cpuhp_invoke_callback_range(bringup, cpu, st, target, true);
>>  }
>>  
>>  static inline bool can_rollback_cpu(struct cpuhp_cpu_state *st)
>> @@ -999,7 +1029,6 @@ static int take_cpu_down(void *_param)
>>  	struct cpuhp_cpu_state *st = this_cpu_ptr(&cpuhp_state);
>>  	enum cpuhp_state target = max((int)st->target, CPUHP_AP_OFFLINE);
>>  	int err, cpu = smp_processor_id();
>> -	int ret;
>>  
>>  	/* Ensure this CPU doesn't handle any more interrupts. */
>>  	err = __cpu_disable();
>> @@ -1012,13 +1041,11 @@ static int take_cpu_down(void *_param)
>>  	 */
>>  	WARN_ON(st->state != (CPUHP_TEARDOWN_CPU - 1));
>>  
>> -	/* Invoke the former CPU_DYING callbacks */
>> -	ret = cpuhp_invoke_callback_range(false, cpu, st, target);
>> -
>>  	/*
>> +	 * Invoke the former CPU_DYING callbacks
>>  	 * DYING must not fail!
>>  	 */
>> -	WARN_ON_ONCE(ret);
>> +	cpuhp_invoke_callback_range_nofail(false, cpu, st, target);
>>  
>>  	/* Give up timekeeping duties */
>>  	tick_handover_do_timer();
>> @@ -1296,16 +1323,14 @@ void notify_cpu_starting(unsigned int cpu)
>>  {
>>  	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
>>  	enum cpuhp_state target = min((int)st->target, CPUHP_AP_ONLINE);
>> -	int ret;
>>  
>>  	rcu_cpu_starting(cpu);	/* Enables RCU usage on this CPU. */
>>  	cpumask_set_cpu(cpu, &cpus_booted_once_mask);
>> -	ret = cpuhp_invoke_callback_range(true, cpu, st, target);
>>  
>>  	/*
>>  	 * STARTING must not fail!
>>  	 */
>> -	WARN_ON_ONCE(ret);
>> +	cpuhp_invoke_callback_range_nofail(true, cpu, st, target);
>>  }
>>  
>>  /*