Date:   Mon, 4 Jul 2022 12:01:08 +0200
From:   Thorsten Leemhuis <regressions@...mhuis.info>
To:     Vincent Donnefort <vdonnefort@...gle.com>,
        Thomas Gleixner <tglx@...utronix.de>
Cc:     peterz@...radead.org, linux-kernel@...r.kernel.org,
        vschneid@...hat.com, kernel-team@...roid.com,
        Derek Dolney <z23@...teo.net>
Subject: Re: [PATCH v2] cpu/hotplug: Do not bail-out in DYING/STARTING sections

On 13.06.22 15:37, Vincent Donnefort wrote:
> On Mon, Jun 13, 2022 at 02:36:18PM +0200, Thomas Gleixner wrote:
>> Vincent,
>>
>> On Mon, May 23 2022 at 17:05, Vincent Donnefort wrote:
>>> +static int _cpuhp_invoke_callback_range(bool bringup,
>>> +					unsigned int cpu,
>>> +					struct cpuhp_cpu_state *st,
>>> +					enum cpuhp_state target,
>>> +					bool nofail)
>>>  {
>>>  	enum cpuhp_state state;
>>> -	int err = 0;
>>> +	int ret = 0;
>>>  
>>>  	while (cpuhp_next_state(bringup, &state, st, target)) {
>>> +		int err;
>>> +
>>>  		err = cpuhp_invoke_callback(cpu, state, bringup, NULL, NULL);
>>> -		if (err)
>>> +		if (!err)
>>> +			continue;
>>> +
>>> +		if (nofail) {
>>> +			pr_warn("CPU %u %s state %s (%d) failed (%d)\n",
>>> +				cpu, bringup ? "UP" : "DOWN",
>>> +				cpuhp_get_step(st->state)->name,
>>> +				st->state, err);
>>> +			ret = -1;
>>
>> I have a hard time mapping this to the changelog:
>>
>>> those sections. In that case, there's nothing the hotplug machinery can do,
>>> so let's just proceed and log the failures.
>>
>> That's still returning an error code at the end. Confused.
> 
> It is, but after returning from this function, only a warning will be raised
> (cpuhp_invoke_callback_range_nofail()) instead of stopping the HP machinery
> (cpuhp_invoke_callback_range()). How about this changelog?
> 
>   The DYING/STARTING callbacks are not expected to fail. However, as reported by
>   Derek, drivers such as tboot are still free to return errors within those
>   sections, which halts the hot(un)plug and leaves the CPU in an unrecoverable
>   state.
>   
>   No rollback being possible there, let's only log the failures and proceed
>   with the following steps. This restores the hotplug behaviour prior to
>   commit 453e41085183 ("cpu/hotplug: Add cpuhp_invoke_callback_range()").
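
For readers following the thread, the difference between the two wrappers
discussed above can be sketched in user space roughly as follows. This is
only an illustration, not the kernel code: invoke_range(), the steps[]
table, and steps_run are hypothetical stand-ins, and the behaviour of the
non-nofail branch (stop and return the callback's error) is an assumption
based on the quoted discussion rather than on the full patch.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for hotplug states and their callbacks. */
enum { NR_STATES = 4 };

typedef int (*step_fn)(unsigned int cpu);

static int step_ok(unsigned int cpu)   { (void)cpu; return 0; }
static int step_fail(unsigned int cpu) { (void)cpu; return -5; }

/* The second step fails, the others succeed. */
static step_fn steps[NR_STATES] = { step_ok, step_fail, step_ok, step_ok };

/* Counts how many callbacks were actually invoked. */
static int steps_run;

/*
 * Walk all states in order. With nofail set, a failing callback is only
 * logged and the walk continues; -1 is returned at the end so the caller
 * can raise a warning. Without nofail, the walk stops at the first error
 * and that error is returned (assumed behaviour, see the note above).
 */
static int invoke_range(unsigned int cpu, bool nofail)
{
	int ret = 0;

	for (int state = 0; state < NR_STATES; state++) {
		int err = steps[state](cpu);

		steps_run++;
		if (!err)
			continue;

		if (nofail) {
			fprintf(stderr, "CPU %u state %d failed (%d)\n",
				cpu, state, err);
			ret = -1;
		} else {
			ret = err;
			break;
		}
	}

	return ret;
}
```

With this table, the nofail walk still runs every step and merely reports
that something failed, while the regular walk stops at the failing step.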

Vincent, what's up here? Did that patch make any progress? It looks to me
like things stalled here, but maybe I'm missing something. I'm asking
because that patch was supposed to fix a regression I'm tracking.

BTW, if you respin this patch, could you please add proper 'Link:' tags
pointing to all reports about this issue? e.g. like this:

 Link: https://bugzilla.kernel.org/show_bug.cgi?id=215867

These tags are important, as they allow others to look into the
backstory now and years from now. That is why they should be added in
cases like this, as Documentation/process/submitting-patches.rst and
Documentation/process/5.Posting.rst explain in more detail.
Additionally, my regression tracking bot ‘regzbot’ relies on these tags
to automatically connect reports with patches that are posted or
committed to fix the reported issue.

Ciao, Thorsten
