Message-ID: <AHoA7gBvDieBN0w9ILASHaqg.3.1616503055590.Hmail.wangqing@vivo.com>
Date:   Tue, 23 Mar 2021 20:37:35 +0800 (GMT+08:00)
From:   王擎 <wangqing@...o.com>
To:     Petr Mladek <pmladek@...e.com>, Tejun Heo <tj@...nel.org>
Cc:     Lai Jiangshan <jiangshanlai@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        "Guilherme G. Piccoli" <gpiccoli@...onical.com>,
        Andrey Ignatov <rdna@...com>, Vlastimil Babka <vbabka@...e.cz>,
        Santosh Sivaraj <santosh@...six.org>,
        linux-kernel@...r.kernel.org
Subject: Re:Re: [PATCH V2] workqueue: watchdog: update wq_watchdog_touched for unbound lockup checking


>On Fri 2021-03-19 16:00:36, Wang Qing wrote:
>> When touch_softlockup_watchdog() is called, only wq_watchdog_touched_cpu
>> is updated, while an unbound worker_pool running on that core uses
>> wq_watchdog_touched to determine whether it has locked up. This can lead
>> to a false lockup report.
>
>In other words, unbound workqueues are not aware of the more common
>touch_softlockup_watchdog() because it updates only
>wq_watchdog_touched_cpu for the affected CPU. As a result,
>the workqueue watchdog might report a lockup in an unbound workqueue
>even though it is blocked by known slow code.

Yes, this is the problem I'm talking about.
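
For reference, the touch path before the patch looks roughly like this
(simplified from kernel/watchdog.c and kernel/workqueue.c, not the
exact code):

	/* A per-CPU touch only reaches the per-CPU timestamp: */
	notrace void touch_softlockup_watchdog(void)
	{
		touch_softlockup_watchdog_sched();
		wq_watchdog_touch(raw_smp_processor_id());
	}

	notrace void wq_watchdog_touch(int cpu)
	{
		if (cpu >= 0)
			per_cpu(wq_watchdog_touched_cpu, cpu) = jiffies;
		else
			wq_watchdog_touched = jiffies;	/* cpu == -1 only */
	}

Unbound pools (pool->cpu < 0) are checked against wq_watchdog_touched
only, so a touch coming through the per-CPU path is invisible to them.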
>
>> My suggestion is to update both when touch_softlockup_watchdog() is called:
>> use wq_watchdog_touched_cpu to check bound worker_pools, and
>> wq_watchdog_touched to check unbound ones.
>> 
>> Signed-off-by: Wang Qing <wangqing@...o.com>
>> ---
>>  kernel/watchdog.c  |  5 +++--
>>  kernel/workqueue.c | 17 ++++++-----------
>>  2 files changed, 9 insertions(+), 13 deletions(-)
>> 
>> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
>> index 7110906..107bc38
>> --- a/kernel/watchdog.c
>> +++ b/kernel/watchdog.c
>> @@ -278,9 +278,10 @@ void touch_all_softlockup_watchdogs(void)
>>  	 * update as well, the only side effect might be a cycle delay for
>>  	 * the softlockup check.
>>  	 */
>> -	for_each_cpu(cpu, &watchdog_allowed_mask)
>> +	for_each_cpu(cpu, &watchdog_allowed_mask) {
>>  		per_cpu(watchdog_touch_ts, cpu) = SOFTLOCKUP_RESET;
>> -	wq_watchdog_touch(-1);
>> +		wq_watchdog_touch(cpu);
>
>Note that wq_watchdog_touch(cpu) now always updates
>wq_watchdog_touched. This loop will therefore store the same jiffies
>value into the same variable once per CPU.
>
Although there is a bit of redundancy here, it is the most concise way
to implement this, and it certainly will not affect performance.
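
If the duplicate stores ever became a concern, a hypothetical split of
the helper could avoid them; this is an illustrative sketch only, and
wq_watchdog_touch_cpu() is a made-up name that exists neither in the
patch nor upstream:

	/* Hypothetical helper: touch only the per-CPU timestamp. */
	void wq_watchdog_touch_cpu(int cpu)
	{
		per_cpu(wq_watchdog_touched_cpu, cpu) = jiffies;
	}

	void touch_all_softlockup_watchdogs(void)
	{
		...
		for_each_cpu(cpu, &watchdog_allowed_mask) {
			per_cpu(watchdog_touch_ts, cpu) = SOFTLOCKUP_RESET;
			wq_watchdog_touch_cpu(cpu);
		}
		wq_watchdog_touch(-1);	/* global timestamp, written once */
	}

But since touch_all_softlockup_watchdogs() is a rare slow path, the
extra stores in the patch should be harmless.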

>> +	}
>>  }
>>  
>>  void touch_softlockup_watchdog_sync(void)
>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>> index 0d150da..be08295
>> --- a/kernel/workqueue.c
>> +++ b/kernel/workqueue.c
>> @@ -5787,22 +5787,17 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
>>  			continue;
>>  
>>  		/* get the latest of pool and touched timestamps */
>> +		if (pool->cpu >= 0)
>> +			touched = READ_ONCE(per_cpu(wq_watchdog_touched_cpu, pool->cpu));
>> +		else
>> +			touched = READ_ONCE(wq_watchdog_touched);
>>  		pool_ts = READ_ONCE(pool->watchdog_ts);
>> -		touched = READ_ONCE(wq_watchdog_touched);
>>  
>>  		if (time_after(pool_ts, touched))
>>  			ts = pool_ts;
>>  		else
>>  			ts = touched;
>>  
>> -		if (pool->cpu >= 0) {
>> -			unsigned long cpu_touched =
>> -				READ_ONCE(per_cpu(wq_watchdog_touched_cpu,
>> -						  pool->cpu));
>> -			if (time_after(cpu_touched, ts))
>> -				ts = cpu_touched;
>> -		}
>> -
>>  		/* did we stall? */
>>  		if (time_after(jiffies, ts + thresh)) {
>>  			lockup_detected = true;
>> @@ -5826,8 +5821,8 @@ notrace void wq_watchdog_touch(int cpu)
>>  {
>>  	if (cpu >= 0)
>>  		per_cpu(wq_watchdog_touched_cpu, cpu) = jiffies;
>> -	else
>> -		wq_watchdog_touched = jiffies;
>> +
>> +	wq_watchdog_touched = jiffies;
>>  }
>>  
>>  static void wq_watchdog_set_thresh(unsigned long thresh)
>
>This last hunk is enough to fix the problem. wq_watchdog_touched will
>get updated also from cpu-specific touch_softlockup_watchdog().

It is not enough, in fact: because wq_watchdog_touched is now updated by
wq_watchdog_touch() on every per-CPU touch, it can no longer be used on
its own to judge a bound pool. We must also update every
wq_watchdog_touched_cpu in touch_all_softlockup_watchdogs() so that
bound pools can be judged by their per-CPU timestamp alone.
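
To spell out the scenario as I read it: with only the last hunk applied,
wq_watchdog_timer_fn() keeps the mainline check, reconstructed here from
the removed lines of the diff:

	pool_ts = READ_ONCE(pool->watchdog_ts);
	touched = READ_ONCE(wq_watchdog_touched);	/* now refreshed by every CPU */

	if (time_after(pool_ts, touched))
		ts = pool_ts;
	else
		ts = touched;

	/*
	 * A bound pool stuck on CPU A then looks "touched" whenever any
	 * other CPU calls touch_softlockup_watchdog(), so the stall on
	 * CPU A can be missed.  Bound pools must therefore be judged by
	 * wq_watchdog_touched_cpu alone, which in turn requires
	 * touch_all_softlockup_watchdogs() to update every per-CPU stamp.
	 */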

Thanks,
Qing Wang
>
>The original patch simplified the logic of wq_watchdog_timer_fn().
>But it added unnecessary assignments into
>touch_all_softlockup_watchdogs().
>
>I do not have a strong opinion on which solution is better. I slightly
>prefer to keep only this last hunk.
>
>Best Regards,
>Petr

