linux-kernel - Re: [PATCH] watchdog: nohz: don't run watchdog on nohz

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <551D6373.2030000@ezchip.com>
Date:	Thu, 2 Apr 2015 11:42:43 -0400
From:	Chris Metcalf <cmetcalf@...hip.com>
To:	Frederic Weisbecker <fweisbec@...il.com>,
	Don Zickus <dzickus@...hat.com>
CC:	Ingo Molnar <mingo@...nel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Andrew Jones <drjones@...hat.com>,
	chai wen <chaiw.fnst@...fujitsu.com>,
	Ulrich Obergfell <uobergfe@...hat.com>,
	Fabian Frederick <fabf@...net.be>,
	Aaron Tomlin <atomlin@...hat.com>,
	Ben Zhang <benzh@...omium.org>,
	Christoph Lameter <cl@...ux.com>,
	Gilad Ben-Yossef <gilad@...yossef.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] watchdog: nohz: don't run watchdog on nohz_full cores

On 04/02/2015 11:38 AM, Frederic Weisbecker wrote:
> On Thu, Apr 02, 2015 at 10:15:27AM -0400, Don Zickus wrote:
>> On Thu, Apr 02, 2015 at 09:49:45AM -0400, Chris Metcalf wrote:
>>>> Can I ask how the NO_HZ_FULL technology works from userspace?  Is there a
>>>> system command that has to be sent?  How does the kernel know to turn off
>>>> ticks and trust userspace to do the right thing?
>>> The NO_HZ_FULL option, when configured into the kernel, lets
>>> you boot with "nohz_full=1-15" (or whatever cpumask you like),
>>> typically in conjunction with "isolcpus=1-15".  At this point no tasks
>>> will run on those cores until explicitly placed there by affinity, and
>>> once there and running in userspace, the kernel will automatically
>>> get out of their way and not interrupt at all.  This lets those tasks
>>> run with 100.000% of the cpu, which is a requirement for many
>>> user-space device drivers running high throughput devices.
>>> (This is typically the use case for the tile architecture customers.)
>>>
>>> So, other than a boot flag, there are no system commands or
>>> other APIs to deal with.
>> Ah, I am starting to understand your approach in the original patch better.
>>
>>> Part of the requirement, though, is that there can be only one task
>>> bound and runnable on that cpu, otherwise the kernel has to be
>>> involved to do the context-switching off of the scheduler tick.
>>> This is why having the standard watchdog kernel thread doesn't
>>> work in this context.
>> So, there is no preemption happening, which means the softlockup is rather
>> pointless.
> Still useful actually because nohz full only takes effect when a single task runs
> on the CPU. But there can still be more than 1 task running, just nohz full will
> be disabled. It all happens dynamically.
>
>> Can interrupts be disabled or handled on that cpu?  I am trying
>> to see if the hardlockup detector becomes rather silly on those cpus too.
> No interrupts aren't disabled on these CPUs. Now the goal is to avoid them:
> migrate irqs, nohz full, etc...
>
> But there can be irqs. And actually there is at least 1 tick every second in
> order to keep the scheduler stats moving forward. We plan to get rid of it but
> anyway the point is that IRQ can happen on nohz full CPUs.
>
>>> I continue to suspect that the right model here is to disable the
>>> watchdog specifically on the cores that the user has tagged with
>>> the nohz_full boot argument.  I agree that there might be a case
>>> to be made for leaving the watchdog conditionally (as suggested
>>> by Ingo) but it should be possible to have the watchdogs on
>>> the nohz_full cores be turned off completely if desired.
>> I think I might be slowly coming around to your thoughts.  I might request a
>> different patch though based on the answers above.  Maybe even create a
>> subset of the online cpus for the watchdog to work off of.  The watchdog
>> would copy the online cpu mask, mask off the nohz cpus and just function
>> that way.  It would print loud messages for each nohz cpu it was masking
>> off.
> All agreed with that! We should at least keep the watchdog running on
> non-nohz-full CPUs. And also allow to re-enable it everywhere when needed,
> in case we have a lockup to chase on nohz full CPUs.
>
>> Then perhaps as a debug aid, expose a /proc/sys/kernel/watchdog_cpumask for
>> folks to modify in case they want to enable the watchdog on the nohz cpus.
> That sounds like a good idea.

OK, I will respin v2 of the patch as follows:

- Provide a watchdog_cpumask as suggested by Don.
- On a non-NO_HZ_FULL build, it defaults to cpu_possible as normal
- On a NO_HZ_FULL build, it defaults to the housekeeping cpus
- If the mask is modified, we disable and then re-enable the watchdog,
   so that the watchdog init code can exit() the appropriate threads as
   they start up

This should address the various concerns that have been raised.

-- 
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/