Message-ID: <551D6373.2030000@ezchip.com>
Date: Thu, 2 Apr 2015 11:42:43 -0400
From: Chris Metcalf <cmetcalf@...hip.com>
To: Frederic Weisbecker <fweisbec@...il.com>,
Don Zickus <dzickus@...hat.com>
CC: Ingo Molnar <mingo@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Andrew Jones <drjones@...hat.com>,
chai wen <chaiw.fnst@...fujitsu.com>,
Ulrich Obergfell <uobergfe@...hat.com>,
Fabian Frederick <fabf@...net.be>,
Aaron Tomlin <atomlin@...hat.com>,
Ben Zhang <benzh@...omium.org>,
Christoph Lameter <cl@...ux.com>,
Gilad Ben-Yossef <gilad@...yossef.com>,
Steven Rostedt <rostedt@...dmis.org>,
open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] watchdog: nohz: don't run watchdog on nohz_full cores
On 04/02/2015 11:38 AM, Frederic Weisbecker wrote:
> On Thu, Apr 02, 2015 at 10:15:27AM -0400, Don Zickus wrote:
>> On Thu, Apr 02, 2015 at 09:49:45AM -0400, Chris Metcalf wrote:
>>>> Can I ask how the NO_HZ_FULL technology works from userspace? Is there a
>>>> system command that has to be sent? How does the kernel know to turn off
>>>> ticks and trust userspace to do the right thing?
>>> The NO_HZ_FULL option, when configured into the kernel, lets
>>> you boot with "nohz_full=1-15" (or whatever cpumask you like),
>>> typically in conjunction with "isolcpus=1-15". At this point no tasks
>>> will run on those cores until explicitly placed there by affinity, and
>>> once there and running in userspace, the kernel will automatically
>>> get out of their way and not interrupt at all. This lets those tasks
>>> run with 100.000% of the cpu, which is a requirement for many
>>> user-space device drivers running high throughput devices.
>>> (This is typically the use case for the tile architecture customers.)
>>>
>>> So, other than a boot flag, there are no system commands or
>>> other APIs to deal with.
>> Ah, I am starting to understand your approach in the original patch better.
>>
>>> Part of the requirement, though, is that there can be only one task
>>> bound and runnable on that cpu, otherwise the kernel has to be
>>> involved to do the context-switching off of the scheduler tick.
>>> This is why having the standard watchdog kernel thread doesn't
>>> work in this context.
>> So, there is no preemption happening, which means the softlockup is rather
>> pointless.
> Still useful actually because nohz full only takes effect when a single task runs
> on the CPU. But there can still be more than 1 task running, just nohz full will
> be disabled. It all happens dynamically.
>
>> Can interrupts be disabled or handled on that cpu? I am trying
>> to see if the hardlockup detector becomes rather silly on those cpus too.
> No, interrupts aren't disabled on these CPUs. The goal is to avoid them:
> migrate irqs, nohz full, etc...
>
> But there can be irqs. And actually there is at least 1 tick every second in
> order to keep the scheduler stats moving forward. We plan to get rid of it but
> anyway the point is that IRQ can happen on nohz full CPUs.
>
>>> I continue to suspect that the right model here is to disable the
>>> watchdog specifically on the cores that the user has tagged with
>>> the nohz_full boot argument. I agree that there might be a case
>>> to be made for leaving the watchdog conditionally (as suggested
>>> by Ingo) but it should be possible to have the watchdogs on
>>> the nohz_full cores be turned off completely if desired.
>> I think I might be slowly coming around to your thoughts. I might request a
>> different patch though based on the answers above. Maybe even create a
>> subset of the online cpus for the watchdog to work off of. The watchdog
>> would copy the online cpu mask, mask off the nohz cpus and just function
>> that way. It would print loud messages for each nohz cpu it was masking
>> off.
> All agreed with that! We should at least keep the watchdog running on
> non-nohz-full CPUs. And also allow to re-enable it everywhere when needed,
> in case we have a lockup to chase on nohz full CPUs.
>
>> Then perhaps as a debug aid, expose a /proc/sys/kernel/watchdog_cpumask for
>> folks to modify in case they want to enable the watchdog on the nohz cpus.
> That sounds like a good idea.

OK, I will respin v2 of the patch as follows:

- Provide a watchdog_cpumask as suggested by Don.
- On a non-NO_HZ_FULL build, it defaults to cpu_possible as normal.
- On a NO_HZ_FULL build, it defaults to the housekeeping cpus.
- If the mask is modified, we disable and then re-enable the watchdog,
  so that the watchdog init code can exit() the appropriate threads as
  they start up.

This should address the various concerns that have been raised.

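
To make the default-mask idea concrete, here is a rough user-space sketch of the computation, not the actual patch. It assumes the housekeeping cpus are simply the possible cpus minus the nohz_full ones, and it uses a plain 64-bit integer in place of the kernel's cpumask type. The names cpumask64_t, mask_range(), default_watchdog_mask() and report_skipped() are made up for illustration, and the message text is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a kernel cpumask: bit N set means cpu N is
 * in the set. Limited to 64 cpus for illustration only. */
typedef uint64_t cpumask64_t;

/* Build a mask covering cpus first..last inclusive, e.g. "1-15". */
static cpumask64_t mask_range(unsigned first, unsigned last)
{
    assert(first <= last && last < 64);
    cpumask64_t upto_last =
        (last == 63) ? UINT64_MAX : ((UINT64_C(1) << (last + 1)) - 1);
    cpumask64_t below_first = (UINT64_C(1) << first) - 1;
    return upto_last & ~below_first;
}

/* Default watchdog mask: every possible cpu that is not nohz_full
 * (assumed here to be the housekeeping set). With an empty nohz_full
 * mask this is just the possible mask. */
static cpumask64_t default_watchdog_mask(cpumask64_t possible,
                                         cpumask64_t nohz_full)
{
    return possible & ~nohz_full;
}

/* Print one message per possible cpu left out of the watchdog mask and
 * return how many cpus were skipped. */
static unsigned report_skipped(cpumask64_t possible, cpumask64_t watchdog)
{
    cpumask64_t skipped = possible & ~watchdog;
    unsigned count = 0;

    for (unsigned cpu = 0; cpu < 64; cpu++) {
        if (skipped & (UINT64_C(1) << cpu)) {
            printf("watchdog: not running on nohz_full cpu %u\n", cpu);
            count++;
        }
    }
    return count;
}
```

With possible cpus 0-15 and "nohz_full=1-15", default_watchdog_mask() leaves only cpu 0, and report_skipped() prints a line for each of cpus 1-15. Writing a different mask would then correspond to the disable/re-enable step in the last bullet.
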
--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com