linux-kernel - Re: [RFC PATCH 7/9] housekeeping: Use own boot option, independant from nohz

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170813111340.0ade6d58@redhat.com>
Date:   Sun, 13 Aug 2017 11:13:40 -0400
From:   Luiz Capitulino <lcapitulino@...hat.com>
To:     Frederic Weisbecker <fweisbec@...il.com>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Chris Metcalf <cmetcalf@...lanox.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Christoph Lameter <cl@...ux.com>,
        "Paul E . McKenney" <paulmck@...ux.vnet.ibm.com>,
        Ingo Molnar <mingo@...nel.org>, Mike Galbraith <efault@....de>,
        Rik van Riel <riel@...hat.com>,
        Wanpeng Li <kernellwp@...il.com>
Subject: Re: [RFC PATCH 7/9] housekeeping: Use own boot option, independant
 from nohz

On Sat, 12 Aug 2017 16:10:06 +0200
Frederic Weisbecker <fweisbec@...il.com> wrote:

> On Fri, Aug 11, 2017 at 03:09:57PM -0400, Luiz Capitulino wrote:
> > On Fri, 21 Jul 2017 15:21:28 +0200
> > Frederic Weisbecker <fweisbec@...il.com> wrote:  
> > > -void __init housekeeping_init(void)
> > > +/* Parse the boot-time housekeeping CPU list from the kernel parameters. */
> > > +static int __init housekeeping_setup(char *str)
> > >  {
> > > -	if (!tick_nohz_full_enabled())
> > > -		return;
> > > -
> > > -	if (!alloc_cpumask_var(&housekeeping_mask, GFP_KERNEL)) {
> > > -		WARN(1, "NO_HZ: Can't allocate not-full dynticks cpumask\n");
> > > -		cpumask_clear(tick_nohz_full_mask);
> > > -		tick_nohz_full_running = false;
> > > -		return;
> > > +	alloc_bootmem_cpumask_var(&housekeeping_mask);
> > > +	if (cpulist_parse(str, housekeeping_mask) < 0) {
> > > +		pr_warn("Housekeeping: Incorrect cpumask\n");
> > > +		free_bootmem_cpumask_var(housekeeping_mask);
> > > +		return 1;
> > >  	}
> > >  
> > > -	cpumask_andnot(housekeeping_mask,
> > > -		       cpu_possible_mask, tick_nohz_full_mask);
> > > -
> > >  	static_branch_enable(&housekeeping_overriden);
> > >  
> > >  	/* We need at least one CPU to handle housekeeping work */
> > >  	WARN_ON_ONCE(cpumask_empty(housekeeping_mask));
> > > +
> > > +	return 1;
> > >  }
> > > +__setup("housekeeping=", housekeeping_setup);  
> > 
> > Am I right that from now on nohz_full= users will also have
> > to specify housekeeping= in order to get nohz_full working?
> > If that's correct, then won't this patch break nohz_full for
> > existing setups?  
> 
> nohz_full= will still work but will only imply tick stop. A few isolation
> details that were enabled by nohz_full= won't be handled anymore such as:
> unbound timers affinity, watchdog disablement, rcu threads affinity, sched idle
> load balancing... Those are now handled by housekeeping=
> 
> So yes in a sense, this can break some setup that assume nohz_full= does more
> than stopping the tick.

Yes, the problem is that this is how it has always worked. Also,
the breakage will be very subtle and hard to debug.

> Perhaps I should remove the nohz_full= parameter altogether and let nohz_full controlled
> by housekeeping= only. How much can kernel parameters be considered as kernel ABIs?

That's a very good question, I don't have an answer for that.

> Also I'm wondering if "housekeeping=" is a clear name for users. "isolation=" or
> "cpu_isolation=" would be better and more obvious. Housekeeping based naming would only be
> internal implementation detail. And deactivating the tick through "cpu_isolation=" would
> be clearer than if we did through "housekeeping=".

That's exactly my thinking while I was reviewing the series!

> Of course the problem is that we already have "isolcpus=". But re-implementing isolcpus
> on top of housekeeping might be a good idea. I believe that the current implementation on
> top of NULL domains isn't much beloved. A less controversial implementation might even
> allow us to control it though cpusets.

You're completely right. Some people don't use isolcpus= because it
disables load balancing and that may be a problem for setups where
tasks are pinned to a set of CPUs where the number of tasks is greater
than the number of CPUs. However, for the cases where you have a
single task pinned to a CPU, having load balancing taking place adds
an extra latency (I won't remember how much, but I guess it was more
than 10us).

If there's a way to "disable" load balancing from user-space, say
with cpusets, then I think we should keep the isolated CPUs attached
to a domain as you suggest.

Another detail about isolcpus= is that it doesn't isolate the CPU
from kernel threads. That is, unpinned kernel threads are allowed
to run on CPUs not isolated with isolcpus=. We might consider changing
that for a new isolation option.

I know that there are many arguments against isolcpus= and some people
advice using cpusets. The problem with that advice is that isolcpus=
goes a bit beyond isolating a CPU from user-space tasks. One additional
thing is does for example, is pinning the kernel_init() thread to
housekeeping CPUs. This is key, because that thread will create timers
at early boot that will pin themselves to the CPU they run.

Finally, I'm wondering how all this will fit together with TASK_ISOLATION.
One of the questions I ask myself is: can/should the things TASK_ISOLATION
does be done by a kernel command-line parameter instead? Or should we
try to come up with a list of global things to control (eg. the tick,
kernel thread affinity, etc) and per-task controls?