linux-kernel - Re: [PATCH RFC] v5 expedited "big hammer" RCU grace periods

Message-ID: <20090529120637.GA32184@in.ibm.com>
Date:	Fri, 29 May 2009 17:36:37 +0530
From:	Gautham R Shenoy <ego@...ibm.com>
To:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc:	Ingo Molnar <mingo@...e.hu>, linux-kernel@...r.kernel.org,
	netdev@...r.kernel.org, netfilter-devel@...r.kernel.org,
	akpm@...ux-foundation.org, torvalds@...ux-foundation.org,
	davem@...emloft.net, dada1@...mosbay.com, zbr@...emap.net,
	jeff.chua.linux@...il.com, paulus@...ba.org, laijs@...fujitsu.com,
	jengelh@...ozas.de, r000n@...0n.net, benh@...nel.crashing.org,
	mathieu.desnoyers@...ymtl.ca, Nathan Lynch <ntl@...ox.com>,
	Srivatsa Vaddagiri <vatsa@...ibm.com>
Subject: Re: [PATCH RFC] v5 expedited "big hammer" RCU grace periods

On Thu, May 28, 2009 at 06:22:51PM -0700, Paul E. McKenney wrote:
> 
> Hmmm...  Making the transition work nicely would require some thought.
> It might be good to retain the two-phase nature, even when reversing
> the order of offline notifications.  This would address one disadvantage
> of the past-life version, which was unnecessary migration of processes
> off of the CPU in question, only to find that a later notifier aborted
> the offlining.

The notifiers handling CPU_DEAD cannot abort it from here since the
operation has already completed, whether they like it or not!

If there exist notifiers which try to abort it from here, it's a BUG, as
the code says:

	/* CPU is completely dead: tell everyone.  Too late to complain. */
	if (raw_notifier_call_chain(&cpu_chain, CPU_DEAD | mod,
				    hcpu) == NOTIFY_BAD)
		BUG();

Also, one can thus consider the CPU_DEAD and the CPU_POST_DEAD parts to be
extensions of the second phase: we simply do some additional cleanup
once the CPU has actually gone down. Migration of processes (breaking
their affinity if required) is one such cleanup.

But there are other things as well, such as rebuilding the sched-domains,
which have to be done after the CPU has gone down. Currently this
operation accounts for the majority of the time taken to bring a CPU
offline.

>
> So only the first phase is permitted to abort the offlining of the CPU,
> and this first phase must also set whatever state is necessary to prevent
> some later operation from making it impossible to offline the CPU.
> The second phase would unconditionally take the CPU out of service.
> In theory, this approach would allow incremental conversion of the
> notifiers, waiting to remove the stop_machine stuff until all notifiers
> had been converted.
> If this actually works out, the sequence of changes would be as follows:
> 
> 1.	Reverse the order of the offline notifications, fixing any
> 	bugs induced/exposed by this change.
> 
> 2.	Incrementally convert notifiers to the new mechanism.  This
> 	will require more thought.
> 
> 3.	Get rid of the stop_machine and the CPU_DEAD once all are
> 	converted.

I agree with this sequence. It seems quite logical.

However, I am not yet sure if we can completely get rid of stop_machine
and CPU_DEAD in practice, unless we're okay with having a
time-consuming rollback operation. Currently the rollback only consists of
rolling back the actions done during CPU_UP_PREPARE/CPU_DOWN_PREPARE.
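
To make the two-phase split and the cheap rollback concrete, here is a
rough userspace sketch in C. It is not kernel code: every name in it
(hp_call_chain, hp_cpu_down, the HP_* constants and the two sample
notifiers) is hypothetical and only loosely modelled on the notifier
chain. It assumes that a veto is possible only in the prepare phase, and
that on a veto only the notifiers which already completed their prepare
step are told about the failure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of a two-phase offline; not kernel code. */
enum hp_event { HP_DOWN_PREPARE, HP_DOWN_FAILED, HP_DEAD };
enum hp_ret { HP_OK, HP_BAD };

typedef enum hp_ret (*hp_notifier_fn)(enum hp_event ev, int cpu);

/*
 * Call the first n notifiers in order, stopping at the first HP_BAD.
 * If called is non-NULL, store how many notifiers were invoked
 * (including the one that returned HP_BAD, if any).
 */
static enum hp_ret hp_call_chain(hp_notifier_fn *chain, size_t n,
				 enum hp_event ev, int cpu, size_t *called)
{
	enum hp_ret ret = HP_OK;
	size_t i;

	for (i = 0; i < n; i++) {
		ret = chain[i](ev, cpu);
		if (ret == HP_BAD) {
			i++;	/* count the vetoing notifier as called */
			break;
		}
	}
	if (called)
		*called = i;
	return ret;
}

/* Returns 0 if the CPU went offline, -1 if the prepare phase was vetoed. */
int hp_cpu_down(hp_notifier_fn *chain, size_t n, int cpu)
{
	enum hp_ret ret;
	size_t called = 0;

	/* Phase 1: any notifier may still abort the operation. */
	if (hp_call_chain(chain, n, HP_DOWN_PREPARE, cpu, &called) == HP_BAD) {
		/* Roll back only those that prepared, not the one that vetoed. */
		hp_call_chain(chain, called - 1, HP_DOWN_FAILED, cpu, NULL);
		return -1;
	}

	/* Point of no return: the CPU would be taken out of service here. */

	/* Phase 2: cleanup only.  A veto at this stage is a bug. */
	ret = hp_call_chain(chain, n, HP_DEAD, cpu, NULL);
	assert(ret != HP_BAD);
	(void)ret;	/* silence unused warning when NDEBUG is defined */
	return 0;
}

/* Sample notifier: tracks outstanding prepare actions. */
int prepared_count;

enum hp_ret sample_notifier(enum hp_event ev, int cpu)
{
	(void)cpu;
	if (ev == HP_DOWN_PREPARE)
		prepared_count++;
	else	/* HP_DOWN_FAILED and HP_DEAD both release the state */
		prepared_count--;
	return HP_OK;
}

/* Sample notifier: refuses to let CPU 0 go offline. */
enum hp_ret veto_cpu0_notifier(enum hp_event ev, int cpu)
{
	return (ev == HP_DOWN_PREPARE && cpu == 0) ? HP_BAD : HP_OK;
}
```

With a chain of { sample_notifier, veto_cpu0_notifier }, offlining CPU 0
is vetoed and only sample_notifier sees the failure event, while
offlining any other CPU runs both phases to completion. The rollback
stays cheap in this model precisely because nothing done after the point
of no return ever has to be undone.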

And from the notifiers profile (see attached file),
UP_PREPARE/DOWN_PREPARE seem to consume far less time
than the post-hotplug notifications.

> 
> Or we might find that simply reversing the order (#1 above) suffices.
> 
> > > This meant that a given CPU was naturally guaranteed to be 
> > > correctly taking interrupts for the entire time that it was 
> > > capable of running user-level processes. Later in the offlining 
> > > process, it would still take interrupts, but would be unable to 
> > > run user processes.  Still later, it would no longer be taking 
> > > interrupts, and would stop participating in RCU and in the global 
> > > TLB-flush algorithm.  There was no need to stop the whole machine 
> > > to make a given CPU go offline, in fact, most of the work was done 
> > > by the CPU in question.
> > > 
> > > In the case of RCU, this meant that there was no need for 
> > > double-checking for offlined CPUs, because CPUs could reliably 
> > > indicate a quiescent state on their way out.
> > > 
> > > On the other hand, there was no equivalent of dynticks in the old 
> > > days. And it is dynticks that is responsible for most of the 
> > > complexity present in force_quiescent_state(), not CPU hotplug.
> > > 
> > > So I cannot hold up RCU as something that would be greatly 
> > > simplified by changing the CPU hotplug design, much as I might 
> > > like to.  ;-)
> > 
> > We could probably remove a fair bit of dynticks complexity by 
> > removing non-dynticks and removing non-hrtimer. People could still 
> > force a 'periodic' interrupting mode (if they want, or if their hw 
> > forces that), but that would be a plain periodic hrtimer firing off 
> > all the time.
> 
> Hmmm...  That would not simplify RCU much, but on the other hand (1) the
> rcutree.c dynticks approach is already quite a bit simpler than the
> rcupreempt.c approach and (2) doing this could potentially simplify
> other things.
> 
> 							Thanx, Paul

-- 
Thanks and Regards
gautham

View attachment "cpu-hotplug-summary" of type "text/plain" (16556 bytes)