Message-ID: <20090530045633.GB7117@linux.vnet.ibm.com>
Date: Fri, 29 May 2009 21:56:33 -0700
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: Gautham R Shenoy <ego@...ibm.com>
Cc: Ingo Molnar <mingo@...e.hu>, linux-kernel@...r.kernel.org,
netdev@...r.kernel.org, netfilter-devel@...r.kernel.org,
akpm@...ux-foundation.org, torvalds@...ux-foundation.org,
davem@...emloft.net, dada1@...mosbay.com, zbr@...emap.net,
jeff.chua.linux@...il.com, paulus@...ba.org, laijs@...fujitsu.com,
jengelh@...ozas.de, r000n@...0n.net, benh@...nel.crashing.org,
mathieu.desnoyers@...ymtl.ca, Nathan Lynch <ntl@...ox.com>,
Srivatsa Vaddagiri <vatsa@...ibm.com>
Subject: Re: [PATCH RFC] v5 expedited "big hammer" RCU grace periods

On Fri, May 29, 2009 at 05:36:37PM +0530, Gautham R Shenoy wrote:
> On Thu, May 28, 2009 at 06:22:51PM -0700, Paul E. McKenney wrote:
> >
> > Hmmm... Making the transition work nicely would require some thought.
> > It might be good to retain the two-phase nature, even when reversing
> > the order of offline notifications. This would address one disadvantage
> > of the past-life version, which was unnecessary migration of processes
> > off of the CPU in question, only to find that a later notifier aborted
> > the offlining.
>
> The notifiers handling CPU_DEAD cannot abort it from here since the
> operation has already completed, whether they like it or not!

Hello, Gautham,

We are talking past each other -- the past-life (not Linux) CPU-offlining
scheme had but one phase for offlining, which meant that if a very late
notifier-equivalent realized that the offlining could not proceed,
it would have mostly shut the CPU down, only to have to restart it.
For example, it might have needlessly migrated processes off of that
CPU. This did not happen often, but it was a bit of a disadvantage.
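
To make the contrast concrete, here is a minimal user-space sketch of the
two-phase idea (hypothetical names, not the actual kernel notifier API):
only the prepare step may veto the offlining, so no expensive work such as
task migration happens until the operation is guaranteed to complete.

	/*
	 * Illustration only: a two-phase CPU takedown in which only the
	 * prepare phase may fail.  All names here are made up.
	 */
	#include <stdio.h>
	#include <stdbool.h>

	static bool subsys_prepare_offline(int cpu)
	{
		/* May veto: nothing has been torn down yet, so aborting is cheap. */
		return cpu != 3;	/* pretend some subsystem needs CPU 3 */
	}

	static void subsys_take_offline(int cpu)
	{
		/* Unconditional: migrate tasks, stop taking interrupts, etc. */
		printf("cpu%d: migrating tasks and going offline\n", cpu);
	}

	static int cpu_down_two_phase(int cpu)
	{
		if (!subsys_prepare_offline(cpu))
			return -1;		/* cheap abort, no migration happened */
		subsys_take_offline(cpu);	/* no failure path from here on */
		return 0;
	}

	int main(void)
	{
		printf("cpu2 -> %d\n", cpu_down_two_phase(2));
		printf("cpu3 -> %d\n", cpu_down_two_phase(3));
		return 0;
	}
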
							Thanx, Paul

> If there exist notifiers which try to abort it from here, it's a BUG, as
> the code says:
>
> 	/* CPU is completely dead: tell everyone.  Too late to complain. */
> 	if (raw_notifier_call_chain(&cpu_chain, CPU_DEAD | mod,
> 				    hcpu) == NOTIFY_BAD)
> 		BUG();
>
> Also, one can thus consider the CPU_DEAD and CPU_POST_DEAD parts to be
> extensions of the second phase; we just do some additional cleanup once
> the CPU has actually gone down. Migration of processes (breaking their
> affinity if required) is one such cleanup.
>
> But there are other things as well, such as rebuilding the sched-domains,
> which have to be done after the CPU has gone down. Currently this
> operation accounts for the majority of the time taken to take a CPU offline.
>
> >
> > So only the first phase is permitted to abort the offlining of the CPU,
> > and this first phase must also set whatever state is necessary to prevent
> > some later operation from making it impossible to offline the CPU.
> > The second phase would unconditionally take the CPU out of service.
> > In theory, this approach would allow incremental conversion of the
> > notifiers, waiting to remove the stop_machine stuff until all notifiers
> > had been converted.
> > If this actually works out, the sequence of changes would be as follows:
> >
> > 1. Reverse the order of the offline notifications, fixing any
> > bugs induced/exposed by this change.
> >
> > 2. Incrementally convert notifiers to the new mechanism. This
> > will require more thought.
> >
> > 3. Get rid of the stop_machine and the CPU_DEAD once all are
> > converted.
>
> I agree with this sequence. It seems quite logical.
>
> However, I am not yet sure if we can completely get rid of stop_machine
> and CPU_DEAD in practice, unless we're okay with having a time-consuming
> rollback operation. Currently the rollback only consists of rolling back
> the actions done during CPU_UP_PREPARE/CPU_DOWN_PREPARE.
>
> And from the notifier profile (see attached file),
> UP_PREPARE/DOWN_PREPARE seem to consume far less time
> than the post-hotplug notifications.
>
> >
> > Or we might find that simply reversing the order (#1 above) suffices.
> >
> > > > This meant that a given CPU was naturally guaranteed to be
> > > > correctly taking interrupts for the entire time that it was
> > > > capable of running user-level processes. Later in the offlining
> > > > process, it would still take interrupts, but would be unable to
> > > > run user processes. Still later, it would no longer be taking
> > > > interrupts, and would stop participating in RCU and in the global
> > > > TLB-flush algorithm. There was no need to stop the whole machine
> > > > to make a given CPU go offline, in fact, most of the work was done
> > > > by the CPU in question.
> > > >
> > > > In the case of RCU, this meant that there was no need for
> > > > double-checking for offlined CPUs, because CPUs could reliably
> > > > indicate a quiescent state on their way out.
> > > >
> > > > On the other hand, there was no equivalent of dynticks in the old
> > > > days. And it is dynticks that is responsible for most of the
> > > > complexity present in force_quiescent_state(), not CPU hotplug.
> > > >
> > > > So I cannot hold up RCU as something that would be greatly
> > > > simplified by changing the CPU hotplug design, much as I might
> > > > like to. ;-)
> > >
> > > We could probably remove a fair bit of dynticks complexity by
> > > removing non-dynticks and removing non-hrtimer. People could still
> > > force a 'periodic' interrupting mode (if they want, or if their hw
> > > forces that), but that would be a plain periodic hrtimer firing off
> > > all the time.
> >
> > Hmmm... That would not simplify RCU much, but on the other hand (1) the
> > rcutree.c dynticks approach is already quite a bit simpler than the
> > rcupreempt.c approach and (2) doing this could potentially simplify
> > other things.
> >
> > Thanx, Paul
>
> --
> Thanks and Regards
> gautham
> =============================================================================
> statistics for CPU_DOWN_PREPARE
> =============================================================================
> 410 ns: buffer_cpu_notify : CPU_DOWN_PREPARE
> 441 ns: radix_tree_callback : CPU_DOWN_PREPARE
> 473 ns: relay_hotcpu_callback : CPU_DOWN_PREPARE
> 486 ns: blk_cpu_notify : CPU_DOWN_PREPARE
> 563 ns: cpu_callback : CPU_DOWN_PREPARE
> 579 ns: hotplug_hrtick : CPU_DOWN_PREPARE
> 594 ns: cpu_callback : CPU_DOWN_PREPARE
> 605 ns: cpu_numa_callback : CPU_DOWN_PREPARE
> 611 ns: hrtimer_cpu_notify : CPU_DOWN_PREPARE
> 625 ns: flow_cache_cpu : CPU_DOWN_PREPARE
> 625 ns: rcu_barrier_cpu_hotplug : CPU_DOWN_PREPARE
> 639 ns: hotplug_cfd : CPU_DOWN_PREPARE
> 641 ns: pageset_cpuup_callback : CPU_DOWN_PREPARE
> 656 ns: rb_cpu_notify : CPU_DOWN_PREPARE
> 670 ns: dev_cpu_callback : CPU_DOWN_PREPARE
> 670 ns: topology_cpu_callback : CPU_DOWN_PREPARE
> 672 ns: remote_softirq_cpu_notify : CPU_DOWN_PREPARE
> 715 ns: ratelimit_handler : CPU_DOWN_PREPARE
> 715 ns: rcu_cpu_notify : CPU_DOWN_PREPARE
> 717 ns: timer_cpu_notify : CPU_DOWN_PREPARE
> 730 ns: page_alloc_cpu_notify : CPU_DOWN_PREPARE
> 746 ns: cpu_callback : CPU_DOWN_PREPARE
> 821 ns: cpuset_track_online_cpus : CPU_DOWN_PREPARE
> 824 ns: slab_cpuup_callback : CPU_DOWN_PREPARE
> 849 ns: sysfs_cpu_notify : CPU_DOWN_PREPARE
> 884 ns: percpu_counter_hotcpu_callback: CPU_DOWN_PREPARE
> 961 ns: update_runtime : CPU_DOWN_PREPARE
> 1323 ns: migration_call : CPU_DOWN_PREPARE
> 1918 ns: vmstat_cpuup_callback : CPU_DOWN_PREPARE
> 2072 ns: workqueue_cpu_callback : CPU_DOWN_PREPARE
> =========================================================================
> Total time for CPU_DOWN_PREPARE = .023235000 ms
> =========================================================================
> =============================================================================
> statistics for CPU_DYING
> =============================================================================
> 365 ns: remote_softirq_cpu_notify : CPU_DYING
> 365 ns: topology_cpu_callback : CPU_DYING
> 381 ns: blk_cpu_notify : CPU_DYING
> 381 ns: cpu_callback : CPU_DYING
> 381 ns: relay_hotcpu_callback : CPU_DYING
> 381 ns: update_runtime : CPU_DYING
> 394 ns: dev_cpu_callback : CPU_DYING
> 395 ns: hotplug_cfd : CPU_DYING
> 395 ns: vmstat_cpuup_callback : CPU_DYING
> 397 ns: cpuset_track_online_cpus : CPU_DYING
> 397 ns: flow_cache_cpu : CPU_DYING
> 397 ns: pageset_cpuup_callback : CPU_DYING
> 397 ns: rb_cpu_notify : CPU_DYING
> 398 ns: hotplug_hrtick : CPU_DYING
> 410 ns: cpu_callback : CPU_DYING
> 410 ns: page_alloc_cpu_notify : CPU_DYING
> 411 ns: rcu_cpu_notify : CPU_DYING
> 412 ns: slab_cpuup_callback : CPU_DYING
> 412 ns: sysfs_cpu_notify : CPU_DYING
> 412 ns: timer_cpu_notify : CPU_DYING
> 426 ns: buffer_cpu_notify : CPU_DYING
> 426 ns: radix_tree_callback : CPU_DYING
> 441 ns: cpu_callback : CPU_DYING
> 442 ns: cpu_numa_callback : CPU_DYING
> 473 ns: ratelimit_handler : CPU_DYING
> 531 ns: percpu_counter_hotcpu_callback: CPU_DYING
> 562 ns: workqueue_cpu_callback : CPU_DYING
> 730 ns: rcu_barrier_cpu_hotplug : CPU_DYING
> 1536 ns: migration_call : CPU_DYING
> 1873 ns: hrtimer_cpu_notify : CPU_DYING
> =========================================================================
> Total time for CPU_DYING = .015331000 ms
> =========================================================================
> =============================================================================
> statistics for CPU_DOWN_CANCELED
> =============================================================================
> =========================================================================
> Total time for CPU_DOWN_CANCELED = 0 ms
> =========================================================================
> =============================================================================
> statistics for __stop_machine
> =============================================================================
> 357983 ns: __stop_machine :
> =========================================================================
> Total time for __stop_machine = .357983000 ms
> =========================================================================
> =============================================================================
> statistics for CPU_DEAD
> =============================================================================
> 350 ns: update_runtime : CPU_DEAD
> 379 ns: hotplug_hrtick : CPU_DEAD
> 381 ns: cpu_callback : CPU_DEAD
> 381 ns: rb_cpu_notify : CPU_DEAD
> 426 ns: hotplug_cfd : CPU_DEAD
> 426 ns: relay_hotcpu_callback : CPU_DEAD
> 441 ns: rcu_barrier_cpu_hotplug : CPU_DEAD
> 442 ns: remote_softirq_cpu_notify : CPU_DEAD
> 609 ns: ratelimit_handler : CPU_DEAD
> 625 ns: cpu_numa_callback : CPU_DEAD
> 684 ns: dev_cpu_callback : CPU_DEAD
> 686 ns: workqueue_cpu_callback : CPU_DEAD
> 838 ns: rcu_cpu_notify : CPU_DEAD
> 898 ns: pageset_cpuup_callback : CPU_DEAD
> 1202 ns: vmstat_cpuup_callback : CPU_DEAD
> 1295 ns: blk_cpu_notify : CPU_DEAD
> 1554 ns: buffer_cpu_notify : CPU_DEAD
> 2588 ns: hrtimer_cpu_notify : CPU_DEAD
> 3274 ns: radix_tree_callback : CPU_DEAD
> 5246 ns: timer_cpu_notify : CPU_DEAD
> 8587 ns: flow_cache_cpu : CPU_DEAD
> 8645 ns: topology_cpu_callback : CPU_DEAD
> 12454 ns: cpu_callback : CPU_DEAD
> 12650 ns: cpu_callback : CPU_DEAD
> 45727 ns: percpu_counter_hotcpu_callback: CPU_DEAD
> 55242 ns: page_alloc_cpu_notify : CPU_DEAD
> 56766 ns: sysfs_cpu_notify : CPU_DEAD
> 58241 ns: slab_cpuup_callback : CPU_DEAD
> 78250 ns: migration_call : CPU_DEAD
> 10784759 ns: cpuset_track_online_cpus : CPU_DEAD
> =========================================================================
> Total time for CPU_DEAD = 11.144046000 ms
> =========================================================================
> =============================================================================
> statistics for CPU_POST_DEAD
> =============================================================================
> 350 ns: cpu_callback : CPU_POST_DEAD
> 365 ns: blk_cpu_notify : CPU_POST_DEAD
> 365 ns: buffer_cpu_notify : CPU_POST_DEAD
> 365 ns: cpu_numa_callback : CPU_POST_DEAD
> 365 ns: dev_cpu_callback : CPU_POST_DEAD
> 365 ns: flow_cache_cpu : CPU_POST_DEAD
> 365 ns: hrtimer_cpu_notify : CPU_POST_DEAD
> 365 ns: page_alloc_cpu_notify : CPU_POST_DEAD
> 365 ns: rb_cpu_notify : CPU_POST_DEAD
> 365 ns: rcu_cpu_notify : CPU_POST_DEAD
> 365 ns: timer_cpu_notify : CPU_POST_DEAD
> 365 ns: update_runtime : CPU_POST_DEAD
> 366 ns: cpu_callback : CPU_POST_DEAD
> 366 ns: hotplug_cfd : CPU_POST_DEAD
> 366 ns: pageset_cpuup_callback : CPU_POST_DEAD
> 366 ns: radix_tree_callback : CPU_POST_DEAD
> 367 ns: hotplug_hrtick : CPU_POST_DEAD
> 367 ns: topology_cpu_callback : CPU_POST_DEAD
> 367 ns: vmstat_cpuup_callback : CPU_POST_DEAD
> 381 ns: cpu_callback : CPU_POST_DEAD
> 381 ns: cpuset_track_online_cpus : CPU_POST_DEAD
> 381 ns: relay_hotcpu_callback : CPU_POST_DEAD
> 381 ns: sysfs_cpu_notify : CPU_POST_DEAD
> 383 ns: rcu_barrier_cpu_hotplug : CPU_POST_DEAD
> 410 ns: remote_softirq_cpu_notify : CPU_POST_DEAD
> 412 ns: slab_cpuup_callback : CPU_POST_DEAD
> 442 ns: migration_call : CPU_POST_DEAD
> 457 ns: percpu_counter_hotcpu_callback: CPU_POST_DEAD
> 502 ns: ratelimit_handler : CPU_POST_DEAD
> 86200 ns: workqueue_cpu_callback : CPU_POST_DEAD
> =========================================================================
> Total time for CPU_POST_DEAD = .097260000 ms
> =========================================================================
> =============================================================================
> statistics for CPU_UP_PREPARE
> =============================================================================
> 336 ns: hotplug_hrtick : CPU_UP_PREPARE
> 350 ns: cpu_callback : CPU_UP_PREPARE
> 365 ns: blk_cpu_notify : CPU_UP_PREPARE
> 381 ns: vmstat_cpuup_callback : CPU_UP_PREPARE
> 410 ns: buffer_cpu_notify : CPU_UP_PREPARE
> 410 ns: radix_tree_callback : CPU_UP_PREPARE
> 426 ns: dev_cpu_callback : CPU_UP_PREPARE
> 426 ns: remote_softirq_cpu_notify : CPU_UP_PREPARE
> 428 ns: cpuset_track_online_cpus : CPU_UP_PREPARE
> 441 ns: sysfs_cpu_notify : CPU_UP_PREPARE
> 471 ns: hotplug_cfd : CPU_UP_PREPARE
> 472 ns: rb_cpu_notify : CPU_UP_PREPARE
> 473 ns: flow_cache_cpu : CPU_UP_PREPARE
> 486 ns: page_alloc_cpu_notify : CPU_UP_PREPARE
> 488 ns: hrtimer_cpu_notify : CPU_UP_PREPARE
> 488 ns: update_runtime : CPU_UP_PREPARE
> 502 ns: rcu_barrier_cpu_hotplug : CPU_UP_PREPARE
> 531 ns: percpu_counter_hotcpu_callback: CPU_UP_PREPARE
> 547 ns: ratelimit_handler : CPU_UP_PREPARE
> 594 ns: relay_hotcpu_callback : CPU_UP_PREPARE
> 1125 ns: rcu_cpu_notify : CPU_UP_PREPARE
> 1309 ns: pageset_cpuup_callback : CPU_UP_PREPARE
> 1947 ns: timer_cpu_notify : CPU_UP_PREPARE
> 5389 ns: cpu_numa_callback : CPU_UP_PREPARE
> 6379 ns: topology_cpu_callback : CPU_UP_PREPARE
> 6436 ns: slab_cpuup_callback : CPU_UP_PREPARE
> 19879 ns: cpu_callback : CPU_UP_PREPARE
> 20227 ns: cpu_callback : CPU_UP_PREPARE
> 33940 ns: migration_call : CPU_UP_PREPARE
> 143731 ns: workqueue_cpu_callback : CPU_UP_PREPARE
> =========================================================================
> Total time for CPU_UP_PREPARE = .249387000 ms
> =========================================================================
> =============================================================================
> statistics for CPU_UP_CANCELED
> =============================================================================
> =========================================================================
> Total time for CPU_UP_CANCELED = 0 ms
> =========================================================================
> =============================================================================
> statistics for __cpu_up
> =============================================================================
> 205868908 ns: __cpu_up :
> =========================================================================
> Total time for __cpu_up = 205.868908000 ms
> =========================================================================
> =============================================================================
> statistics for CPU_STARTING
> =============================================================================
> 350 ns: hotplug_cfd : CPU_STARTING
> 352 ns: cpu_callback : CPU_STARTING
> 352 ns: remote_softirq_cpu_notify : CPU_STARTING
> 363 ns: vmstat_cpuup_callback : CPU_STARTING
> 365 ns: cpu_callback : CPU_STARTING
> 365 ns: dev_cpu_callback : CPU_STARTING
> 365 ns: hotplug_hrtick : CPU_STARTING
> 365 ns: radix_tree_callback : CPU_STARTING
> 365 ns: rb_cpu_notify : CPU_STARTING
> 368 ns: update_runtime : CPU_STARTING
> 379 ns: cpu_callback : CPU_STARTING
> 379 ns: cpu_numa_callback : CPU_STARTING
> 380 ns: rcu_barrier_cpu_hotplug : CPU_STARTING
> 380 ns: relay_hotcpu_callback : CPU_STARTING
> 381 ns: hrtimer_cpu_notify : CPU_STARTING
> 381 ns: pageset_cpuup_callback : CPU_STARTING
> 381 ns: slab_cpuup_callback : CPU_STARTING
> 382 ns: flow_cache_cpu : CPU_STARTING
> 394 ns: blk_cpu_notify : CPU_STARTING
> 397 ns: buffer_cpu_notify : CPU_STARTING
> 397 ns: percpu_counter_hotcpu_callback: CPU_STARTING
> 397 ns: sysfs_cpu_notify : CPU_STARTING
> 397 ns: topology_cpu_callback : CPU_STARTING
> 410 ns: rcu_cpu_notify : CPU_STARTING
> 412 ns: page_alloc_cpu_notify : CPU_STARTING
> 426 ns: cpuset_track_online_cpus : CPU_STARTING
> 455 ns: ratelimit_handler : CPU_STARTING
> 471 ns: timer_cpu_notify : CPU_STARTING
> 516 ns: migration_call : CPU_STARTING
> 549 ns: workqueue_cpu_callback : CPU_STARTING
> =========================================================================
> Total time for CPU_STARTING = .011874000 ms
> =========================================================================
> =============================================================================
> statistics for CPU_ONLINE
> =============================================================================
> 365 ns: radix_tree_callback : CPU_ONLINE
> 379 ns: hotplug_hrtick : CPU_ONLINE
> 381 ns: hrtimer_cpu_notify : CPU_ONLINE
> 381 ns: remote_softirq_cpu_notify : CPU_ONLINE
> 410 ns: slab_cpuup_callback : CPU_ONLINE
> 410 ns: timer_cpu_notify : CPU_ONLINE
> 412 ns: blk_cpu_notify : CPU_ONLINE
> 426 ns: dev_cpu_callback : CPU_ONLINE
> 426 ns: flow_cache_cpu : CPU_ONLINE
> 426 ns: topology_cpu_callback : CPU_ONLINE
> 428 ns: rcu_barrier_cpu_hotplug : CPU_ONLINE
> 428 ns: rcu_cpu_notify : CPU_ONLINE
> 440 ns: buffer_cpu_notify : CPU_ONLINE
> 455 ns: pageset_cpuup_callback : CPU_ONLINE
> 457 ns: relay_hotcpu_callback : CPU_ONLINE
> 473 ns: rb_cpu_notify : CPU_ONLINE
> 518 ns: update_runtime : CPU_ONLINE
> 549 ns: cpu_numa_callback : CPU_ONLINE
> 562 ns: ratelimit_handler : CPU_ONLINE
> 595 ns: page_alloc_cpu_notify : CPU_ONLINE
> 596 ns: hotplug_cfd : CPU_ONLINE
> 777 ns: percpu_counter_hotcpu_callback: CPU_ONLINE
> 1037 ns: cpu_callback : CPU_ONLINE
> 1280 ns: cpu_callback : CPU_ONLINE
> 1680 ns: cpu_callback : CPU_ONLINE
> 2043 ns: vmstat_cpuup_callback : CPU_ONLINE
> 3422 ns: migration_call : CPU_ONLINE
> 12344 ns: workqueue_cpu_callback : CPU_ONLINE
> 52879 ns: sysfs_cpu_notify : CPU_ONLINE
> 12287706 ns: cpuset_track_online_cpus : CPU_ONLINE
> =========================================================================
> Total time for CPU_ONLINE = 12.372685000 ms
> =========================================================================
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/