Message-ID: <20140128132306.GB9172@localhost.localdomain>
Date:	Tue, 28 Jan 2014 14:23:08 +0100
From:	Frederic Weisbecker <fweisbec@...il.com>
To:	Viresh Kumar <viresh.kumar@...aro.org>
Cc:	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <peterz@...radead.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Lists linaro-kernel <linaro-kernel@...ts.linaro.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Linaro Networking <linaro-networking@...aro.org>,
	Kevin Hilman <khilman@...aro.org>
Subject: Re: [QUERY]: Is using CPU hotplug right for isolating CPUs?

On Fri, Jan 24, 2014 at 10:51:14AM +0530, Viresh Kumar wrote:
> On 23 January 2014 20:28, Frederic Weisbecker <fweisbec@...il.com> wrote:
> > On Tue, Jan 21, 2014 at 04:03:53PM +0530, Viresh Kumar wrote:
> 
> >> So, the main problem in my case was caused by this:
> >>
> >>   <...>-2147  [001] d..2   302.573881: hrtimer_start: hrtimer=c172aa50 function=tick_sched_timer expires=602075000000 softexpires=602075000000
> >>
> >> I mentioned this earlier when I sent you the attachments. I think this
> >> is somehow tied to the NO_HZ_FULL stuff, as the timer is queued for 300
> >> seconds after the current time.
> >>
> >> How to get this out?
> >
> > So it's scheduled away 300 seconds later. It might be a pending timer_list. Enabling the
> > timer tracepoints may give you some clues.
> 
> The trace was done with that enabled. /proc/timer_list confirms that an
> hrtimer is queued 300 seconds out for tick_sched_timer, and so I assumed
> this is part of the current NO_HZ_FULL implementation.
> 
> Just to confirm, when we decide that a CPU is running a single task and so
> can enter tickless mode, do we queue this tick_sched_timer for 300 seconds
> ahead of time? If not, then who is doing this :)

No, when a single task is running on a full dynticks CPU, the tick is supposed to run
every second. I'm actually surprised it doesn't happen in your traces; did you tweak
something specific?
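
To make the "once per second" point concrete, here is a simplified sketch of
the deferment logic (illustrative only, not the in-tree code; the function
name sketch_tick_max_deferment() is made up for the example):

/*
 * Simplified sketch: even with a single runnable task on a full dynticks
 * CPU, the next tick is never pushed out more than about one second past
 * the last tick that actually ran.
 */
#include <linux/jiffies.h>

static u64 sketch_tick_max_deferment(unsigned long last_tick)
{
	unsigned long now = jiffies;
	unsigned long next = last_tick + HZ;	/* roughly one second later */

	if (time_before_eq(next, now))
		return 0;			/* a tick is already due */

	/* time remaining until the forced tick, in nanoseconds */
	return (u64)jiffies_to_usecs(next - now) * 1000;
}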

The 300-second timer is probably due to a timer_list; just enable the
timer_start and timer_expire_entry events to get the name of the culprit.
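
In case it helps, a tiny userspace helper along those lines (the tracefs
mount point /sys/kernel/debug/tracing is an assumption; adjust the prefix if
yours is mounted elsewhere):

#include <stdio.h>
#include <stdlib.h>

/* Enable one tracepoint under events/timer/ in the tracing directory. */
static void enable_event(const char *event)
{
	char path[256];
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/kernel/debug/tracing/events/timer/%s/enable", event);
	f = fopen(path, "w");
	if (!f) {
		perror(path);
		exit(1);
	}
	fputs("1\n", f);
	fclose(f);
}

int main(void)
{
	enable_event("timer_start");
	enable_event("timer_expire_entry");
	return 0;
}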

> 
> >> Which CPUs are housekeeping CPUs? How do we declare them?
> >
> > It's not yet implemented, but it's an idea (partly from Thomas) for defining a general
> > policy on the affinity of various periodic/async work, in order to enforce isolation.
> >
> > The basic idea is to define the CPU handling the timekeeping duty to be the housekeeping
> > CPU. Given that this CPU must keep a periodic tick, let's move all the unbound timers and
> > workqueues there, and also try to move some CPU-affine work as well. For example,
> > we could handle the scheduler tick of the full dynticks CPUs on that housekeeping
> > CPU, at a low frequency. This way we could remove that 1-second scheduler tick max deferment
> > per CPU. It may be overkill, though, to run all the scheduler ticks on a single CPU, so there
> > may be other ways to cope with that.
> >
> > And I would like to keep that housekeeping notion flexible enough to be extendable to more
> > than one CPU, as I heard that some people plan to reserve one CPU per node on big
> > NUMA machines for such a purpose. So that could be a cpumask, augmented with some infrastructure.
> >
> > Of course, if some people help contribute in this area, things may eventually move forward
> > on CPU isolation support. I can't do it all alone, at least not quickly, given all the
> > things already pending in my queue (fix the buggy nohz iowait accounting, support RCU full sysidle
> > detection, apply the AMD range breakpoints patches, further clean up posix cpu timers, etc.).
> 
> I see. As I am currently working on the isolation stuff, which is very much
> required for my use case, I will try to do that as the second step of my work.
> The first one remains something like the cpuset.quiesce option that PeterZ
> suggested.

Cool!
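
For reference, a minimal sketch of what the housekeeping cpumask mentioned
above could look like (nothing of this exists in the tree at this point; the
names housekeeping_mask, is_housekeeping_cpu() and housekeeping_any_cpu()
are illustrative):

#include <linux/cpumask.h>

/* CPUs that keep a periodic tick and absorb unbound work. */
static struct cpumask housekeeping_mask;

static bool is_housekeeping_cpu(int cpu)
{
	return cpumask_test_cpu(cpu, &housekeeping_mask);
}

/* Pick a target CPU for unbound timers/workqueues. */
static int housekeeping_any_cpu(void)
{
	return cpumask_any(&housekeeping_mask);
}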

> 
> Any pointers to earlier discussions on this topic would be helpful to start
> working on this.

I think that being able to control the UNBOUND workqueue affinity may be a nice
first step.
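
As a sketch of what such a control could look like from userspace, assuming
it were exposed as a writable cpumask file (the path
/sys/devices/virtual/workqueue/cpumask below is an assumption for the
example), confining unbound work to CPU 0 might be:

#include <stdio.h>

int main(void)
{
	/* Hex cpumask: bit 0 set, i.e. CPU 0 only. */
	FILE *f = fopen("/sys/devices/virtual/workqueue/cpumask", "w");

	if (!f) {
		perror("workqueue cpumask");
		return 1;
	}
	fputs("1\n", f);
	fclose(f);
	return 0;
}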

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
