Message-ID: <20160614194217.GK30921@twins.programming.kicks-ass.net>
Date: Tue, 14 Jun 2016 21:42:17 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: LKML <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Clark Williams <williams@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Nick Piggin <nickpiggin@...oo.com.au>
Subject: Re: [PATCH] sched: Do not release current rq lock on non contended double_lock_balance()

On Tue, Jun 14, 2016 at 02:02:28PM -0400, Steven Rostedt wrote:
> On Tue, 14 Jun 2016 13:58:20 +0200
> Peter Zijlstra <peterz@...radead.org> wrote:
> > And it does indeed make the hold time harder to analyze.
> >
> > For instance; pull_rt_task() does:
> >
> > for_each_cpu() {
> > double_lock_balance(this, that);
> > ...
> > double_unlock_balance(this, that);
> > }
> >
> > Which, with the trylock, ends up with a max possible hold time of
> > O(nr_cpus).
>
> Sure, but I think we should try to limit that loop too, because the
> loop itself is what triggers the large latency for me: it constantly
> releases a spinlock and then has to wait to re-take it. This loop is
> done with preemption disabled.
Much worse: it's done with IRQs disabled. But that affects only the local
CPU. Holding the lock that long affects all other CPUs too.
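
For reference, the trylock fast path under discussion looks roughly like
this (a sketch of the PREEMPT variant of _double_lock_balance() with the
proposed fast path, not the exact patch text):

	static inline int _double_lock_balance(struct rq *this_rq, struct rq *busiest)
	{
		/* Fast path: busiest is uncontended; this_rq->lock is never dropped. */
		if (raw_spin_trylock(&busiest->lock))
			return 0;

		/*
		 * Slow path: drop this_rq->lock and take both locks in the
		 * usual order to avoid deadlock.
		 */
		raw_spin_unlock(&this_rq->lock);
		double_rq_lock(this_rq, busiest);
		return 1;
	}

With that fast path, every iteration of the pull_rt_task() loop above can
take and release busiest->lock without ever dropping this_rq->lock, which
is where the O(nr_cpus) hold time comes from.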
> > Unlikely, sure, but RT is a game of upper bounds etc.
>
> Sure, but should we force the worst case all the time?
How is that relevant? Either you have a bounded operation or you don't.
> We do a lot of optimization to allow for good throughput as well.
Only while staying within the upper bounds. The moment you let go of that,
you've destroyed RT.
> > So should we maybe do something like:
> >
> >	if (unlikely(raw_spin_is_contended(&this_rq->lock) ||
> >		     !raw_spin_trylock(&busiest->lock))) {
>
> Why do we care if this_rq is contended?
To bound hold time.
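
Filling out the snippet quoted above, the suggestion would look roughly
like this (a sketch, untested):

	static inline int _double_lock_balance(struct rq *this_rq, struct rq *busiest)
	{
		/*
		 * Take the slow path not only when busiest->lock is taken,
		 * but also when someone is already spinning on this_rq->lock,
		 * so the hold time on this_rq->lock stays bounded.
		 */
		if (unlikely(raw_spin_is_contended(&this_rq->lock) ||
			     !raw_spin_trylock(&busiest->lock))) {
			raw_spin_unlock(&this_rq->lock);
			double_rq_lock(this_rq, busiest);
			return 1;
		}
		return 0;
	}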
> That's exactly what causes the large latency to happen. When we let go
> of this_rq, this fast path becomes much slower, because now it must
> wait for whatever is waiting on the lock to finish. The more CPUs you
> have, the bigger this issue becomes.
Yes, icky issue.
And while the numbers look pretty, I'm not sure you haven't introduced
another, less likely, issue.