Message-ID: <1443445947.3529.48.camel@gmail.com>
Date: Mon, 28 Sep 2015 15:12:27 +0200
From: Mike Galbraith <umgwanakikbuti@...il.com>
To: Kirill Tkhai <ktkhai@...n.com>
Cc: linux-kernel@...r.kernel.org,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>
Subject: Re: [PATCH] sched/fair: Skip wake_affine() for core siblings
On Mon, 2015-09-28 at 13:28 +0300, Kirill Tkhai wrote:
> Looks like a NAK may be better, because it keeps the L1 cache warm, while the patch always invalidates it.
Yeah, bounce hurts more when there's no concurrency win waiting to be
collected. This mixed load wasn't a great choice, but it turned out to
be pretty interesting. Something waking a gaggle of waiters on a busy
big socket should do very bad things.
> Could you say whether you execute pgbench using just -cX -jY -T30 or something special? I've tried it,
> but the dispersion of the results differs a lot from run to run.
pgbench -T $testtime -j 1 -S -c $clients
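
A rough sketch of how such a mixed run could be scripted (a reconstruction, not the
exact harness used; the database name, scale factor, and the way the background
tbench load is launched are assumptions):

    testtime=120                    # 2 minutes per client count, as in the tables below
    createdb bench                  # database name is a guess
    pgbench -i -s 100 bench         # populate the pgbench tables; scale factor is a guess
    tbench_srv &                    # loopback server for tbench
    tbench 4 localhost &            # background tbench load, 4 client processes
    for clients in 1 2 4 8; do
            pgbench -T $testtime -j 1 -S -c $clients bench
    done
    kill %1 %2                      # stop the tbench server and clients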
> > Ok, that's what I want to see, full repeat.
> > master = twiddle
> > master+ = twiddle+patch
> >
> > concurrent tbench 4 + pgbench, 2 minutes per client count (i4790+smt)
> >                         master                               master+
> > pgbench            1      2      3     avg         1      2      3     avg    comp
> > clients 1 tps =  18599  18627  18532  18586      17480  17682  17606  17589   .946
> > clients 2 tps =  32344  32313  32408  32355      25167  26140  23730  25012   .773
> > clients 4 tps =  52593  51390  51095  51692      22983  23046  22427  22818   .441
> > clients 8 tps =  70354  69583  70107  70014      66924  66672  69310  67635   .966
> >
> > Hrm... turn the tables, measure tbench while pgbench 4 client load runs endlessly.
> >
> >                      master                       master+
> > tbench           1     2     3    avg        1     2     3    avg    comp
> > pairs 1 MB/s =   430   426   436   430       481   481   494   485   1.127
> > pairs 2 MB/s =  1083  1085  1072  1080      1086  1090  1083  1086   1.005
> > pairs 4 MB/s =  1725  1697  1729  1717      2023  2002  2006  2010   1.170
> > pairs 8 MB/s =  2740  2631  2700  2690      3016  2977  3071  3021   1.123
> >
> > tbench without competition
> >                 master  master+   comp
> > pairs 1 MB/s =    694      692    .997
> > pairs 2 MB/s =   1268     1259    .992
> > pairs 4 MB/s =   2210     2165    .979
> > pairs 8 MB/s =   3586     3526    .983  (yawn, all within routine variance)
>
> Hm, it seems tbench with competition is better only because a busy system makes tbench
> processes get woken on the same cpu.
Yeah. When the box is really full, select_idle_sibling() (obviously) turns
into a waste of cycles, but even as you approach that, especially when
filling the box with identical copies of nearly fully synchronous high
frequency localhost packet blasters, stacking is a win.
What bent my head up a bit was the combined effect of making wake_wide()
really keep pgbench from collapsing, then adding the affine wakeup grant
for tbench. It's not at all clear to me why the 2 and 4 client cases would
be so demolished.
-Mike