Message-ID: <20120521193051.GB28819@gmail.com>
Date: Mon, 21 May 2012 21:30:51 +0200
From: Ingo Molnar <mingo@...nel.org>
To: Suresh Siddha <suresh.b.siddha@...el.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Alexander Gordeev <agordeev@...hat.com>,
Arjan van de Ven <arjan@...radead.org>,
linux-kernel@...r.kernel.org, x86@...nel.org,
Cyrill Gorcunov <gorcunov@...nvz.org>,
Yinghai Lu <yinghai@...nel.org>
Subject: Re: [PATCH 2/3] x86: x2apic/cluster: Make use of lowest priority
delivery mode
* Suresh Siddha <suresh.b.siddha@...el.com> wrote:
> > But I do agree with Ingo that it would be really good to
> > actually see numbers (and no, I don't mean "look here, now
> > the irq's are nicely spread out", but power and/or
> > performance numbers showing that it actually helps
> > something).
>
> I agree. This is the reason why I held up posting these
> patches before. I can come up with micro-benchmarks that can
> show some difference but the key is to find good
> workload/benchmark that can show measurable difference. Any
> suggestions?
It's rather difficult to measure this reliably. The main
complication is the inherent noise of cache stats on SMP/NUMA
systems, which all modern multi-socket systems are ...
But, since you asked, if you can generate a *very* precise
incoming external IRQ rate, it's possible:
Generate say 10,000 irqs/sec of a workload directed at a single
CPU - something like multiple copies of ping -i 0.001 -q
executed on a nearby system might do.
Then run a user-space cycle soaker, nice -19 running NOPs on all
CPUs. It's important that it is *only* a user-space infinite loop,
with no kernel instructions executed at all - see later.
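
A minimal sketch of such a soaker, written in Python for brevity. This
is an illustration only, not something from this thread: a tight C or
assembly loop of NOPs is closer to what is described above, and the
names spin(), soak() and run_soakers() are made up for this example.
It assumes Linux (os.sched_setaffinity) and starts one low-priority
busy loop per CPU:

```python
import os
from multiprocessing import Process


def spin(iterations=None):
    """Busy-loop in user space; runs forever when iterations is None."""
    count = 0
    while iterations is None or count < iterations:
        count += 1
    return count


def soak(cpu):
    """Pin this process to one CPU, lower its priority, spin forever."""
    os.sched_setaffinity(0, {cpu})  # Linux-only call
    os.nice(19)                     # lowest scheduling priority
    spin()                          # no system calls inside the loop


def run_soakers():
    """Start one soaker per CPU this process may run on, then wait."""
    workers = [Process(target=soak, args=(cpu,))
               for cpu in sorted(os.sched_getaffinity(0))]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
```

Calling run_soakers() blocks until interrupted; the loop body itself
makes no system calls, so the time it loses shows up only as missing
user-space cycles.
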
Then play around with variants of:
perf stat -a --repeat 10 -e cycles:u -e instructions:u sleep 1
this will tell you the number of user-space cycles and
instructions executed, per second. The ':u' modifier limits the
count to user-space, which separates the IRQ handler overhead
from your user-space cycle soaker.
This number of 'available user-space performance' should not get
worse when you switch from a single-CPU APIC target to a
hardware-round-robin target mask. You can switch the mask using
/proc/irq/<nr>/smp_affinity with very low overhead, while all the
above measurements are running - this allows you to see how
user-space throughput reacts to the IRQ details.
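
As a rough sketch of the mask switch: smp_affinity takes a hexadecimal
CPU bitmask, with comma-separated 32-bit groups on machines with more
than 32 CPUs. The helper names, the proc_root parameter and the IRQ
number used in the note below are assumptions for illustration, and
writing the file needs root:

```python
def cpus_to_affinity_mask(cpus):
    """Format a set of CPU numbers as an smp_affinity hex bitmask."""
    bits = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative")
        bits |= 1 << cpu
    if bits == 0:
        raise ValueError("at least one CPU is required")
    # Split into 32-bit groups, most significant group first.
    groups = []
    while bits:
        groups.append(bits & 0xFFFFFFFF)
        bits >>= 32
    groups.reverse()
    head = format(groups[0], "x")
    tail = [format(group, "08x") for group in groups[1:]]
    return ",".join([head] + tail)


def set_irq_affinity(irq, cpus, proc_root="/proc"):
    """Write the mask for `irq`; proc_root is a hypothetical test hook."""
    path = f"{proc_root}/irq/{irq}/smp_affinity"
    with open(path, "w") as f:
        f.write(cpus_to_affinity_mask(cpus))
```

For example, set_irq_affinity(42, [0]) would target CPU 0 only and
set_irq_affinity(42, [0, 1, 2, 3]) would write the mask "f", where 42
is a placeholder IRQ number. Whether the hardware then really spreads
delivery across that mask depends on the APIC mode in use.
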
Double check that the irq rate is constant, via 'vmstat 1'.
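
If a per-IRQ figure is wanted instead of the system-wide 'vmstat 1'
number, one possible approach (not something suggested in this thread)
is to sample /proc/interrupts twice and take the difference. The
function names and the sample text in the note below are made up for
illustration:

```python
import time


def irq_total(interrupts_text, irq):
    """Sum the per-CPU counts of one IRQ line from /proc/interrupts."""
    lines = interrupts_text.splitlines()
    ncpus = len(lines[0].split())       # header row: CPU0 CPU1 ...
    label = f"{irq}:"
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[0] == label:
            return sum(int(x) for x in fields[1:1 + ncpus])
    raise KeyError(f"IRQ {irq} not found")


def sample_irq_rate(irq, interval=1.0, path="/proc/interrupts"):
    """Return the approximate interrupts/sec for `irq` over `interval`."""
    def read_total():
        with open(path) as f:
            return irq_total(f.read(), irq)

    start = time.monotonic()
    before = read_total()
    time.sleep(interval)
    after = read_total()
    return (after - before) / (time.monotonic() - start)
```

Calling sample_irq_rate(42) repeatedly, with 42 standing in for the
real IRQ number, should give roughly the same value each time if the
incoming rate is as steady as the measurement needs.
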
Thanks,
Ingo