linux-kernel - Re: [PATCH 2/3] x86: x2apic/cluster: Make use of lowest priority delivery mode

Message-ID: <20120521193051.GB28819@gmail.com>
Date:	Mon, 21 May 2012 21:30:51 +0200
From:	Ingo Molnar <mingo@...nel.org>
To:	Suresh Siddha <suresh.b.siddha@...el.com>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Alexander Gordeev <agordeev@...hat.com>,
	Arjan van de Ven <arjan@...radead.org>,
	linux-kernel@...r.kernel.org, x86@...nel.org,
	Cyrill Gorcunov <gorcunov@...nvz.org>,
	Yinghai Lu <yinghai@...nel.org>
Subject: Re: [PATCH 2/3] x86: x2apic/cluster: Make use of lowest priority
 delivery mode

* Suresh Siddha <suresh.b.siddha@...el.com> wrote:

> > But I do agree with Ingo that it would be really good to 
> > actually see numbers (and no, I don't mean "look here, now 
> > the irq's are nicely spread out", but power and/or 
> > performance numbers showing that it actually helps 
> > something).
> 
> I agree. This is the reason why I held up posting these 
> patches before. I can come up with micro-benchmarks that can 
> show some difference but the key is to find good 
> workload/benchmark that can show measurable difference. Any 
> suggestions?

It's rather difficult to measure this reliably. The main 
complication is the inherent noise of cache stats on SMP/NUMA 
systems, which all modern multi-socket systems are ...

But, since you asked, if you can generate a *very* precise 
incoming external IRQ rate, it's possible:

Generate, say, 10,000 IRQs/sec of a workload directed at a single 
CPU - something like multiple copies of 'ping -i 0.001 -q' 
executed on a nearby system might do.
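A minimal sketch of that load, assuming iputils ping on the sending machine, root privileges (a 1 ms interval is below what unprivileged users may request), and a hypothetical target address. At 1 ms per packet each copy sends roughly 1,000 echo requests per second, so ten copies give about 10,000 packets/sec. How many IRQs that becomes on the target depends on the NIC's interrupt moderation, so measure the rate on the receiving side. The helper names ping_copies and start_ping_load are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: ICMP echo load at a roughly fixed aggregate rate.
# Assumptions: iputils ping, run as root, TARGET is a hypothetical
# address of the machine under test.

# Copies needed for RATE packets/sec at INTERVAL_MS per copy (rounded up).
ping_copies() {
    local rate=$1 interval_ms=$2
    echo $(( (rate * interval_ms + 999) / 1000 ))
}

# Start the copies in the background at a 1 ms interval each.
start_ping_load() {
    local target=$1 rate=${2:-10000}
    local n i
    n=$(ping_copies "$rate" 1)
    for (( i = 0; i < n; i++ )); do
        ping -i 0.001 -q "$target" > /dev/null &
    done
}
```

For example, `start_ping_load 192.0.2.10 10000` (a documentation-range address standing in for the real target) starts ten copies; `kill $(jobs -p)` from the same shell stops them.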

Then run a user-space cycle soaker: a nice -19 loop of NOPs on all 
CPUs. It's important that it is *only* a user-space infinite loop, 
with no kernel instructions executed at all - see below.
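One way to start such soakers is sketched below. It assumes online CPUs are numbered 0..N-1 as reported by nproc, and it substitutes a bash loop built only from the ':' builtin for a compiled NOP loop, on the assumption that such a loop makes no system calls once running. A compiled NOP loop remains the stricter choice. start_soakers is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: one lowest-priority, user-space-only busy loop per CPU.
# Assumptions: CPUs 0..N-1 are online and usable; a ':'-only bash loop
# stays in user space (a compiled NOP loop would be stricter).

start_soakers() {
    local ncpus cpu
    ncpus=$(nproc)
    for (( cpu = 0; cpu < ncpus; cpu++ )); do
        # Pin one loop per CPU; 'nice -n 19' is the same as 'nice -19'.
        taskset -c "$cpu" nice -n 19 bash -c 'while :; do :; done' &
    done
}
```

Pinning is a design choice so that every CPU gets exactly one loop; `kill $(jobs -p)` from the same shell stops them afterwards.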

Then play around with variants of:

  perf stat -a --repeat 10 -e cycles:u -e instructions:u sleep 1

This will tell you the number of user-space cycles and 
instructions executed per second. The ':u' modifier limits 
counting to user space, which separates the IRQ handler overhead 
from the work done by your user-space cycle soaker.
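To compare runs by script, the same counters can be collected with perf's CSV output (-x,). This sketch assumes each CSV line starts with "value,unit,event" and that the event name appears exactly as passed with -e (e.g. "instructions:u"). The helper names perf_csv_value and measure_user_counts are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: extract user-space counts from 'perf stat -x,' output.
# Assumption: CSV lines look like "value,unit,event,..." with the event
# name printed as given on the command line.

# Print the value for event $1 from perf CSV text on stdin,
# skipping '#' comment lines.
perf_csv_value() {
    awk -F, -v ev="$1" '!/^#/ && $3 == ev { print $1 }'
}

# Run the measurement once; perf stat writes its counts to stderr.
measure_user_counts() {
    local csv
    csv=$(perf stat -a --repeat 10 -x, -e cycles:u -e instructions:u \
              sleep 1 2>&1 >/dev/null)
    echo "cycles:u       $(perf_csv_value cycles:u <<< "$csv")"
    echo "instructions:u $(perf_csv_value instructions:u <<< "$csv")"
}
```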

This 'available user-space performance' number should not get 
worse when you switch from a single-CPU APIC target to a 
hardware round-robin target mask. You can switch the mask via 
/proc/irq/nr/smp_affinity with very low overhead while all the 
above measurements are running, which lets you see how 
user-space throughput reacts to the IRQ details.
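A sketch of the mask switch, assuming root privileges, CPU numbers below 32 (wider masks are written as comma-separated 32-bit groups), and an IRQ number taken from /proc/interrupts. The helpers cpus_to_mask and set_irq_mask are hypothetical. Whether a multi-CPU mask actually results in hardware round-robin delivery depends on the APIC mode and kernel, which is what the patches under discussion change.

```shell
#!/usr/bin/env bash
# Sketch: build a hex CPU mask and write it to an IRQ's smp_affinity.
# Assumptions: CPUs < 32, run as root, IRQ number from /proc/interrupts.

# Hex affinity mask for the given CPU numbers, e.g. "0 1 2 3" -> f.
cpus_to_mask() {
    local mask=0 cpu
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%x\n' "$mask"
}

# Write MASK to /proc/irq/<IRQ>/smp_affinity.
set_irq_mask() {
    local irq=$1 mask=$2
    echo "$mask" > "/proc/irq/$irq/smp_affinity"
}
```

With a hypothetical NIC IRQ of 24, `set_irq_mask 24 "$(cpus_to_mask 0)"` targets CPU 0 alone and `set_irq_mask 24 "$(cpus_to_mask 0 1 2 3)"` targets CPUs 0-3, while the perf measurement keeps running.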

Double-check with 'vmstat 1' that the IRQ rate stays constant.
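The "in" column of vmstat counts all interrupts. To watch only the NIC's line, a hypothetical helper can sum one IRQ's per-CPU counters in /proc/interrupts and print the difference once per second. It assumes the per-CPU counts are the numeric columns right after the "N:" label. irq_count and irq_rate are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: per-IRQ rate from /proc/interrupts, a finer-grained companion
# to 'vmstat 1'.

# Sum the per-CPU counters for IRQ $1; reads $2 or /proc/interrupts.
irq_count() {
    awk -v irq="$1:" '$1 == irq {
        s = 0
        for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) s += $i
        print s
    }' "${2:-/proc/interrupts}"
}

# Print interrupts/sec for IRQ $1 once per second until interrupted.
irq_rate() {
    local prev cur
    prev=$(irq_count "$1")
    while sleep 1; do
        cur=$(irq_count "$1")
        echo $(( cur - prev ))
        prev=$cur
    done
}
```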

Thanks,

	Ingo