Message-ID: <20090701112007.GD15958@elte.hu>
Date: Wed, 1 Jul 2009 13:20:07 +0200
From: Ingo Molnar <mingo@...e.hu>
To: Jaswinder Singh Rajput <jaswinder@...nel.org>,
Arjan van de Ven <arjan@...radead.org>,
Paul Mackerras <paulus@...ba.org>,
Benjamin Herrenschmidt <benh@...nel.crashing.org>,
Anton Blanchard <anton@...ba.org>
Cc: Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <peterz@...radead.org>,
x86 maintainers <x86@...nel.org>,
LKML <linux-kernel@...r.kernel.org>,
Alan Cox <alan@...rguk.ukuu.org.uk>
Subject: Re: [PATCH 3/6 -tip] perf_counter: Add Generalized Hardware
vectored co-processor support for AMD
* Jaswinder Singh Rajput <jaswinder@...nel.org> wrote:
> $ ./perf stat -e add -e multiply -e divide -e vec-idle-cycles -e vec-stall-cycles -e vec-ops -- /usr/bin/vlc ~jaswinder/Videos/Linus_Torvalds_interview_with_Charlie_Rose_Part_1.flv
>
> Performance counter stats for '/usr/bin/vlc /home/jaswinder/Videos/Linus_Torvalds_interview_with_Charlie_Rose_Part_1.flv':
>
> 20177177044 vec-adds (scaled from 66.63%)
> 34101687027 vec-muls (scaled from 66.64%)
> 3984060862 vec-divs (scaled from 66.71%)
> 26349684710 vec-idle-cycles (scaled from 66.65%)
> 9052001905 vec-stall-cycles (scaled from 66.66%)
> 76440734242 vec-ops (scaled from 66.71%)
>
> 272.523058097 seconds time elapsed
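The "(scaled from 66.6x%)" annotations above come from counter multiplexing. When more events are requested than there are hardware counters, the kernel rotates them, and perf stat extrapolates each raw count to the whole run using the time the event was enabled versus the time it was actually counting. The C sketch below shows that arithmetic. The helper names are hypothetical, and the round-robin estimate is an assumption. Still, six events shared by four counters (AMD CPUs of that generation have four general-purpose counters) gives 4/6 = 66.67%, which is consistent with the percentages printed above.

```c
#include <stdint.h>

/* Hypothetical helper: extrapolate a multiplexed counter's raw value to the
 * full run, estimate = raw * time_enabled / time_running. */
static uint64_t scale_count(uint64_t raw, uint64_t time_enabled, uint64_t time_running)
{
	if (time_running == 0)
		return 0; /* event never got a counter: nothing to extrapolate */
	/* long double keeps the intermediate product from overflowing 64 bits */
	return (uint64_t)((long double)raw * time_enabled / time_running);
}

/* Hypothetical helper: fraction of the run (in percent) during which the
 * event was actually counting, i.e. the "scaled from NN.NN%" figure. */
static double running_percent(uint64_t time_enabled, uint64_t time_running)
{
	if (time_enabled == 0)
		return 0.0;
	return 100.0 * (double)time_running / (double)time_enabled;
}

/* Assumed round-robin model: n_counters counters shared by n_events events,
 * so each event runs for roughly n_counters / n_events of the time. */
static double expected_running_percent(unsigned n_counters, unsigned n_events)
{
	if (n_events <= n_counters)
		return 100.0; /* enough counters: no multiplexing needed */
	return 100.0 * n_counters / n_events;
}
```

For example, an event that counted 2 units while running for 2 of 3 enabled time units is reported as an estimated 3 units, "scaled from 66.67%".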
Ok, this looks very nice now - a highly generic and still very
useful looking categorization of FPU/MMX/SSE related co-processor hw
events.
I'm still waiting for feedback from Paulus, BenH and Anton on whether
this kind of generic enumeration fits PowerPC well enough.
I think from a pure logic/math/physics POV this categorization is
pretty complete: a modern co-processor has three fundamental states
we are interested in: idle, busy and busy-stalled. It also has an
'ops' metric that counts instructions, and its main operations are
add, mul and div.
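A minimal C sketch of such a generic enumeration, for illustration only. The enum, table and function names are hypothetical and are not the identifiers used by the actual patch. The event names are copied from the quoted perf stat output. The cycle helper assumes that every cycle falls into exactly one of the three states (idle, busy, busy-stalled), so that busy-and-progressing cycles can be derived from a total cycle count.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical generic vector/co-processor event IDs. */
enum vec_hw_event {
	VEC_HW_ADDS,         /* vector add operations */
	VEC_HW_MULS,         /* vector multiply operations */
	VEC_HW_DIVS,         /* vector divide operations */
	VEC_HW_IDLE_CYCLES,  /* unit state: idle */
	VEC_HW_STALL_CYCLES, /* unit state: busy but stalled */
	VEC_HW_OPS,          /* all vector instructions */
	VEC_HW_MAX
};

/* Names as printed in the quoted perf stat output. */
static const char *const vec_hw_event_names[] = {
	[VEC_HW_ADDS]         = "vec-adds",
	[VEC_HW_MULS]         = "vec-muls",
	[VEC_HW_DIVS]         = "vec-divs",
	[VEC_HW_IDLE_CYCLES]  = "vec-idle-cycles",
	[VEC_HW_STALL_CYCLES] = "vec-stall-cycles",
	[VEC_HW_OPS]          = "vec-ops",
};

/* Compile-time check that every enum value has a name. */
static_assert(sizeof(vec_hw_event_names) / sizeof(vec_hw_event_names[0]) == VEC_HW_MAX,
	      "name table out of sync with enum");

/* Map an event name back to its ID; returns -1 for unknown names. */
static int vec_hw_event_from_name(const char *name)
{
	for (int i = 0; i < VEC_HW_MAX; i++)
		if (strcmp(vec_hw_event_names[i], name) == 0)
			return i;
	return -1;
}

/* Assumption: total = idle + busy-running + busy-stalled. Returns 0 when the
 * inputs are inconsistent, which can happen with scaled (multiplexed) counts. */
static uint64_t vec_busy_running_cycles(uint64_t total, uint64_t idle, uint64_t stalled)
{
	if (idle > total || stalled > total - idle)
		return 0;
	return total - idle - stalled;
}
```

Keeping the name table indexed by the enum, with a static check on its size, is one way to make sure a newly added generic event (vectored loads/stores, say) cannot be added without a name.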
Cell is, I guess, a complication to be solved, as there the various
vector units have separate decoders and separate thread state. The
abstraction above only covers CPU designs where vector operations
are part of the main ALU decoder's instruction stream.
One thing that might be worth exposing is vectored loads/stores in
general. But we don't have those in the generic ALU enumeration yet,
and if we add them, both should be done together.
Also, the Nehalem bits need to be tested; I'll try to find time for
that.
Good stuff.
Ingo