Message-ID: <7c86c4470811261224k20ae2554m32af5504488664cf@mail.gmail.com>
Date: Wed, 26 Nov 2008 21:24:59 +0100
From: "stephane eranian" <eranian@...glemail.com>
To: "Andi Kleen" <andi@...stfloor.org>
Cc: linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
mingo@...e.hu, x86@...nel.org, sfr@...b.auug.org.au
Subject: Re: [patch 23/24] perfmon: kernel documentation
Andi,
On Wed, Nov 26, 2008 at 8:34 PM, Andi Kleen <andi@...stfloor.org> wrote:
> On Wed, Nov 26, 2008 at 07:21:56PM +0100, stephane eranian wrote:
>> Andi,
>>
>> On Wed, Nov 26, 2008 at 1:21 PM, Andi Kleen <andi@...stfloor.org> wrote:
>> > On Wed, Nov 26, 2008 at 12:43:00AM -0800, eranian@...glemail.com wrote:
>> >
>> > I assume you'll be also submitting manpages with the same information?
>> >
>> This is on my TODO list. Provide a man page for each new syscall.
>
> There should be an overview manpage as well.
>
Yes.
>> I have never played with that myself, even with regular file descriptors.
>> But I can only assume passing a file descriptor increments its refcount.
>> Thus you simply get another controlling process. There is enough context
>> locking in place in the kernel to make this work.
>
> Ok as long as it isn't a root hole or similar.
>
I need to figure out how you actually pass a fd from one process to another.
I seem to remember you need a pipe or socket + some ioctl().
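For reference, descriptor passing on Linux is done with sendmsg()/recvmsg()
and an SCM_RIGHTS control message over an AF_UNIX socket, not with an ioctl().
Below is a minimal sketch under that assumption. The helper names send_fd()
and recv_fd() are made up for illustration and are not part of perfmon. The
descriptor being passed could be any fd; nothing here is specific to a perfmon
context.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send descriptor 'fd' over the connected AF_UNIX
 * socket 'sock'. Returns 0 on success, -1 on failure. */
int send_fd(int sock, int fd)
{
	char byte = 0;	/* carry one byte of ordinary data with the message */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {		/* union guarantees cmsghdr alignment */
		struct cmsghdr align;
		char buf[CMSG_SPACE(sizeof(int))];
	} ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&msg, 0, sizeof(msg));
	memset(&ctl, 0, sizeof(ctl));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;	/* payload is an array of descriptors */
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor from 'sock'. Returns the new
 * descriptor (a fresh number in the receiving process), or -1 on failure. */
int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		struct cmsghdr align;
		char buf[CMSG_SPACE(sizeof(int))];
	} ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&msg, 0, sizeof(msg));
	memset(&ctl, 0, sizeof(ctl));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	if (recvmsg(sock, &msg, 0) <= 0)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The receiver gets a new descriptor referring to the same open file, so both
processes hold a reference to it afterwards.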
>> > ...
>> >
>> > Some simple syscall examples would be nice. e.g. how to set up a counter
>> > that it can be accessed using RDPMC on x86.
>>
>> I can add this. But why go straight to RDPMC? Most people would want to use
>> the syscall instead.
>
> On recent Intel x86 a common simple useful case is to just use RDPMC
> with one of the fixed counters, especially the unscaled cycle counter.
> The only change needed here is to set the CR bit.
>
Well, you also need to set FIXED_CTRL + GLOBAL_ENABLE + CR4.pce.
But then, there is one issue with RDPMC which is not clearly stated in the SDM,
if I recall. Take Core 2: counters are 40 bits wide, so RDPMC returns 40 bits'
worth of data. But wrmsrl() can only set the bottom 32 bits; bits 32-39 are a
sign extension of bit 31. Thus, you may need some masking in case the counter
value is high. On Intel processors, perfmon considers that all counters are
actually 31 bits wide (bits 32 and up are always set), and they are all
virtualized to 64 bits via the overflow interrupt. The RDPMC vs. wrmsrl() issue
matters in per-thread mode because on context switch we may have to restore
the counter.
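The sign-extension effect and the masking it calls for can be modelled in
plain C. This is only a sketch of the arithmetic described above: it does not
execute RDPMC or WRMSR, the function names are invented for illustration, and
the 40-bit width and 31-bit mask are taken from the Core 2 description in this
mail rather than from the perfmon source.

```c
#include <stdint.h>

/* Assumed from the text: Core 2 counters are 40 bits wide and perfmon treats
 * only the low 31 bits as the hardware-backed part of the count. */
#define PMC_WIDTH	40
#define PMC_HW_MASK	((UINT64_C(1) << 31) - 1)

/* Model of the counter contents after a write of the low 32 bits:
 * bits 32..39 become copies of bit 31 (sign extension). */
uint64_t pmc_after_write(uint32_t low32)
{
	uint64_t v = low32;

	if (low32 & UINT32_C(0x80000000))
		v |= UINT64_C(0xff) << 32;	/* fill bits 32..39 */
	return v & ((UINT64_C(1) << PMC_WIDTH) - 1);
}

/* Strip everything above bit 30 from a raw 40-bit RDPMC-style value, so a
 * user-level reader sees the same 31-bit quantity that was written. */
uint64_t pmc_hw_bits(uint64_t raw)
{
	return raw & PMC_HW_MASK;
}
```

For example, writing 0x80000001 leaves 0xff80000001 in the modelled counter,
and masking that raw value yields 1 again.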
>> > to let a driver patch for that adjust it.
>> >
>> It depends on the number of registers available. It is expected that most tools
>> will want to use one call to program the config registers and one to program
>> the data registers. Pfmon is able to split vectors according to arg_mem_max.
>>
>> It is anticipated that newer processors will increase the number of available
>> PMU registers. That was the case with Barcelona with the addition of IBS.
>> On Intel X86, I am planning on exposing the LBR as part of the PMU registers.
>>
>> On Itanium, you already have 35 data and 27 config registers.
>
> That is still far less than a 4K page. Also 4K worth of registers would
> be a lot. I doubt that will be hit anytime soon.
>
Well, that's because you are looking at the minimal pfarg_pmr_t structure.
But once we add sampling, a new structure is introduced which contains a
couple of bitmasks, and its size is fairly big: 208 bytes on X86, so a 4K
page holds only 19 registers.
>> But I think your suggestion is interesting. When we "register" the new PMU
>> mapping table, we can provide a minimal size to fit all PMC or all PMD registers
>> in one call. That would remove a control point for the sysadmin, though.
>
> I don't think the sysadmin wants to really know about that.
>
If we all agree on this, I can have the kernel adjust the limit based on the
number of registers. We would not necessarily need to expose that limit in
/sys, if we assume that tools will never try to pass a vector with more entries
than there are registers. And if they do, the call will fail.