linux-kernel - Re: [PATCH 13/22] 2.6.22-rc3 perfmon2 : common core functions

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20070531204404.GE24230@frankl.hpl.hp.com>
Date:	Thu, 31 May 2007 13:44:04 -0700
From:	Stephane Eranian <eranian@....hp.com>
To:	Andi Kleen <andi@...stfloor.org>
Cc:	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 13/22] 2.6.22-rc3 perfmon2 : common core functions

Andi,

On Thu, May 31, 2007 at 07:28:28PM +0200, Andi Kleen wrote:
> 
> > All the features currently supported are used by some tools and have
> > been requested by users.
> 
> But are they actually all used? Is there something you would
> like to get rid of because it turned out more trouble than gain?
> Now is the time to do it.
> 
I need to take a look but I thikn there are two features I can remove
without too much trouble: counter remapping, explicit set switching.
That would remove some code and should they really be needed in the
future we could add them.

> > > > +#define ulp(_x) ((unsigned long *)_x)
> > > 
> > > Don't use such non standard macros please.
> > 
> > That goes back to the argument earlier about bitmap.h using unsigned long
> > and not u64. To avoid compiler warning, I used this macro. I do want
> > to use the bitmap macros (instead of inventing my own). Andrew once
> > asked if we should not fix bitmap.h instead.
> 
> I mean just use the cast if you mean the cast. That makes the readers'
> life much easier.
The issue is that the code get more difficult to read and format
because of all the (unsigned long *) cast. What about I rename the
macro to cast_ulp() to make this more explicit?

> 
> > There parameter is checked on entry from perfmon syscalls to fail
> > if necessary. So there can potentially be a race with the perfmon init.
> 
> A simple flag is not racy.
> 
I'll fix that.

> > Are you talking about the overflow of the message queue?
> 
> Yes.
> 
No, monitoring will stop when the queue is full.

> > I made sure there was enough features in perfmon to support Oprofile and reuse its
> > kernel code base. The changes to the user level daemon are fairly limited. It was my
> > goal to ensure that Oprofile users would not see major disruptions. The wording is
> > the documentation was maybe too strong.
> 
> I would prefer if oprofile would still work. It doesn't need to work
> in parallel but it should be possible to use it when perfmon
> is not active, but compiled in. That shouldn't be that difficult,
> should it?
> 
What I did (this is how it works on IA-64) is that Oprofile works on top
of perfmon.  Perfmon handles read/write of the PMU registers and low-level
interrupt handling. Higher level interrupt handling, sampling buffers, event
buffers and export of the buffer via /dev/oprofile works like before.
Until the user level oprofile code gets updated to work on perfmon
on x86, we could do what you suggest. Mutual exclusion between the
two subsystem would not be too hard. 

> > > Weird thing to export. That's purely an internal implementation detail isn't it?
> > 
> > Yes, but that could be leveraged by tools to optimize the number of argument they
> > want to pass per call.
> 
> Well is it?  And how would that be implemented? assembler stubs? 
> switch statements? Doesn't make much sense to me.
> 
The pfm_read_Pmds() call takes a vector argument, the /sys info simply reports
the maximum number of entries in the vector that would not cause a kmalloc() 
but instead would be handled on the kernel stack. On x86, it is set to 4.
so if you have 8 register to read, it may be more interesting to make two
calls of 4 entries each. This is purely for optimization.


> > > > +   * smpl_buffer_mem_max(RW): maximum amount of memory usable for sampling buffers.
> > > > +   		-1 means all that is available.
> > > 
> > > -1 seems dangerous. 
> > 
> > I could use a % of available memory. Is there another subsystem that ttries to size
> > its buffer based on available memory at boot time?
> 
> Yes plenty, e.g. for hash tables. But then the heuristics
> tend to be wrong. Just because I have 16GB of RAM doesn't 
> mean I really want 8GB of perfmon tables.
> 
Agreed.

> But how about using mlock per process limits instead?
> 
The kernel level sampling buffer is allocated via kmalloc() when the 
perfmon context is created. The buffer cannot be paged out, so this is 
as if the memory was locked. I do the accounting with RLIMIT_MEMLOCK,
even if the buffer is never explicitly mmapped by the user.
Don't you need some privilege to issue a mlock()?
So in fact you have two levels: a per-process limit controlled
by RLIMIT_MEMLOCK and a system-wide limit controlled by smpl_buffer_mem_max.

-- 
-Stephane
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/