linux-kernel - Re: Fix powerTOP regression with 2.6.39-rc5

Message-ID: <20110511215111.GA16355@elte.hu>
Date:	Wed, 11 May 2011 23:51:11 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Steven Rostedt <rostedt@...dmis.org>
Cc:	David Sharp <dhsharp@...gle.com>,
	Vaibhav Nagarnaik <vnagarnaik@...gle.com>,
	Michael Rubin <mrubin@...gle.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Arjan van de Ven <arjan@...ux.intel.com>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Thomas Gleixner <tglx@...utronix.de>,
	Christoph Hellwig <hch@...radead.org>,
	Arnd Bergmann <arnd@...db.de>
Subject: Re: Fix powerTOP regression with 2.6.39-rc5


* Steven Rostedt <rostedt@...dmis.org> wrote:

> [root@bxf perf]# ~/bin/perf record -a -e 'syscalls:*'
> 
>   Error: sys_perf_event_open() syscall returned with 24 (Too many open files).  /bin/dmesg may provide additional information.
> 
>   Fatal: No CONFIG_PERF_EVENTS=y kernel support configured?
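The "Too many open files" error above is errno 24 (EMFILE). Here is a rough way to see how a wildcard like `syscalls:*` can trigger it. It assumes, without this thread saying so, that a system-wide `perf record -a` opens one event descriptor per tracepoint per CPU. Under that assumption the descriptor count grows as events × CPUs and can exceed the process's soft RLIMIT_NOFILE. The sketch below only does that arithmetic. The tracepoint count of 600 and the reserve of 16 descriptors are hypothetical placeholders.

```python
import os
import resource  # Unix-only standard-library module


def fds_needed(num_events: int, num_cpus: int) -> int:
    """Descriptors for a system-wide session, assuming one per (event, CPU) pair."""
    return num_events * num_cpus


def fits_in_fd_limit(num_events: int, num_cpus: int, soft_limit: float,
                     reserved: int = 16) -> bool:
    """True if the estimate plus a small hypothetical reserve (stdio, output file) fits."""
    return fds_needed(num_events, num_cpus) + reserved <= soft_limit


if __name__ == "__main__":
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Treat an unlimited soft limit as infinity so the comparison still works.
    limit = float("inf") if soft == resource.RLIM_INFINITY else soft
    cpus = os.cpu_count() or 1
    n_events = 600  # hypothetical count of syscalls:* tracepoints

    needed = fds_needed(n_events, cpus)
    print(f"{n_events} events x {cpus} CPUs = {needed} descriptors, soft limit {soft}")
    if fits_in_fd_limit(n_events, cpus, limit):
        print("estimate fits under the limit")
    else:
        print("estimate exceeds the limit: expect EMFILE (errno 24)")
```

This is only a back-of-the-envelope check. It says nothing about how Peter's patch addresses the bug.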

Yeah, this is a known bug. Have you seen Peter's patch that addresses it?

People who run into this bug will go the way of least resistance: not fix it 
and use ftrace. This is sadly how 'splitting a small pond into two' tends to 
work out in practice: both halves stink a little bit more than they would if 
they were kept together ;-)

This is why lttng as a separate project within the kernel was and is a bad idea 
IMO.

I think this further strengthens the idea that we should join stuff and not 
keep it split!

> > I really meant it when i told you that perf events were the natural next 
> > step after ftrace, in the evolution of Linux tracing/instrumentation.
> 
> I know you meant that, but I don't see nor feel it myself. [...]

My position is very simple: right now we have two tracing tools while for many 
years we (including you!) always worked hard to have unified infrastructure.

For years ftrace was maintained and pushed upstream optimistically on the 
assumption that we are reasonable people who can agree on technical solutions 
objectively.

My technical point, at its core, is even simpler:

 - If the ftrace UI/API/ABI design is better then perf can be migrated to it 
   and we can use the ftrace APIs to do more tooling goodness.
   Everyone will be happy.

 - If the perf UI/API/ABI design is better then ftrace can be migrated to it
   and we can use the perf APIs to do more tooling goodness.
   Everyone will be happy.

 - If we do neither we will have continued tooling badness, tooling pain and
   kernel-churn-without-a-clear-purpose. I will be sad.

Call me an egoist, but i do not like being sad; i'd like to see one of the 
options implemented where everyone is happy! :-)

So we could really have a dedicated tracing tool that can do what ftrace and 
perf trace can do and much more. I fully expect that it would have an ftrace 
work-alike workflow.

What we do not want is the current nightmare-ish design and schism of having 
two different tracers and two different APIs trying to do really the same 
thing. That has been going on for two and a half years and counting, and i do 
not see much progress there, so i'm getting worried about it ...

> [...] Maybe I'm mistaken but I don't have the belief that I can just jump on 
> faith into perf and abandon all the work of current ftrace. But I'm happy to 
> help unify the kernel infrastructure. That is the important part.

Well, nobody suggests the extreme of immediately 'throwing away everything', 
especially as there's no clear replacement. Why would we want to do that?

But at least having a very specific *idea* how to bring the two tracing tools 
together quickly, and doing the first steps towards that, after a painful 
period of 2.5 years, looks pretty essential to me.

I'd like to see the tracing pond grow, not fragment. Shrinking it by 10% to 90% 
in the first step would still be much better if it can then have the focus and 
clarity to grow to 300% or more, as opposed to splitting it into two 50% parts 
and seeing both halves rot in their own unique ways! :-)

> > Why not use the correctly designed tracing approach and enhance it, and 
> > merge all the remaining useful bits of ftrace into it?
> 
> The problem we have is that we disagree on what a correctly designed tracing 
> approach is. Tracing is one of those things that everyone has a different 
> idea of what is important. As you stated, you do not care about 4 bytes in an 
> event. If you have 4 million events, that is 16 million bytes. If a typical 
> event is 20 bytes, those 4 bytes are 1/5th of the event, and all of it is 
> wasted space.
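Here is Steven's arithmetic spelled out as a minimal sketch. A fixed-size field takes field_size / event_size of every record, and field_size × N bytes across N events. The 20-byte event and the 4-byte field come from the quote above. The 200 million event count is a hypothetical stand-in for the "hundreds of millions" case discussed in the reply.

```python
def extra_bytes(num_events: int, field_size: int = 4) -> int:
    """Total space taken by one fixed-size field across num_events events."""
    return num_events * field_size


def field_fraction(event_size: int, field_size: int = 4) -> float:
    """Share of each event occupied by the field."""
    return field_size / event_size


if __name__ == "__main__":
    print(extra_bytes(4_000_000))    # 16000000 bytes for 4 million events
    print(field_fraction(20))        # 0.2, i.e. 1/5th of a 20-byte event
    print(extra_bytes(200_000_000))  # 800000000 bytes (hypothetical event count)
```

The fraction stays at 1/5th regardless of volume. What changes between the contexts discussed in the reply is the absolute amount of data, and therefore whether the difference is measurable.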

Well, look at the context:

 - In the context of useful tools like PowerTop, which is driving *tons* of 
   useful new code upstream, 4 bytes is very little cost. PowerTop strongly 
   filters events so as not to be too intrusive to the system to begin with.

 - In the context of perf record/report, which easily receives millions of
   events, 4 bytes is still not measurable overhead.

 - In the context of tracing workflows where you generate hundreds of millions
   of events in a short timespan and store the stream as-is as gigabytes of
   data, 4 bytes is probably measurable overhead.

So yes, there are contexts/niches where 4 bytes are probably measurable, but 
weighed against the *PowerTop* regression the cost is negligible, and it's not 
even a question which way we want to lean.

Also note that regardless of what tracing will look like in two years' time, 
the no-regressions policy will always have *way* higher priority than any 
micro-cost concerns.

Note that we are in fact happy that applications use us; we are *happy* that 
they would indeed *break* if we didn't continue to do the good things we are 
doing today.

Consider the alternative: if we did things that no app and no developer is 
interested in, it would just not matter to anyone. We could break it freely, 
and nobody would give a damn.

So i really prefer the 'apps are using us' situation we are in today. Not 
breaking them is a *small* price to pay, a very small loss out of the 
near-infinite degrees of development freedom we still enjoy in the kernel.

Also note that IMO there is no long-term technical problem really: i agree with 
you that we can eventually get rid of the 4-byte BKL field as well, if all 
affected apps migrate to libperf.so in an orderly fashion.
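A small hypothetical parser can show why removing a 4-byte header field breaks some tools but not others. A tool that hardcodes byte offsets reads the wrong data once the field disappears. A tool that looks up offsets from a self-describing event layout keeps working. The layout strings below are modeled on the tracepoint `format` files of kernels from this era, including a `common_lock_depth` field. The event field `foo` is invented. Nothing here claims to be how PowerTop or libperf.so actually work.

```python
import re

# Matches one "field:" entry: the C declaration, then its offset and size in bytes.
FIELD_RE = re.compile(
    r"field:(?P<decl>[^;]+);\s*offset:(?P<offset>\d+);\s*size:(?P<size>\d+);"
)

# Illustrative layouts, before and after removing the 4-byte lock-depth field.
# "foo" is a hypothetical event-specific field.
OLD_FORMAT = """
\tfield:unsigned short common_type;\toffset:0;\tsize:2;\tsigned:0;
\tfield:unsigned char common_flags;\toffset:2;\tsize:1;\tsigned:0;
\tfield:unsigned char common_preempt_count;\toffset:3;\tsize:1;\tsigned:0;
\tfield:int common_pid;\toffset:4;\tsize:4;\tsigned:1;
\tfield:int common_lock_depth;\toffset:8;\tsize:4;\tsigned:1;
\tfield:int foo;\toffset:12;\tsize:4;\tsigned:1;
"""

NEW_FORMAT = """
\tfield:unsigned short common_type;\toffset:0;\tsize:2;\tsigned:0;
\tfield:unsigned char common_flags;\toffset:2;\tsize:1;\tsigned:0;
\tfield:unsigned char common_preempt_count;\toffset:3;\tsize:1;\tsigned:0;
\tfield:int common_pid;\toffset:4;\tsize:4;\tsigned:1;
\tfield:int foo;\toffset:8;\tsize:4;\tsigned:1;
"""


def parse_fields(text: str) -> dict:
    """Return {field_name: (offset, size)} for every field entry in text."""
    fields = {}
    for m in FIELD_RE.finditer(text):
        # The name is the last token of the declaration; strip any array suffix.
        name = m.group("decl").split()[-1].split("[")[0]
        fields[name] = (int(m.group("offset")), int(m.group("size")))
    return fields


def read_int(raw: bytes, fields: dict, name: str) -> int:
    """Decode a little-endian signed integer field at its looked-up offset."""
    offset, size = fields[name]
    return int.from_bytes(raw[offset:offset + size], "little", signed=True)


if __name__ == "__main__":
    value = (42).to_bytes(4, "little", signed=True)
    old_raw = bytes(12) + value  # zeroed header, foo at offset 12
    new_raw = bytes(8) + value   # header shrank by 4 bytes, foo at offset 8

    # Looking offsets up from the layout works for both layouts.
    print(read_int(old_raw, parse_fields(OLD_FORMAT), "foo"))  # 42
    print(read_int(new_raw, parse_fields(NEW_FORMAT), "foo"))  # 42

    # A hardcoded offset of 12 runs past the new, shorter record and yields 0.
    print(int.from_bytes(new_raw[12:16], "little", signed=True))  # 0
```

The design choice is the one implied above: as long as consumers resolve field positions at runtime, the kernel can drop or reorder fields without causing a regression.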

Thanks,

	Ingo