Message-ID: <c62985530910050308m67ae892fx61c9fb6a2793e54f@mail.gmail.com>
Date: Mon, 5 Oct 2009 12:08:55 +0200
From: Frédéric Weisbecker <fweisbec@...il.com>
To: Ingo Molnar <mingo@...e.hu>
Cc: Peter Zijlstra <peterz@...radead.org>,
	"K.Prasad" <prasad@...ux.vnet.ibm.com>,
	Arjan van de Ven <arjan@...radead.org>,
	"Frank Ch. Eigler" <fche@...hat.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] perf_core: provide a kernel-internal interface to get to performance counters

On 5 October 2009 at 11:48, Ingo Molnar <mingo@...e.hu> wrote:
>
> * Frédéric Weisbecker <fweisbec@...il.com> wrote:
>
>> 2009/10/5 Ingo Molnar <mingo@...e.hu>:
>> >
>> > * Peter Zijlstra <peterz@...radead.org> wrote:
>> >> Non-trivial.
>> >>
>> >> Something like this would imply a single output channel for all these
>> >> CPUs, and we've already seen that stuffing too many CPUs down one such
>> >> channel (using -M) leads to significant performance issues.
>> >
>> > We could add internal per cpu buffering before it hits any globally
>> > visible output channel. (That has come up when i talked to Frederic
>> > about the function tracer.) We could even have page sized output
>> > (via the introduction of a NOP event that fills up to the next page
>> > edge).
>>
>> That looks good for the counting/sampling fast path, but would that
>> scale once it comes to reordering in the globally visible output
>> channel? Such a union has its costs.
>
> Well, reordering always has a cost, and we have multiple models
> regarding where to put that cost.
>
> The first model is 'everything is per cpu' - i.e. completely separate
> event buffers and the reordering is pushed to the user-space
> post-processing stage. This is the most scalable solution - but it can
> also lose information such as the true ordering of events.
>
> The second model is 'event multiplexing' - here we use a single output
> buffer for events. This serializes all output on the same buffer and
> hence is the least scalable one. It is the easiest one to use: just a
> single channel of output to deal with. It is also the most precise
> solution and it saves the post-processing stage from reordering hassles.
>
> What i suggested above is a third model: 'short-term per cpu,
> multiplexed into an output channel with page granularity'. It has the
> advantage of being per cpu on a page granular basis. It has the ease of
> use of having a single output channel only.
>
> Neither solution can eliminate the costs and tradeoffs involved. What
> they do is to offer an app a spectrum to choose from.
>
> 	Ingo

Ok. The third solution solves the multi-channel problem, and for the
ordering... well, as you said, everything has a cost.
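To make the "third model" above concrete, here is a minimal user-space sketch of short-term per-CPU staging pages multiplexed into a single output channel at page granularity, with a NOP-style pad record filling each page up to its edge. All names in it (struct cpu_buf, emit_sample, flush_cpu, REC_PAD, the 4096-byte page) are hypothetical illustration only; this is not the actual perf ring-buffer code.

/*
 * Sketch of the third model: each CPU fills a private staging page,
 * and whole pages are appended to one shared output channel.  A PAD
 * record fills the unused tail of a page before it is flushed.
 * Hypothetical names; not the kernel implementation.
 */
#include <stdio.h>
#include <string.h>
#include <stdint.h>

#define PAGE_SIZE 4096
#define NCPUS     4

enum rec_type { REC_SAMPLE = 1, REC_PAD = 2 };

struct record {
	uint16_t type;   /* REC_SAMPLE or REC_PAD */
	uint16_t size;   /* total size of this record in bytes */
	/* payload follows */
};

/* Per-CPU staging buffer: one page of pending records. */
struct cpu_buf {
	char   page[PAGE_SIZE];
	size_t used;
};

/* Shared output channel: pages are appended whole, so every page in
 * the channel comes from exactly one CPU. */
static char   channel[NCPUS * 8 * PAGE_SIZE];
static size_t channel_used;

static struct cpu_buf bufs[NCPUS];

/* Pad the staging page to its edge, then copy it into the channel. */
static void flush_cpu(int cpu)
{
	struct cpu_buf *b = &bufs[cpu];
	size_t pad = PAGE_SIZE - b->used;

	if (b->used == 0)
		return;

	if (pad >= sizeof(struct record)) {
		struct record *r = (struct record *)(b->page + b->used);
		r->type = REC_PAD;
		r->size = (uint16_t)pad;
	}
	memcpy(channel + channel_used, b->page, PAGE_SIZE);
	channel_used += PAGE_SIZE;
	b->used = 0;
}

/* Emit one sample on @cpu; flush first if it would not fit. */
static void emit_sample(int cpu, const void *data, uint16_t len)
{
	struct cpu_buf *b = &bufs[cpu];
	uint16_t total = (uint16_t)(sizeof(struct record) + len);
	struct record *r;

	if (b->used + total > PAGE_SIZE)
		flush_cpu(cpu);

	r = (struct record *)(b->page + b->used);
	r->type = REC_SAMPLE;
	r->size = total;
	memcpy(r + 1, data, len);
	b->used += total;
}

int main(void)
{
	uint64_t ip = 0xdeadbeef;
	int cpu, i;

	/* Each CPU generates a burst of samples... */
	for (cpu = 0; cpu < NCPUS; cpu++)
		for (i = 0; i < 300; i++)
			emit_sample(cpu, &ip, sizeof(ip));

	/* ...and whatever is left is flushed at page granularity. */
	for (cpu = 0; cpu < NCPUS; cpu++)
		flush_cpu(cpu);

	printf("channel holds %zu bytes (%zu pages)\n",
	       channel_used, channel_used / PAGE_SIZE);
	return 0;
}

Because every page in the channel belongs to a single CPU, a consumer only has to reorder at page boundaries rather than per record, which is the cost/ease-of-use trade-off described in the thread.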