Message-ID: <c62985530910050308m67ae892fx61c9fb6a2793e54f@mail.gmail.com>
Date:	Mon, 5 Oct 2009 12:08:55 +0200
From:	Frédéric Weisbecker <fweisbec@...il.com>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	"K.Prasad" <prasad@...ux.vnet.ibm.com>,
	Arjan van de Ven <arjan@...radead.org>,
	"Frank Ch. Eigler" <fche@...hat.com>, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] perf_core: provide a kernel-internal interface to get to performance counters

On 5 October 2009 11:48, Ingo Molnar <mingo@...e.hu> wrote:
>
> * Frédéric Weisbecker <fweisbec@...il.com> wrote:
>
>> 2009/10/5 Ingo Molnar <mingo@...e.hu>:
>> >
>> > * Peter Zijlstra <peterz@...radead.org> wrote:
>> >> Non-trivial.
>> >>
>> >> Something like this would imply a single output channel for all these
>> >> CPUs, and we've already seen that stuffing too many CPUs down one such
>> >> channel (using -M) leads to significant performance issues.
>> >
>> > We could add internal per cpu buffering before it hits any globally
>> > visible output channel. (That has come up when i talked to Frederic
>> > about the function tracer.) We could even have page sized output
>> > (via the introduction of a NOP event that fills up to the next page
>> > edge).
>>
>> That looks good for the counting/sampling fast path, but would that
>> scale once it comes to reordering in the globally visible output
>> channel? Such a union has its costs.
>
> Well, reordering always has a cost, and we have multiple models
> regarding where to put that cost.
>
> The first model is 'everything is per cpu' - i.e. completely separate
> event buffers and the reordering is pushed to the user-space
> post-processing stage. This is the most scalable solution - but it can
> also lose information such as the true ordering of events.
>
> The second model is 'event multiplexing' - here we use a single output
> buffer for events. This serializes all output on the same buffer and
> hence is the least scalable one. It is the easiest one to use: just a
> single channel of output to deal with. It is also the most precise
> solution and it saves the post-processing stage from reordering hassles.
>
> What i suggested above is a third model: 'short-term per cpu,
> multiplexed into an output channel with page granularity'. It has the
> advantage of being per cpu on a page granular basis. It has the ease of
> use of having a single output channel only.
>
> None of these models can eliminate the costs and tradeoffs involved. What
> they do is to offer an app a spectrum to choose from.
>
>        Ingo
>

Ok. The third solution solves the multi-channel problem, and for the
ordering... well, as you said, everything has a cost.
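
To make the third model concrete, here is a rough user-space sketch in
Python, not kernel code and not what perf actually does. It assumes
that a "page" holds a fixed number of event slots (PAGE_SLOTS), that a
NOP filler pads a partly filled page up to the page edge, and that each
event carries a timestamp. PageMuxChannel, emit(), flush() and
reorder() are hypothetical names made up for this illustration.

```python
import heapq

PAGE_SLOTS = 4   # assumption: slots per page; a real page is sized in bytes
NOP = None       # assumption: filler event that pads a page to its edge


class PageMuxChannel:
    """Per-CPU buffering, multiplexed into one channel a page at a time."""

    def __init__(self, ncpus):
        self.percpu = [[] for _ in range(ncpus)]  # short-term per-CPU buffers
        self.pages = []                           # the single visible channel

    def emit(self, cpu, timestamp, payload):
        # Fast path: touches only this CPU's private buffer.
        buf = self.percpu[cpu]
        buf.append((timestamp, payload))
        if len(buf) == PAGE_SLOTS:
            self._publish(cpu)

    def flush(self, cpu):
        # Pad a partial page with NOP events, then publish it.
        buf = self.percpu[cpu]
        if buf:
            buf.extend([NOP] * (PAGE_SLOTS - len(buf)))
            self._publish(cpu)

    def _publish(self, cpu):
        # Slow path: the only place that touches the shared channel.
        self.pages.append((cpu, self.percpu[cpu]))
        self.percpu[cpu] = []


def reorder(pages):
    """Post-processing: rebuild global order from page-granular output."""
    streams = {}
    for cpu, page in pages:
        events = (e for e in page if e is not NOP)
        streams.setdefault(cpu, []).extend(events)
    # Each per-CPU stream is already in timestamp order, so a k-way
    # merge is enough; no full sort is needed.
    return list(heapq.merge(*streams.values(), key=lambda e: e[0]))


chan = PageMuxChannel(ncpus=2)
chan.emit(0, 1, "a")
chan.emit(1, 2, "b")
chan.emit(0, 3, "c")
chan.emit(0, 4, "d")
chan.emit(0, 5, "e")   # CPU 0's page is now full and gets published
chan.flush(1)          # CPU 1's page is padded with NOPs and published
chan.flush(0)          # nothing pending on CPU 0, so this is a no-op

channel_order = [e[0] for _, page in chan.pages for e in page if e is not NOP]
global_order = [e[0] for e in reorder(chan.pages)]
```

In this sketch the shared channel is touched once per page rather than
once per event, which is where the scalability would come from. The
price is that the channel only preserves ordering within a page:
channel_order above is not sorted by timestamp, so a consumer that
wants the true ordering still has to pay for the merge in reorder().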