Message-ID: <AANLkTimZrFv7YWOdy6dvtbzFmwB85FyPntZULmN0f6VN@mail.gmail.com>
Date: Mon, 17 May 2010 16:25:26 +0200
From: Stephane Eranian <eranian@...gle.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: LKML <linux-kernel@...r.kernel.org>,
Frédéric Weisbecker <fweisbec@...il.com>,
Arnaldo Carvalho de Melo <acme@...radead.org>,
mingo@...e.hu, Paul Mackerras <paulus@...ba.org>,
"David S. Miller" <davem@...emloft.net>,
perfmon2-devel@...ts.sf.net
Subject: Re: [RFC] perf: perf record sets inherit by default
On Tue, May 11, 2010 at 4:48 PM, Peter Zijlstra <peterz@...radead.org> wrote:
> On Tue, 2010-05-11 at 16:04 +0200, Stephane Eranian wrote:
>> Hi,
>>
>>
>> I am confused by the inheritance cmd line option of perf record:
>>
>> $ perf record -h
>>  usage: perf record [<options>] [<command>]
>>     or: perf record [<options>] -- <command> [<options>]
>>
>>     -e, --event <event>     event selector. use 'perf list' to list available events
>>         --filter <filter>   event filter
>>     -p, --pid <n>           record events on existing process id
>>     -t, --tid <n>           record events on existing thread id
>>     -r, --realtime <n>      collect data with this RT SCHED_FIFO priority
>>     -R, --raw-samples       collect raw sample records from all opened counters
>>     -a, --all-cpus          system-wide collection from all CPUs
>>     -A, --append            append to the output file to do incremental profiling
>>     -C, --profile_cpu <n>   CPU to profile on
>>     -f, --force             overwrite existing data file (deprecated)
>>     -c, --count             event period to sample
>>     -o, --output <file>     output file name
>>     -i, --inherit           child tasks inherit counters
>>
>> This leads one to believe that inheritance in children is off by default.
>>
>> However, builtin-record.c says:
>>
>> static bool inherit = true;
>>
>> If that's the case, what's the point of the -i option?
>
> Right, I think we should invert that, does --no-inherit work?
>
>> Another side effect of inheritance is that in per-thread mode,
>> perf creates as many "sessions" as you have CPUs. So
>> on a 16-way processor, sampling on cycles, perf creates
>> 16 events and 16 x 2-page sampling buffers. That's a lot of
>> resources consumed if I am just interested in monitoring
>> a single-threaded workload.
>
> Right, but I think the default of inherit is right, and once you do that
> you basically have to do the per-task-per-cpu thing, otherwise your
> fancy 16-way will start spending most of its time in cacheline bounces.
>
In that case, don't you think you should also ensure that each buffer is
allocated on the NUMA node of the CPU its per-thread-per-cpu event is bound to?
I don't think that is the case today.
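
For reference, here is a minimal user-space sketch of the per-task-per-cpu
arrangement discussed above, using the perf_event_open(2) syscall directly.
It is not taken from builtin-record.c. The helper names (init_cycles_attr,
ring_bytes, total_ring_bytes, open_per_cpu) and the sample period are
hypothetical and chosen only for illustration. The extra metadata page per
mapping comes from the perf_event_open(2) man page, not from this thread.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stddef.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Fill an attr that samples CPU cycles. 'inherit' plays the role of
 * the 'static bool inherit' default quoted from builtin-record.c. */
void init_cycles_attr(struct perf_event_attr *attr, int inherit)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_HARDWARE;
    attr->config = PERF_COUNT_HW_CPU_CYCLES;
    attr->sample_period = 100000;          /* arbitrary, illustration only */
    attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID;
    attr->inherit = inherit ? 1 : 0;       /* children inherit the event */
    attr->disabled = 1;                    /* caller enables it later */
}

/* Bytes mapped for one event: one metadata page plus the data pages. */
size_t ring_bytes(size_t data_pages, size_t page_size)
{
    return (1 + data_pages) * page_size;
}

/* Total mapped bytes when one event (and one buffer) exists per CPU. */
size_t total_ring_bytes(size_t ncpus, size_t data_pages, size_t page_size)
{
    return ncpus * ring_bytes(data_pages, page_size);
}

/* Open one event for 'pid' on each of 'ncpus' CPUs. 'fds' must have room
 * for ncpus entries; a failed open leaves -1 in its slot.
 * Returns the number of events successfully opened. */
int open_per_cpu(pid_t pid, int ncpus, int inherit, int *fds)
{
    struct perf_event_attr attr;
    int opened = 0;

    init_cycles_attr(&attr, inherit);
    for (int cpu = 0; cpu < ncpus; cpu++) {
        /* group_fd = -1 (no group), flags = 0 */
        fds[cpu] = (int)syscall(__NR_perf_event_open, &attr, pid, cpu, -1, 0UL);
        if (fds[cpu] >= 0)
            opened++;
    }
    return opened;
}
```

With the numbers from the example above (16 CPUs, 2 data pages per buffer)
and assuming 4 KiB pages, total_ring_bytes(16, 2, 4096) comes to 196608 bytes
for a single monitored thread, on top of the 16 file descriptors. Note that
nothing in this sketch controls which NUMA node backs each buffer; that
decision is made in the kernel when the buffer pages are allocated.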