Message-ID: <702802d2-7318-4575-81ac-4fad6f8ff42f@linux.intel.com>
Date: Wed, 12 Mar 2025 15:41:06 -0400
From: "Liang, Kan" <kan.liang@...ux.intel.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: mingo@...hat.com, tglx@...utronix.de, bp@...en8.de, acme@...nel.org,
namhyung@...nel.org, irogers@...gle.com, linux-kernel@...r.kernel.org,
ak@...ux.intel.com, eranian@...gle.com
Subject: Re: [PATCH V8 1/6] perf: Save PMU specific data in task_struct
On 2025-03-12 3:05 p.m., Peter Zijlstra wrote:
>
> I'm sorry, but since I spotted a bug in the second patch, I'm going to
> reply and suggest some overall changes.
Sure. Thanks.
>
> On Wed, Mar 12, 2025 at 11:25:20AM -0700, kan.liang@...ux.intel.com wrote:
>
>> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
>> index 3e270822b915..b8442047a2b6 100644
>> --- a/include/linux/perf_event.h
>> +++ b/include/linux/perf_event.h
>> @@ -1021,6 +1021,36 @@ struct perf_event_context {
>> local_t nr_no_switch_fast;
>> };
>>
>> +/**
>> + * struct perf_ctx_data - PMU specific data for a task
>> + * @rcu_head: To avoid the race on free PMU specific data
>> + * @refcount: To track users
>> + * @global: To track system-wide users
>> + * @ctx_cache: Kmem cache of PMU specific data
>> + * @data: PMU specific data
>> + *
>> + * Currently, the struct is only used in Intel LBR call stack mode to
>> + * save/restore the call stack of a task on context switches.
>> + * The data only be allocated when Intel LBR call stack mode is enabled.
>> + * The data will be freed when the mode is disabled. The rcu_head is
>> + * used to prevent the race on free the data.
>> + * The content of the data will only be accessed in context switch, which
>> + * should be protected by rcu_read_lock().
>> + *
>> + * Careful: Struct perf_ctx_data is added as a pointor in struct task_struct.
>
> pointer
>
>> + * When system-wide Intel LBR call stack mode is enabled, a buffer with
>> + * constant size will be allocated for each task.
>> + * Also, system memory consumption can further grow when the size of
>> + * struct perf_ctx_data enlarges.
>> + */
>> +struct perf_ctx_data {
>> +	struct rcu_head rcu_head;
>> +	refcount_t refcount;
>> +	int global;
>> +	struct kmem_cache *ctx_cache;
>> +	void *data;
>> +};
>
> I can't remember why this is complicated like this. Why do we have a
> kmemcache and yet another data pointer in there?
The kmem_cache was introduced to address the alignment requirement for
Arch LBR.
https://lore.kernel.org/lkml/159420190705.4006.11190540790919295173.tip-bot2@tip-bot2/
When users do system-wide profiling, perf has to allocate a buffer
whenever a thread is forked and free that buffer when the thread is
deleted. Both operations need pmu->task_ctx_cache, so without a stored
pointer perf would have to search the perf_event_list every time to
find the proper PMU.
The *ctx_cache pointer is kept in the struct to avoid that search.
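
A minimal user-space sketch of that idea, not the kernel code: the
per-task struct remembers which cache its buffer came from, so the free
path needs no PMU lookup. The names toy_cache, toy_ctx_data,
toy_ctx_alloc and toy_ctx_free are hypothetical, aligned_alloc() stands
in for kmem_cache_alloc(), and the 64-byte alignment in the usage note
is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a kmem_cache: object size plus alignment. */
struct toy_cache {
	size_t size;	/* object size, must be a multiple of align */
	size_t align;	/* required alignment of each object */
};

/* Same shape as the posted struct, minus rcu_head and refcount. */
struct toy_ctx_data {
	int global;
	struct toy_cache *ctx_cache;	/* remembered so free needs no lookup */
	void *data;
};

static struct toy_ctx_data *toy_ctx_alloc(struct toy_cache *cache, int global)
{
	struct toy_ctx_data *cd = calloc(1, sizeof(*cd));

	if (!cd)
		return NULL;

	/* aligned_alloc() requires size to be a multiple of the alignment. */
	assert(cache->size % cache->align == 0);
	cd->data = aligned_alloc(cache->align, cache->size);
	if (!cd->data) {
		free(cd);
		return NULL;
	}
	assert((uintptr_t)cd->data % cache->align == 0);

	memset(cd->data, 0, cache->size);
	cd->global = global;
	cd->ctx_cache = cache;
	return cd;
}

static void toy_ctx_free(struct toy_ctx_data *cd)
{
	if (!cd)
		return;
	/* The kernel analogue would be kmem_cache_free(cd->ctx_cache, cd->data). */
	free(cd->data);
	free(cd);
}
```

For example, with a cache of { .size = 128, .align = 64 },
toy_ctx_alloc() returns a struct whose data pointer is 64-byte aligned,
and toy_ctx_free() releases it using only what the struct itself holds.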
Thanks,
Kan
>
> Specifically, why can't we do something like:
>
> struct perf_ctx_data {
> 	struct rcu_head rcu;
> 	refcount_t refcount;
> 	int global;
> 	char data[];
> };
>
> and simply allocate the whole thing as a single allocation?
>
> So then the allocation is something like:
>
> cd = kzalloc(sizeof(*cd) + event->pmu->task_ctx_size, GFP_KERNEL);
>
>
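
For comparison, the single-allocation layout suggested in the quoted
text can be sketched in user space as below. This is an illustration
under assumptions, not the kernel code: flex_ctx_data and
flex_ctx_alloc are hypothetical names, a plain int replaces refcount_t,
rcu_head is omitted, and calloc() stands in for kzalloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header and payload live in one allocation via a flexible array member. */
struct flex_ctx_data {
	int refcount;	/* plain int here; the kernel struct uses refcount_t */
	int global;
	char data[];	/* payload of task_ctx_size bytes follows the header */
};

static struct flex_ctx_data *flex_ctx_alloc(size_t task_ctx_size)
{
	/* One zeroed allocation covers the header and the payload. */
	struct flex_ctx_data *cd = calloc(1, sizeof(*cd) + task_ctx_size);

	if (!cd)
		return NULL;
	cd->refcount = 1;
	return cd;
}
```

A single free(cd) releases both header and payload. The payload starts
at offsetof(struct flex_ctx_data, data), so it is only as aligned as
the header allocation makes it; a stricter alignment requirement on the
payload is not met by this layout alone. In kernel code, the
struct_size() helper is the usual overflow-checked way to compute such
a size.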