[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.10.1612241747170.32409@vshiva-Udesk>
Date: Sat, 24 Dec 2016 17:51:14 -0800 (PST)
From: Shivappa Vikas <vikas.shivappa@...el.com>
To: Peter Zijlstra <peterz@...radead.org>
cc: Shivappa Vikas <vikas.shivappa@...el.com>,
Vikas Shivappa <vikas.shivappa@...ux.intel.com>,
linux-kernel@...r.kernel.org, x86@...nel.org, tglx@...utronix.de,
ravi.v.shankar@...el.com, tony.luck@...el.com,
fenghua.yu@...el.com, andi.kleen@...el.com, davidcc@...gle.com,
eranian@...gle.com, hpa@...or.com
Subject: Re: [PATCH 01/14] x86/cqm: Intel Resource Monitoring Documentation
On Fri, 23 Dec 2016, Peter Zijlstra wrote:
> On Fri, Dec 23, 2016 at 11:35:03AM -0800, Shivappa Vikas wrote:
>>
>> Hello Peterz,
>>
>> On Fri, 23 Dec 2016, Peter Zijlstra wrote:
>>
>>> On Fri, Dec 16, 2016 at 03:12:55PM -0800, Vikas Shivappa wrote:
>>>> +Continuous monitoring
>>>> +---------------------
>>>> +A new file cont_monitoring is added to perf_cgroup which helps to enable
>>>> +cqm continuous monitoring. Enabling this field would start monitoring of
>>>> +the cgroup without perf being launched. This can be used for long term
>>>> +light weight monitoring of tasks/cgroups.
>>>> +
>>>> +To enable continuous monitoring of cgroup p1.
>>>> +#echo 1 > /sys/fs/cgroup/perf_event/p1/perf_event.cqm_cont_monitoring
>>>> +
>>>> +To disable continuous monitoring of cgroup p1.
>>>> +#echo 0 > /sys/fs/cgroup/perf_event/p1/perf_event.cqm_cont_monitoring
>>>> +
>>>> +To read the counters at the end of monitoring perf can be used.
>>>> +
>>>> +LAZY and NOLAZY Monitoring
>>>> +--------------------------
>>>> +LAZY:
>>>> +By default when monitoring is enabled, the RMIDs are not allocated
>>>> +immediately and allocated lazily only at the first sched_in.
>>>> +There are 2-4 RMIDs per logical processor on each package. So if a dual
>>>> +package has 48 logical processors, there would be upto 192 RMIDs on each
>>>> +package = total of 192x2 RMIDs.
>>>> +There is a possibility that RMIDs can runout and in that case the read
>>>> +reports an error since there was no RMID available to monitor for an
>>>> +event.
>>>> +
>>>> +NOLAZY:
>>>> +When user wants guaranteed monitoring, he can enable the 'monitoring
>>>> +mask' which is basically used to specify the packages he wants to
>>>> +monitor. The RMIDs are statically allocated at open and failure is
>>>> +indicated if RMIDs are not available.
>>>> +
>>>> +To specify monitoring on package 0 and package 1:
>>>> +#echo 0-1 > /sys/fs/cgroup/perf_event/p1/perf_event.cqm_mon_mask
>>>> +
>>>> +An error is thrown if packages not online are specified.
>>>
>>> I very much dislike both those for adding files to the perf cgroup.
>>> Drivers should really not do that.
>>
>> Is the continuous monitoring the issue or the interface (adding a file in
>> perf_cgroup) ? I have not mentioned in the documentaion but this continuous
>> monitoring/ monitoring mask applies only to cgroup in this patch and hence
>> we thought a good place for that is in the cgroup itself because its per
>> cgroup.
>>
>> For task events , this wont apply and we are thinking of providing a prctl
>> based interface for user to toggle the continous monitoring ..
>
> More fail..
>
>>>
>>> I absolutely hate the second because events already have affinity.
>>
>> This applies to continuous monitoring as well when there are no events
>> associated. Meaning if the monitoring mask is chosen and user tries to
>> enable continuous monitoring using the cgrp->cont_mon - all RMIDs are
>> allocated immediately. the mon_mask provides a way for the user to have
>> guarenteed RMIDs for both that have events and for continoous monitoring(no
>> perf event associated) (assuming user uses it when user knows he would
>> definitely use it.. or else there is LAZY mode)
>>
>> Again this is cgroup specific and wont apply to task events and is needed
>> when there are no events associated.
>
> So no, the problem is that a driver introduces special ABI and behaviour
> that radically departs from the regular behaviour.
Ok , looks like the interface is the problem. Will try to
fix this. We are just trying to have a light weight monitoring
option so that its reasonable to monitor for a
very long time (like lifetime of process etc). Mainly to not have all the perf
scheduling overhead.
May be a perf event attr option is a more reasonable approach for the user to
choose the option ? (rather than some new interface like prctl / cgroup file..)
Thanks,
Vikas
Powered by blists - more mailing lists