[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <534b4b99-466a-0a5b-e9f5-b4711abd8a4a@linux.intel.com>
Date: Thu, 20 Feb 2020 11:03:35 -0500
From: "Liang, Kan" <kan.liang@...ux.intel.com>
To: Jiri Olsa <jolsa@...hat.com>
Cc: acme@...nel.org, mingo@...hat.com, peterz@...radead.org,
linux-kernel@...r.kernel.org, mark.rutland@....com,
namhyung@...nel.org, ravi.bangoria@...ux.ibm.com,
yao.jin@...ux.intel.com, ak@...ux.intel.com
Subject: Re: [PATCH 0/5] Support metric group constraint
On 2/20/2020 6:39 AM, Jiri Olsa wrote:
> On Wed, Feb 19, 2020 at 11:08:35AM -0800, kan.liang@...ux.intel.com wrote:
>> From: Kan Liang <kan.liang@...ux.intel.com>
>>
>> Some metric groups, e.g. Page_Walks_Utilization, will never count when
>> NMI watchdog is enabled.
>>
>> $echo 1 > /proc/sys/kernel/nmi_watchdog
>> $perf stat -M Page_Walks_Utilization
>>
>> Performance counter stats for 'system wide':
>>
>> <not counted> itlb_misses.walk_pending (0.00%)
>> <not counted> dtlb_load_misses.walk_pending (0.00%)
>> <not counted> dtlb_store_misses.walk_pending (0.00%)
>> <not counted> ept.walk_pending (0.00%)
>> <not counted> cycles (0.00%)
>>
>> 2.343460588 seconds time elapsed
>>
>> Some events weren't counted. Try disabling the NMI watchdog:
>> echo 0 > /proc/sys/kernel/nmi_watchdog
>> perf stat ...
>> echo 1 > /proc/sys/kernel/nmi_watchdog
>> The events in group usually have to be from the same PMU. Try
>> reorganizing the group.
>>
>> A metric group is a weak group, which relies on group validation
>> code in the kernel to determine whether to be opened as a group or
>> a non-group. However, group validation code may return false-positives,
>> especially when NMI watchdog is enabled. (The metric group is allowed
>> as a group but will never be scheduled.)
>>
>> The attempt to fix the group validation code has been rejected.
>> https://lore.kernel.org/lkml/20200117091341.GX2827@hirez.programming.kicks-ass.net/
>> Because we cannot accurately predict whether the group can be scheduled
>> as a group, only by checking current status.
>>
>> This patch set provides another solution to mitigate the issue.
>> Add "MetricConstraint" in event list, which provides a hint for perf tool,
>> e.g. "MetricConstraint": "NO_NMI_WATCHDOG". Perf tool can change the
>> metric group to non-group (standalone metrics) if NMI watchdog is enabled.
>
> the problem is in the missing counter, that's taken by NMI watchdog, right?
>
> and it's problem for any metric that won't fit to the available
> counters.. shouldn't we rather do this workaround for any metric
> that wouldn't fit in available counters?
I think current perf already did this.
All metric groups are weak group. Kernel (validate_group()) tells perf
tool whether a metric group fit to available counters.
If yes, the metric group will be scheduled as a group.
If no, perf tool will not using a group and re-try. The code is as below.
try_again:
if (create_perf_stat_counter(counter, &stat_config, &target) < 0) {
/* Weak group failed. Reset the group. */
if ((errno == EINVAL || errno == EBADF) &&
counter->leader != counter &&
counter->weak_group) {
counter = perf_evlist__reset_weak_group(evsel_list, counter);
goto try_again;
}
This patch set is to workaroud the false-positives from the kernel.
Kernel only validate the group itself. It assumes that all counters are
available. So, when any counter is permanently occupied by others, e.g.
watchdog, the validate_group() may return false-positives.
? not just for chosen
> ones..?
For now, I think we only need to workaround the Page_Walks_Utilization
metric group. Because it has 5 events, and one of the events is cycles.
The cycles has to be scheduled on fixed counter 2. But it's already
occupied by watchdog.
The false-positives of validate_group() will trigger the issue (metric
group never be scheduled.).
For other metric groups, even they have cycles, the issue should not be
triggered.
For example, if they have 4 or less events, the cycles can be scheduled
to GP counter instead.
If they have 6 or more events, the weak group will be reject anyway.
Perf tool will open it as non-group (standalone metrics).
I think we only need to apply the constraint for the
Page_Walks_Utilization metric group for now.
Thanks,
Kan
>
> thanks,
> jirka
>
Powered by blists - more mailing lists