linux-kernel - Re: [PATCH v4] tools/perf/metricgroup: Fix printing event names of metric group with multiple events incase of overlapping events

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <648e08fb-ffd2-16f6-005d-527ac3fee6c0@linux.ibm.com>
Date:   Tue, 24 Mar 2020 12:43:32 +0530
From:   kajoljain <kjain@...ux.ibm.com>
To:     Joakim Zhang <qiangqing.zhang@....com>,
        "acme@...nel.org" <acme@...nel.org>, Jiri Olsa <jolsa@...nel.org>,
        Andi Kleen <ak@...ux.intel.com>
Cc:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-perf-users@...r.kernel.org" <linux-perf-users@...r.kernel.org>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Kan Liang <kan.liang@...ux.intel.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Jin Yao <yao.jin@...ux.intel.com>,
        Madhavan Srinivasan <maddy@...ux.vnet.ibm.com>,
        Anju T Sudhakar <anju@...ux.vnet.ibm.com>,
        Ravi Bangoria <ravi.bangoria@...ux.ibm.com>
Subject: Re: [PATCH v4] tools/perf/metricgroup: Fix printing event names of
 metric group with multiple events incase of overlapping events



On 3/20/20 12:17 PM, Joakim Zhang wrote:
> 
> [...]
>>>>> Hi Kajol,
>>>>>
>>>>> I am not sure if it is good to ask a question here :-)
>>>>>
>>>>> I encountered a perf metricgroup issue, the result is incorrect when
>>>>> the
>>>> metric includes more than 2 events.
>>>>>
>>>>> git log --oneline tools/perf/util/metricgroup.c
>>>>> 3635b27cc058 perf metricgroup: Fix printing event names of metric
>>>>> group with multiple events f01642e4912b perf metricgroup: Support
>>>>> multiple events for metricgroup
>>>>> 287f2649f791 perf metricgroup: Scale the metric result
>>>>>
>>>>> I did a simple test, below is the JSON file and result.
>>>>> [
>>>>>         {
>>>>>              "PublicDescription": "Calculate DDR0 bus actual
>>>>> utilization
>>>> which vary from DDR0 controller clock frequency",
>>>>>              "BriefDescription": "imx8qm: ddr0 bus actual utilization",
>>>>>              "MetricName": "imx8qm-ddr0-bus-util",
>>>>>              "MetricExpr": "( imx8_ddr0\\/read\\-cycles\\/ +
>>>> imx8_ddr0\\/write\\-cycles\\/ )",
>>>>>              "MetricGroup": "i.MX8QM_DDR0_BUS_UTIL"
>>>>>         }
>>>>> ]
>>>>> ./perf stat -I 1000 -M imx8qm-ddr0-bus-util
>>>>> #           time             counts unit events
>>>>>      1.000104250              16720      imx8_ddr0/read-cycles/
>>>> #  22921.0 imx8qm-ddr0-bus-util
>>>>>      1.000104250               6201
>> imx8_ddr0/write-cycles/
>>>>>      2.000525625               8316      imx8_ddr0/read-cycles/
>>>> #  12785.5 imx8qm-ddr0-bus-util
>>>>>      2.000525625               2738
>> imx8_ddr0/write-cycles/
>>>>>      3.000819125               1056      imx8_ddr0/read-cycles/
>>>> #   4136.7 imx8qm-ddr0-bus-util
>>>>>      3.000819125                303
>> imx8_ddr0/write-cycles/
>>>>>      4.001103750               6260      imx8_ddr0/read-cycles/
>>>> #   9149.8 imx8qm-ddr0-bus-util
>>>>>      4.001103750               2317
>> imx8_ddr0/write-cycles/
>>>>>      5.001392750               2084      imx8_ddr0/read-cycles/
>>>> #   4516.0 imx8qm-ddr0-bus-util
>>>>>      5.001392750                601
>> imx8_ddr0/write-cycles/
>>>>>
>>>>> You can see that only the first result is correct, could this be
>>>>> reproduced at
>>>> you side?
>>>>
>>>> Hi Joakim,
>>>>         Will try to look into it from my side.
>>>
>>
>>> Thanks Kajol for your help, I look into this issue, but don't know how to fix it.
>>>
>>> The results are always correct if signal event used in "MetricExpr" with "-I"
>> parameters, but the results are incorrect when more than one events used in
>> "MetricExpr".
>>>
>>
>> Hi Joakim,
>>     So, I try to look into this issue and understand the flow. From my
>> understanding, whenever we do
>>     calculation of metric expression we don't use exact count we are getting.
>>     Basically we use mean value of each event in the calculation of metric
>> expression.
>>
>> So, I am taking same example you refer.
>>
>> Metric Event: imx8qm-ddr0-bus-util
>> MetricExpr": "( imx8_ddr0\\/read\\-cycles\\/ + imx8_ddr0\\/write\\-cycles\\/ )"
>>
>> command#: ./perf stat -I 1000 -M imx8qm-ddr0-bus-util
>>
>> #           time             counts unit events
>>      1.000104250              16720      imx8_ddr0/read-cycles/
>> #  22921.0 imx8qm-ddr0-bus-util
>>      1.000104250               6201      imx8_ddr0/write-cycles/
>>      2.000525625               8316      imx8_ddr0/read-cycles/
>> #  12785.5 imx8qm-ddr0-bus-util
>>      2.000525625               2738      imx8_ddr0/write-cycles/
>>      3.000819125               1056      imx8_ddr0/read-cycles/
>> #   4136.7 imx8qm-ddr0-bus-util
>>      3.000819125                303      imx8_ddr0/write-cycles/
>>      4.001103750               6260      imx8_ddr0/read-cycles/
>> #   9149.8 imx8qm-ddr0-bus-util
>>      4.001103750               2317      imx8_ddr0/write-cycles/
>>      5.001392750               2084      imx8_ddr0/read-cycles/
>> #   4516.0 imx8qm-ddr0-bus-util
>>      5.001392750                601      imx8_ddr0/write-cycles/
>>
>> If you see we have a function called 'update_stats' in file util/stat.c where we
>> do this calculation and updating stats->mean value. And this mean value is
>> what we are using actually in our metric expression calculation.
>>
>> We call this function in each iteration where we update stats->mean and
>> stats->n for each event.
>> But one weird issue is, for very first event, stat->n is always 1 that is why we
>> are getting mean same as count.
>> So this is the reason for single event you get exact aggregate of metric
>> expression.
>> So doesn't matter how many events you have in your metric expression, every
>> time you take exact count for first one and normalized value for rest which is
>> weird.
>>
>> According to update_stats function:  We are updating mean as:
>>
>> stats->mean += delta / stats->n where,  delta = val - stats->mean.
>>
>> If we take write-cycles here. Initially mean = 0 and n = 1.
>>
>> 1st iteration: n=1, write cycle : 6201 and mean = 6201  (Final agg value: 16720
>> + 6201 = 22921) 2nd iteration: n=2, write cycles:  6201 + (2738 - 6201)/2 =
>> 4469.5  (Final aggr value: 8316 + 4469.5 = 12785.5) 3rd iteration: n=3, write
>> cycles: 4469.5 + (303 - 4469.5)/3 = 3080.6667 (Final aggr value: 1056 +
>> 3080.6667 = 4136.7)
>>
>> Andi and Jiri, I am not sure if its expected behavior. I mean shouldn't we either
>> take mean value of each event or take n as 1 for each event. And one more
>> question, Should we add an option to say whether user want exact aggregate
>> or this normalize aggregate to remove the confusion? I try to find it out if we
>> already have one but didn't get.
>> Please let me know if my understanding is fine.
> 
> Hi Kajol,
> 
> Sorry, your reply was buried in a sea of emails, it comes into my eyes when I searched any feedback from you. Much thanks for your great details!!!!!
> 
> I can quite understand what you explained. As a user, I think we always want to get the exact result according to the metric expression.
> 
> Can you take this case as an example then send out a formal email into mailing list to reflect this weird issue, more people can participate and discuss about it. Or you need me clear up and sent out the email?
> This could attract maintainers' attention.
> 

Hi Joakim,
    Yes you are right, I will send separate mail on lkml. That way it will be better.

Thanks,
Kajol

> Best Regards,
> Joakim Zhang
>