linux-kernel - Re: [Patch v5 0/6] Bug fixes on topdown events reordering

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <b0695ef6-8a59-4550-8a33-9afb25c93f48@linux.intel.com>
Date: Thu, 3 Oct 2024 10:57:11 -0400
From: "Liang, Kan" <kan.liang@...ux.intel.com>
To: Ian Rogers <irogers@...gle.com>, Namhyung Kim <namhyung@...nel.org>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
 Arnaldo Carvalho de Melo <acme@...nel.org>,
 Adrian Hunter <adrian.hunter@...el.com>,
 Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
 Dapeng Mi <dapeng1.mi@...ux.intel.com>, linux-perf-users@...r.kernel.org,
 linux-kernel@...r.kernel.org, Yongwei Ma <yongwei.ma@...el.com>,
 Dapeng Mi <dapeng1.mi@...el.com>
Subject: Re: [Patch v5 0/6] Bug fixes on topdown events reordering



On 2024-10-02 8:57 p.m., Ian Rogers wrote:
> On Wed, Oct 2, 2024 at 5:00 PM Namhyung Kim <namhyung@...nel.org> wrote:
>>
>> On Tue, Oct 01, 2024 at 03:32:04PM -0700, Ian Rogers wrote:
>>> On Tue, Oct 1, 2024 at 2:02 PM Namhyung Kim <namhyung@...nel.org> wrote:
>>>>
>>>> On Fri, 13 Sep 2024 08:47:06 +0000, Dapeng Mi wrote:
>>>>
>>>>> Changes:
>>>>> v5 -> v6:
>>>>>   * no function change.
>>>>>   * rebase patchset to latest code of perf-tool-next tree.
>>>>>   * Add Kan's reviewed-by tag.
>>>>>
>>>>> History:
>>>>>   v4: https://lore.kernel.org/all/20240816122938.32228-1-dapeng1.mi@linux.intel.com/
>>>>>   v3: https://lore.kernel.org/all/20240712170339.185824-1-dapeng1.mi@linux.intel.com/
>>>>>   v2: https://lore.kernel.org/all/20240708144204.839486-1-dapeng1.mi@linux.intel.com/
>>>>>   v1: https://lore.kernel.org/all/20240702224037.343958-1-dapeng1.mi@linux.intel.com/
>>>>>
>>>>> [...]
>>>>
>>>> Applied to perf-tools-next, thanks!
>>>
>>> I disagreed with an early patch set and the issue wasn't resolved. Specifically:
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/commit/?h=perf-tools-next&id=3b5edc0421e2598a0ae7f0adcd592017f37e3cdf
>>> ```
>>>   /* Followed by topdown events. */
>>>   if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs))
>>>   return -1;
>>> - if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs))
>>> + /*
>>> + * Move topdown events forward only when topdown events
>>> + * are not in same group with previous event.
>>> + */
>>> + if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs) &&
>>> +     lhs->core.leader != rhs->core.leader)
>>>   return 1;
>>> ```
>>> Is a broken comparator as the lhs then rhs behavior varies from the
>>> rhs then lhs behavior. The qsort implementation can randomly order the
>>> events.
>>> Please drop/revert.
>>
>> Can you please provide an example when it's broken?  I'm not sure how it
>> can produce new errors, but it seems to fix a specific problem.  Do you
>> have a new test failure after this change?
> 
> It may work with a particular sort routine in a particular library
> today, the issue is that if the sort routine were say changed from:
> 
> if (cmp(a, b) < 0)
> 
> to:
> 
> if (cmp(b, a) > 0)
> 
> the sort would vary with this change even though such a change in the
> sorting code is a no-op. It is fundamental algorithms that this code
> is broken, I'm not going to spend the time finding a list of
> instructions and a sort routine to demonstrate this.


I'm not sure I fully understand. But just want to mention that the
change only impacts the Topdown perf metric group, which is only
available for the ICL and later p-core. Nothing will change if no perf
metrics topdown events are used. Very limited impact.

I like the patch set is because it provides examples and tests to cover
the possible combination of the slots event and topdown metrics events.
So we will have a clear expectation for different combinations caused by
the perf metrics mess.

If the algorithms cannot be changed, can you please give some
suggestions, especially for the sample read failure?

Thanks,
Kan