linux-kernel - Re: perf regression. Was: [PATCH V4 01/16] perf: Fix the throttle logic for a group

Message-ID: <CAADnVQL_v4SscxVK5fLxKo5Z4+LJtVfpvrJ4+ztu-ecPfxwrhQ@mail.gmail.com>
Date: Mon, 2 Jun 2025 09:24:21 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: "Liang, Kan" <kan.liang@...ux.intel.com>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, 
	Namhyung Kim <namhyung@...nel.org>, Ian Rogers <irogers@...gle.com>, 
	Mark Rutland <mark.rutland@....com>, LKML <linux-kernel@...r.kernel.org>, 
	"linux-perf-use." <linux-perf-users@...r.kernel.org>, Stephane Eranian <eranian@...gle.com>, 
	Chun-Tse Shao <ctshao@...gle.com>, Thomas Richter <tmricht@...ux.ibm.com>, Leo Yan <leo.yan@....com>, 
	bpf <bpf@...r.kernel.org>, Andrii Nakryiko <andrii@...nel.org>, 
	Ihor Solodrai <ihor.solodrai@...ux.dev>, Song Liu <song@...nel.org>, Jiri Olsa <jolsa@...nel.org>
Subject: Re: perf regression. Was: [PATCH V4 01/16] perf: Fix the throttle
 logic for a group

On Mon, Jun 2, 2025 at 5:55 AM Liang, Kan <kan.liang@...ux.intel.com> wrote:
>
> Hi Alexei,
>
> On 2025-06-01 8:30 p.m., Alexei Starovoitov wrote:
> > On Tue, May 20, 2025 at 11:16:29AM -0700, kan.liang@...ux.intel.com wrote:
> >> From: Kan Liang <kan.liang@...ux.intel.com>
> >>
> >> The current throttle logic doesn't work well with a group, e.g., the
> >> following sampling-read case.
> >>
> >> $ perf record -e "{cycles,cycles}:S" ...
> >>
> >> $ perf report -D | grep THROTTLE | tail -2
> >>             THROTTLE events:        426  ( 9.0%)
> >>           UNTHROTTLE events:        425  ( 9.0%)
> >>
> >> $ perf report -D | grep PERF_RECORD_SAMPLE -a4 | tail -n 5
> >> 0 1020120874009167 0x74970 [0x68]: PERF_RECORD_SAMPLE(IP, 0x1):
> >> ... sample_read:
> >> .... group nr 2
> >> ..... id 0000000000000327, value 000000000cbb993a, lost 0
> >> ..... id 0000000000000328, value 00000002211c26df, lost 0
> >>
> >> The second cycles event has a much larger value than the first cycles
> >> event in the same group.
> >>
> >> The current throttle logic in the generic code only logs the THROTTLE
> >> event. It relies on the specific driver implementation to disable
> >> events. For all ARCHs, the implementation is similar. Only the event is
> >> disabled, rather than the group.
> >>
> >> The logic to disable the group should be generic for all ARCHs. Add the
> >> logic in the generic code. The following patch will remove the buggy
> >> driver-specific implementation.
> >>
> >> The throttle only happens when an event is overflowed. Stop the entire
> >> group when any event in the group triggers the throttle.
> >> The MAX_INTERRUPTS is set to all throttle events.
> >>
> >> The unthrottled could happen in 3 places.
> >> - event/group sched. All events in the group are scheduled one by one.
> >>   All of them will be unthrottled eventually. Nothing needs to be
> >>   changed.
> >> - The perf_adjust_freq_unthr_events for each tick. Needs to restart the
> >>   group altogether.
> >> - The __perf_event_period(). The whole group needs to be restarted
> >>   altogether as well.
> >>
> >> With the fix,
> >> $ sudo perf report -D | grep PERF_RECORD_SAMPLE -a4 | tail -n 5
> >> 0 3573470770332 0x12f5f8 [0x70]: PERF_RECORD_SAMPLE(IP, 0x2):
> >> ... sample_read:
> >> .... group nr 2
> >> ..... id 0000000000000a28, value 00000004fd3dfd8f, lost 0
> >> ..... id 0000000000000a29, value 00000004fd3dfd8f, lost 0
> >>
> >> Suggested-by: Peter Zijlstra (Intel) <peterz@...radead.org>
> >> Signed-off-by: Kan Liang <kan.liang@...ux.intel.com>
> >> ---
> >>  kernel/events/core.c | 66 ++++++++++++++++++++++++++++++--------------
> >>  1 file changed, 46 insertions(+), 20 deletions(-)
> >
> > This patch breaks perf hw events somehow.
> >
> > After merging this into bpf trees we see random "watchdog: BUG: soft lockup"
> > with various stack traces followed up:
> > [   78.620749] Sending NMI from CPU 8 to CPUs 0:
> > [   76.387722] NMI backtrace for cpu 0
> > [   76.387722] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G           O L      6.15.0-10818-ge0f0ee1c31de #1163 PREEMPT
> > [   76.387722] Tainted: [O]=OOT_MODULE, [L]=SOFTLOCKUP
> > [   76.387722] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
> > [   76.387722] RIP: 0010:_raw_spin_lock_irqsave+0xc/0x40
> > [   76.387722] Call Trace:
> > [   76.387722]  <IRQ>
> > [   76.387722]  hrtimer_try_to_cancel.part.0+0x24/0xe0
> > [   76.387722]  hrtimer_cancel+0x21/0x40
> > [   76.387722]  cpu_clock_event_stop+0x64/0x70
>
>
> The issues should be fixed by the patch.
> https://lore.kernel.org/lkml/20250528175832.2999139-1-kan.liang@linux.intel.com/
>
> Could you please give it a try?

Thanks, that fixes it. But the commit log says that
only cpu-clock and task-clock are affected,
and both are SW events.

Our tests, however, lock up while setting up:

        struct perf_event_attr attr = {
                .freq = 1,
                .type = PERF_TYPE_HARDWARE,
                .config = PERF_COUNT_HW_CPU_CYCLES,
        };

Is it because we run in an x86 VM and HW_CPU_CYCLES is mapped
to the cpu-clock SW event?
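
One way to narrow this down from user space, as a rough sketch only: open
the attr above once as PERF_TYPE_HARDWARE and once as PERF_TYPE_SOFTWARE
with PERF_COUNT_SW_CPU_CLOCK, and compare what perf_event_open() returns
on the guest. The helper names make_freq_attr() and try_open() and the
sample_freq argument are hypothetical; they are not taken from our tests,
and the snippet above does not set sample_freq at all.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Build a frequency-mode attr similar to the one quoted above.
 * sample_freq is an assumption: the original attr leaves it at zero.
 */
struct perf_event_attr make_freq_attr(uint32_t type, uint64_t config,
                                      uint64_t sample_freq)
{
        struct perf_event_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = type;
        attr.config = config;
        attr.freq = 1;                  /* sample_freq is a rate, not a period */
        attr.sample_freq = sample_freq;
        attr.disabled = 1;              /* only probe; never start counting */
        return attr;
}

/*
 * Try to open the event on the calling thread, on any CPU, with no group.
 * Returns 0 if the kernel accepted the event, or -errno if it refused.
 * glibc has no perf_event_open() wrapper, hence the raw syscall.
 */
int try_open(struct perf_event_attr *attr)
{
        long fd = syscall(SYS_perf_event_open, attr, 0, -1, -1, 0UL);

        if (fd < 0)
                return -errno;
        close((int)fd);
        return 0;
}
```

Calling try_open() on make_freq_attr(PERF_TYPE_HARDWARE,
PERF_COUNT_HW_CPU_CYCLES, ...) and then on
make_freq_attr(PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CPU_CLOCK, ...) shows
whether the guest accepts the HW event at all. If the HW open fails, that
would suggest any switch to cpu-clock happens above the syscall; if it
succeeds, this probe alone does not say which PMU ends up backing it.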