Message-ID: <CABPqkBQzCMNS_PfLZBWVuX9o8Z55PovwJvpVWMWzyeExFJ5R4Q@mail.gmail.com>
Date: Fri, 28 Mar 2025 13:05:23 -0700
From: Stephane Eranian <eranian@...gle.com>
To: Chun-Tse Shao <ctshao@...gle.com>
Cc: tmricht@...ux.ibm.com, acme@...nel.org, agordeev@...ux.ibm.com, 
	gor@...ux.ibm.com, hca@...ux.ibm.com, irogers@...gle.com, 
	linux-kernel@...r.kernel.org, linux-perf-users@...r.kernel.org, 
	linux-s390@...r.kernel.org, namhyung@...nel.org, sumanthk@...ux.ibm.com, 
	Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH] perf/test: Skip leader sampling for s390

Hi,

Thanks, CT, for the post. Indeed, this is a long-standing bug that most likely
impacts all architectures. The rate-throttling code does not consider event
grouping. It stops the sampling event in place (on x86) at the hardware level,
not at the generic scheduling layer. But if the event is in a group, it may make
sense to also stop all the other events in the group, i.e., stop the group.
Otherwise you may get discrepancies between samples of the "slave events".
Similarly, the time_running and time_enabled logic is not modified during
throttling.
I am interested in hearing potential ways of solving this in a portable manner.
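
To make the discrepancy concrete, here is a toy Python model. It is not kernel
code: the `Event` class, the `simulate()` function, the tick counts, and the
rates are all made up for illustration. It assumes, as described above, that
throttling stops only the leader unless the whole group is stopped, and that the
enabled/running times keep advancing while an event is throttled.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical stand-in for one counter in a perf event group."""
    name: str
    rate: int               # counts added per tick while the counter runs
    count: int = 0
    running: bool = True
    time_enabled: int = 0
    time_running: int = 0


def simulate(ticks, throttle_at, unthrottle_at, stop_group):
    """Run a two-event group and throttle it for part of the interval.

    stop_group=False models the reported behavior (only the leader stops);
    stop_group=True models stopping every event in the group.
    """
    leader = Event("leader", rate=10)
    sibling = Event("sibling", rate=3)
    group = [leader, sibling]

    for t in range(ticks):
        if t == throttle_at:
            for ev in (group if stop_group else [leader]):
                ev.running = False
        if t == unthrottle_at:
            for ev in group:
                ev.running = True

        for ev in group:
            # Assumption taken from the description above: the time
            # accounting is not adjusted while the event is throttled.
            ev.time_enabled += 1
            ev.time_running += 1
            if ev.running:
                ev.count += ev.rate

    return leader, sibling


if __name__ == "__main__":
    for stop_group in (False, True):
        leader, sibling = simulate(10, 4, 8, stop_group)
        print(stop_group, leader.count, sibling.count,
              sibling.count / leader.count)
```

In this model the unthrottled sibling/leader ratio is 3/10. Stopping only the
leader inflates it to 30/60, while stopping the group keeps it at 18/60. In both
cases time_running equals time_enabled, so scaling by those times cannot reveal
that anything was throttled.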

On Fri, Mar 28, 2025 at 11:27 AM Chun-Tse Shao <ctshao@...gle.com> wrote:
>
> We believe we know the problem; we appreciate Stephane Eranian's investigation.
> It comes from throttling. When the sampling rate is too high, the generic code
> does not modify event scheduling. `perf_event_overflow()` simply returns 1,
> and subsequently, `pmu_stop()` only stops the leader event, not the slave
> events, because the arch layer does not consider groups. Also, the
> `event_stop()` callback only operates on a single event, not the siblings.
>
> This would impact all architectures. Perhaps we can extend the
> `event_stop()` callback to include a new argument to also stop the siblings.
> We also welcome all suggestions and are open to discussing any potential
> solutions.
>
> Thanks,
> CT
>
> Cc: Stephane Eranian <eranian@...gle.com>
