lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAP-5=fUvA5Lmk16Zs1B6bj1OLHf9NaHhAzx7yZL9S7JGwmO0uw@mail.gmail.com>
Date: Fri, 19 Jul 2024 08:01:52 -0700
From: Ian Rogers <irogers@...gle.com>
To: "Liang, Kan" <kan.liang@...ux.intel.com>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, 
	Arnaldo Carvalho de Melo <acme@...nel.org>, Namhyung Kim <namhyung@...nel.org>, 
	Mark Rutland <mark.rutland@....com>, 
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>, 
	Adrian Hunter <adrian.hunter@...el.com>, Bjorn Helgaas <bhelgaas@...gle.com>, 
	Jonathan Corbet <corbet@....net>, James Clark <james.clark@....com>, 
	Ravi Bangoria <ravi.bangoria@....com>, Dominique Martinet <asmadeus@...ewreck.org>, 
	linux-kernel@...r.kernel.org, linux-perf-users@...r.kernel.org, 
	Dhananjay Ugwekar <Dhananjay.Ugwekar@....com>, ananth.narayan@....com, gautham.shenoy@....com, 
	kprateek.nayak@....com, sandipan.das@....com
Subject: Re: [PATCH v2 6/6] perf parse-events: Add "cpu" term to set the CPU
 an event is recorded on

On Fri, Jul 19, 2024 at 7:14 AM Liang, Kan <kan.liang@...ux.intel.com> wrote:
>
>
>
> On 2024-07-18 5:06 p.m., Ian Rogers wrote:
> > On Thu, Jul 18, 2024 at 11:03 AM Liang, Kan <kan.liang@...ux.intel.com> wrote:
> >>
> >>
> >>
> >> On 2024-07-18 11:12 a.m., Ian Rogers wrote:
> >>> On Thu, Jul 18, 2024 at 7:41 AM Liang, Kan <kan.liang@...ux.intel.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 2024-07-17 8:30 p.m., Ian Rogers wrote:
> >>>>> The -C option allows the CPUs for a list of events to be specified but
> >>>>> its not possible to set the CPU for a single event. Add a term to
> >>>>> allow this. The term isn't a general CPU list due to ',' already being
> >>>>> a special character in event parsing instead multiple cpu= terms may
> >>>>> be provided and they will be merged/unioned together.
> >>>>>
> >>>>> An example of mixing different types of events counted on different CPUs:
> >>>>> ```
> >>>>> $ perf stat -A -C 0,4-5,8 -e "instructions/cpu=0/,l1d-misses/cpu=4,cpu=5/,inst_retired.any/cpu=8/,cycles" -a sleep 0.1
> >>>>>
> >>>>>  Performance counter stats for 'system wide':
> >>>>>
> >>>>> CPU0              368,647      instructions/cpu=0/              #    0.26  insn per cycle
> >>>>> CPU4        <not counted>      instructions/cpu=0/
> >>>>> CPU5        <not counted>      instructions/cpu=0/
> >>>>> CPU8        <not counted>      instructions/cpu=0/
> >>>>> CPU0        <not counted>      l1d-misses [cpu]
> >>>>> CPU4              203,377      l1d-misses [cpu]
> >>>>> CPU5              138,231      l1d-misses [cpu]
> >>>>> CPU8        <not counted>      l1d-misses [cpu]
> >>>>> CPU0        <not counted>      cpu/cpu=8/
> >>>>> CPU4        <not counted>      cpu/cpu=8/
> >>>>> CPU5        <not counted>      cpu/cpu=8/
> >>>>> CPU8              943,861      cpu/cpu=8/
> >>>>> CPU0            1,412,071      cycles
> >>>>> CPU4           20,362,900      cycles
> >>>>> CPU5           10,172,725      cycles
> >>>>> CPU8            2,406,081      cycles
> >>>>>
> >>>>>        0.102925309 seconds time elapsed
> >>>>> ```
> >>>>>
> >>>>> Note, the event name of inst_retired.any is missing, reported as
> >>>>> cpu/cpu=8/, as there are unmerged uniquify fixes:
> >>>>> https://lore.kernel.org/lkml/20240510053705.2462258-3-irogers@google.com/
> >>>>>
> >>>>> An example of spreading uncore overhead across two CPUs:
> >>>>> ```
> >>>>> $ perf stat -A -e "data_read/cpu=0/,data_write/cpu=1/" -a sleep 0.1
> >>>>>
> >>>>>  Performance counter stats for 'system wide':
> >>>>>
> >>>>> CPU0               223.65 MiB  uncore_imc_free_running_0/cpu=0/
> >>>>> CPU0               223.66 MiB  uncore_imc_free_running_1/cpu=0/
> >>>>> CPU0        <not counted> MiB  uncore_imc_free_running_0/cpu=1/
> >>>>> CPU1                 5.78 MiB  uncore_imc_free_running_0/cpu=1/
> >>>>> CPU0        <not counted> MiB  uncore_imc_free_running_1/cpu=1/
> >>>>> CPU1                 5.74 MiB  uncore_imc_free_running_1/cpu=1/
> >>>>> ```
> >>>>>
> >>>>> Manually fixing the output it should be:
> >>>>> ```
> >>>>> CPU0               223.65 MiB  uncore_imc_free_running_0/data_read,cpu=0/
> >>>>> CPU0               223.66 MiB  uncore_imc_free_running_1/data_read,cpu=0/
> >>>>> CPU1                 5.78 MiB  uncore_imc_free_running_0/data_write,cpu=1/
> >>>>> CPU1                 5.74 MiB  uncore_imc_free_running_1/data_write,cpu=1/
> >>>>> ```
> >>>>>
> >>>>> That is data_read from 2 PMUs was counted on CPU0 and data_write was
> >>>>> counted on CPU1.
> >>>>
> >>>> There was an effort to make the counter access from any CPU of the package.
> >>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d6a2f9035bfc27d0e9d78b13635dda9fb017ac01
> >>>>
> >>>> But now it limits the access from specific CPUs. It sounds like a
> >>>> regression.
> >>>
> >>> Thanks Kan, I'm not sure I understand the comment.
> >>
> >> The flag is also applied for the uncore and RAPL.
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/x86/events/intel/uncore.c?&id=e64cd6f73ff5a7eb4f8f759049ee24a3fe55e731
> >>
> >> So specifying a CPU to an uncore event doesn't make sense. If the
> >> current CPU is in the same package as the asked CPU. The kernel will
> >> always choose the current CPU.
> >
> > Ugh, that sounds sub-optimal. If I'm monitoring uncore events with
> > cgroups CPU0 (or the first CPU in a package) is going to be loaded up
> > with all the events and have all of the rdmsr/wrmsrs in its context
> > switch. Perhaps we should warn and say to use BPF events.
> >
> > Is there a way through say ioctls to get the CPU an event is on? That
> > way we could update the `perf stat -A` to accurately report cpus.
> > There's also the issue that the affinity stuff is going to be off.
> >
>
> I don't think there is such ioctl.
>
> Emphasizing the CPU ID for an uncore event seems misleading. The uncore
> only supports per-socket counter, not per-core counter.
> Opening/reading an counter from any CPUs on a package should be identical.
> An accurate report of the `perf stat -A` for an uncore should use "S0".

I think it is an "elite" level trick to try to do things like balance
context switch overhead. Putting everything on the first CPU feels
like a scalability issue for cgroup events. BPF events work around it
to some degree, but they are still going to put all the work on CPU0
which could cause performance issues, latency spikes, etc. on it.

Thanks,
Ian

> Thanks,
> Kan
>
> > Thanks,
> > Ian
> >
> >
> >> Thanks,
> >> Kan
> >>> The overhead I was
> >>> thinking of here is more along the lines of cgroup context switches
> >>> (although that isn't in my example). There may be a large number of
> >>> say memory controller events just by having 2 events for each PMU and
> >>> then there are 10s of PMUs. By putting half of the events on 1 CPU and
> >>> half on another, the context switch overhead is shared. That said, the
> >>> counters don't care what cgroup is accessing memory, and users doing
> >>> this are likely making some kind of error.
> >>>
> >>> Thanks,
> >>> Ian
> >>>
> >

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ