linux-kernel - Re: [PATCH v1] perf tests: Run tests in parallel by default

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAP-5=fWk6hMWMy93QgRAWUowRk5LNEkqRVa3u4h0fPRXyE-y+Q@mail.gmail.com>
Date: Fri, 26 Apr 2024 09:45:42 -0700
From: Ian Rogers <irogers@...gle.com>
To: Adrian Hunter <adrian.hunter@...el.com>
Cc: James Clark <james.clark@....com>, Peter Zijlstra <peterz@...radead.org>, 
	Ingo Molnar <mingo@...hat.com>, Arnaldo Carvalho de Melo <acme@...nel.org>, Namhyung Kim <namhyung@...nel.org>, 
	Mark Rutland <mark.rutland@....com>, 
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>, 
	Kan Liang <kan.liang@...ux.intel.com>, linux-perf-users@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1] perf tests: Run tests in parallel by default

On Fri, Apr 26, 2024 at 8:42 AM Adrian Hunter <adrian.hunter@...el.com> wrote:
>
> On 26/04/24 18:06, James Clark wrote:
> >
> >
> > On 01/03/2024 17:47, Ian Rogers wrote:
> >> Switch from running tests sequentially to running in parallel by
> >> default. Change the opt-in '-p' or '--parallel' flag to '-S' or
> >> '--sequential'.
> >>
> >> On an 8 core tigerlake an address sanitizer run time changes from:
> >> 326.54user 622.73system 6:59.91elapsed 226%CPU
> >> to:
> >> 973.02user 583.98system 3:01.17elapsed 859%CPU
> >>
> >> So over twice as fast, saving 4 minutes.
> >>
> >
> > Apologies for not replying earlier before this was applied. But IMO this
> > isn't a good default. Tests that use things like exclusive PMUs
> > (Coresight for example) can never pass when run in parallel.
>
> Yes, that is an issue for Intel PT also.

Right, Arnaldo and I discussed this and the safe thing to do for now
is to keep the default at serial - ie revert this change.

My longer term concern is that basically means people won't use
parallel testing and we'll bitrot in places like synthesis where it's
easy to make naive assumptions that say /proc won't be changing.

We have these tests set up for continuous testing in my team, and as
part of that set up each test is run independently to maximize
parallelism. This means the exclusive PMU thing does make the tests
flaky so it would be nice to have the tests address this, either by
skipping or busy waiting (only when the error code from the PMU is
busy, up to some maximum time period).

> >
> > For CI it's arguable whether you'd want to trade stability for speed.
> > And for interactive sessions there was already the --parallel option
> > which was easy to add and have it in your bash history.
> >
> > Now we've changed the default, any CI will need to be updated to add
> > --sequential if it wants all the tests to pass.
>
> Same here.
>
> >                                                 Maybe we could do some
> > hack and gate it on interactive vs non interactive sessions, but that
> > might be getting too clever. (Or a "don't run in parallel" flag on
> > certain tests)
>
> Perhaps more attention is needed for the tests that take so long.
> For example, maybe they could do things in parallel.

This is true for the tool in general. The problem is a lack of good
abstractions. Probably the easiest candidate for this are the shell
tests.

> Also -F option doesn't seem to work.
>
> $ tools/perf/perf test -F
> Couldn't find a vmlinux that matches the kernel running on this machine, skipping test
>   1: vmlinux symtab matches kallsyms                                 : Skip
>   2: Detect openat syscall event                                     : Ok
>   3: Detect openat syscall event on all cpus                         : Ok
>   4.1: Read samples using the mmap interface                         : Ok
>   4.2: User space counter reading of instructions                    : Ok
>   4.3: User space counter reading of cycles                          : Ok
>   5: Test data source output                                         : Ok
> WARNING: event 'numpmu' not valid (bits 16-17,20,22 of config '6530160' not supported by kernel)!
>   6.1: Test event parsing                                            : Ok
>   6.2: Parsing of all PMU events from sysfs                          : Ok
> WARNING: event 'N/A' not valid (bits 0-1 of config2 '3' not supported by kernel)!
>   6.3: Parsing of given PMU events from sysfs                        : Ok
>   6.4: Parsing of aliased events from sysfs                          : Skip (no aliases in sysfs)
>   6.5: Parsing of aliased events                                     : Ok
>   6.6: Parsing of terms (event modifiers)                            : Ok
>   7: Simple expression parser                                        : Ok
>   8: PERF_RECORD_* events & perf_sample fields                       : Ok
>   9: Parse perf pmu format                                           : Ok
>  10.1: PMU event table sanity                                        : Ok
>  10.2: PMU event map aliases                                         : Ok
> failed: recursion detected for M1
> failed: recursion detected for M2
> failed: recursion detected for M3
> Not grouping metric tma_memory_fence's events.
> Try disabling the NMI watchdog to comply NO_NMI_WATCHDOG metric constraint:
>     echo 0 > /proc/sys/kernel/nmi_watchdog
>     perf stat ...
>     echo 1 > /proc/sys/kernel/nmi_watchdog
> Try disabling the NMI watchdog to comply NO_NMI_WATCHDOG metric constraint:
>     echo 0 > /proc/sys/kernel/nmi_watchdog
>     perf stat ...
>     echo 1 > /proc/sys/kernel/nmi_watchdog
> ...
> ^C

Not working meaning the errors are going to stderr? The issue is that
pr_err will write to stderr regardless of the verbosity setting and
some tests deliberately induce errors. We could use debug_set_file
 to set the output to /dev/null, but that could hide trouble. -F
usually means your debugging so we've not cared in the past.

Thanks,
Ian