lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7B6B74DF-53E0-42DA-97D6-3F04E84D688B@fb.com>
Date:   Fri, 18 Oct 2019 06:13:21 +0000
From:   Song Liu <songliubraving@...com>
To:     Stephane Eranian <eranian@...gle.com>
CC:     open list <linux-kernel@...r.kernel.org>,
        "peterz@...radead.org" <peterz@...radead.org>,
        "mingo@...e.hu" <mingo@...e.hu>,
        "acme@...hat.com" <acme@...hat.com>,
        "jolsa@...hat.com" <jolsa@...hat.com>,
        "kan.liang@...el.com" <kan.liang@...el.com>,
        "irogers@...gle.com" <irogers@...gle.com>
Subject: Re: [PATCH] perf/core: fix multiplexing event scheduling issue



> On Oct 17, 2019, at 5:27 PM, Stephane Eranian <eranian@...gle.com> wrote:
> 
> This patch complements the following commit:
> 7fa343b7fdc4 ("perf/core: Fix corner case in perf_rotate_context()")
> 
> The fix from Song addresses the consequences of the problem but
> not the cause. This patch fixes the causes and can sit on top of
> Song's patch.
> 
> This patch fixes a scheduling problem in the core functions of
> perf_events. Under certain conditions, some events would not be
> scheduled even though many counters would be available. This
> is related to multiplexing and is architecture agnostic and
> PMU agnostic (i.e., core or uncore).
> 
> This problem can easily be reproduced when you have two perf
> stat sessions. The first session does not cause multiplexing,
> let's say it is measuring 1 event, E1. While it is measuring,
> a second session starts and causes multiplexing. Let's say it
> adds 6 events, B1-B6. Now, 7 events compete and are multiplexed.
> When the second session terminates, all 6 (B1-B6) events are
> removed. Normally, you'd expect the E1 event to continue to run
> with no multiplexing. However, the problem is that depending on
> the state Of E1 when B1-B6 are removed, it may never be scheduled
> again. If E1 was inactive at the time of removal, despite the
> multiplexing hrtimer still firing, it will not find any active
> events and will not try to reschedule. This is what Song's patch
> fixes. It forces the multiplexing code to consider non-active events.

Good analysis! I kinda knew the example I had (with pinned event)
was not the only way to trigger this issue. But I didn't think 
about event remove path. 

> However, the cause is not addressed. The kernel should not rely on
> the multiplexing hrtimer to unblock inactive events. That timer
> can have abitrary duration in the milliseconds. Until the timer
> fires, counters are available, but no measurable events are using
> them. We do not want to introduce blind spots of arbitrary durations.
> 
> This patch addresses the cause of the problem, by checking that,
> when an event is disabled or removed and the context was multiplexing
> events, inactive events gets immediately a chance to be scheduled by
> calling ctx_resched(). The rescheduling is done on  event of equal
> or lower priority types.  With that in place, as soon as a counter
> is freed, schedulable inactive events may run, thereby eliminating
> a blind spot.
> 
> This can be illustrated as follows using Skylake uncore CHA here:
> 
> $ perf stat --no-merge -a -I 1000 -C 28 -e uncore_cha_0/event=0x0/
>    42.007856531      2,000,291,322      uncore_cha_0/event=0x0/
>    43.008062166      2,000,399,526      uncore_cha_0/event=0x0/
>    44.008293244      2,000,473,720      uncore_cha_0/event=0x0/
>    45.008501847      2,000,423,420      uncore_cha_0/event=0x0/
>    46.008706558      2,000,411,132      uncore_cha_0/event=0x0/
>    47.008928543      2,000,441,660      uncore_cha_0/event=0x0/
> 
> Adding second sessiont with 4 events for 4s
> 
> $ perf stat -a -I 1000 -C 28 --no-merge -e uncore_cha_0/event=0x0/,uncore_cha_0/event=0x0/,uncore_cha_0/event=0x0/,uncore_cha_0/event=0x0/ sleep 4
>    48.009114643      1,983,129,830      uncore_cha_0/event=0x0/                                       (99.71%)
>    49.009279616      1,976,067,751      uncore_cha_0/event=0x0/                                       (99.30%)
>    50.009428660      1,974,448,006      uncore_cha_0/event=0x0/                                       (98.92%)
>    51.009524309      1,973,083,237      uncore_cha_0/event=0x0/                                       (98.55%)
>    52.009673467      1,972,097,678      uncore_cha_0/event=0x0/                                       (97.11%)
> 
> End of 4s, second session is removed. But the first event does not schedule and never will
> unless new events force multiplexing again.
> 
>    53.009815999      <not counted>      uncore_cha_0/event=0x0/                                       (95.28%)
>    54.009961809      <not counted>      uncore_cha_0/event=0x0/                                       (93.52%)
>    55.010110972      <not counted>      uncore_cha_0/event=0x0/                                       (91.82%)
>    56.010217579      <not counted>      uncore_cha_0/event=0x0/                                       (90.18%)

Does this still happen after my patch? I was not able to repro this. 

> Signed-off-by: Stephane Eranian <eranian@...gle.com>

Maybe add:
Fixes: 8d5bce0c37fa ("perf/core: Optimize perf_rotate_context() event scheduling")

Thanks,
Song

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ