lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e8dfc3df-69d6-46ba-81db-4a04484fb7c3@amd.com>
Date: Wed, 19 Feb 2025 18:53:46 +0530
From: Ravi Bangoria <ravi.bangoria@....com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: "mingo@...nel.org" <mingo@...nel.org>,
 "lucas.demarchi@...el.com" <lucas.demarchi@...el.com>,
 "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
 "willy@...radead.org" <willy@...radead.org>,
 "acme@...nel.org" <acme@...nel.org>,
 "namhyung@...nel.org" <namhyung@...nel.org>,
 "mark.rutland@....com" <mark.rutland@....com>,
 "alexander.shishkin@...ux.intel.com" <alexander.shishkin@...ux.intel.com>,
 "jolsa@...nel.org" <jolsa@...nel.org>,
 "irogers@...gle.com" <irogers@...gle.com>,
 "adrian.hunter@...el.com" <adrian.hunter@...el.com>,
 "kan.liang@...ux.intel.com" <kan.liang@...ux.intel.com>,
 Ravi Bangoria <ravi.bangoria@....com>
Subject: Re: [PATCH v2 24/24] perf: Make perf_pmu_unregister() useable

Hi Peter,

>>>> Apparently not, it ends up with:
>>>>
>>>>   ------------[ cut here ]------------
>>>>   WARNING: CPU: 145 PID: 5459 at kernel/events/core.c:281 event_function+0xd2/0xf0
>>>>   WARNING: CPU: 145 PID: 5459 at kernel/events/core.c:286 event_function+0xd6/0xf0
>>>
>>>>    remote_function+0x4f/0x70
>>>>    generic_exec_single+0x7f/0x160
>>>>    smp_call_function_single+0x110/0x160
>>>>    event_function_call+0x98/0x1d0
>>>>    _perf_event_disable+0x41/0x70
>>>>    perf_event_for_each_child+0x40/0x90
>>>>    _perf_ioctl+0xac/0xb00
>>>>    perf_ioctl+0x45/0x80
>>>
>>> Took me a long while trying to blame this on the 'event->parent =
>>> NULL;', but AFAICT this is a new, unrelated issue.
>>>
>>> What I think happens is this perf_ioctl(DISABLE) vs pmu_detach_events()
>>> race, where the crux is that perf_ioctl() path does not take
>>> event2->mutex which allows the following interleave:
>>
>> This one was only with perf_fuzzer, so pmu_detach_events() code path was
>> not invoked.
> 
> I think the issue is, unaccount_event() gets called for the child event
> after the child is detached. Since event->parent is NULL, unaccount_event()
> abruptly corrupts the perf_sched_work.

A different manifestation of the same underlying issue (perf_sched_work
is getting corrupted because of event->parent = NULL):

  WARNING: CPU: 0 PID: 5799 at kernel/events/core.c:286 event_function+0xd6/0xf0
  CPU: 0 UID: 1002 PID: 5799 Comm: perf_fuzzer Kdump: loaded Not tainted 6.14.0-rc1-pmu-unregister+ #263
  RIP: 0010:event_function+0xd6/0xf0
  RSP: 0018:ffa0000006a87828 EFLAGS: 00010086
  RAX: 0000000000000007 RBX: ff11001008830ee0 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
  RBP: ffa0000006a87850 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000000 R12: ffa0000006a87918
  R13: 0000000000000000 R14: ff1100010e347c00 R15: ff110001226c9a40
  FS:  0000000000000000(0000) GS:ff11001008800000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000595fed7d3e12 CR3: 000000016954c005 CR4: 0000000000f71ef0
  PKRU: 55555554
  Call Trace:
   <TASK>
   ? event_function+0xd6/0xf0
   ? event_function+0x3d/0xf0
   remote_function+0x4c/0x70
   generic_exec_single+0x7c/0x160
   smp_call_function_single+0x110/0x160
   event_function_call+0x98/0x1d0
   perf_remove_from_context+0x46/0xa0
   perf_event_release_kernel+0x1f3/0x210
   perf_release+0x12/0x20

With the above trace as reference, something like below is what is
happening. (Code paths are based on my understanding of traces, I've
not verified each an every bit :P)

  Assume:
  task1: event1
  task2 (child of task1): event2 (child of event1)
  task3: event3

              CPU 1                                             CPU 2                                                               CPU 3                                                       CPU 4
  
                                                  /* task3 sched in */
                                                  perf_event_task_sched_in()
                                                    if (static_branch_unlikely(&perf_sched_events)) /* true */
                                                        __perf_event_task_sched_in()
                                                            ...

  task2:
  perf_event_exit_event() /* event2 */
    event->parent = NULL /* by perf_perf_child_detach() */
    free_event()
      _free_event()
        unaccount_event()
          if (event->parent) /* false */
              return; /* won't get executed */
          perf_sched_work <== OFF  ------.
                                         |
                                         |        /* task3 sched out */
                                         |        perf_event_task_sched_out()
                                         '----->   if (static_branch_unlikely(&perf_sched_events)) /* false */
                                         |             /* __perf_event_task_sched_out() won't get called */
                                         |
                                         |                                                                            /* task3 sched in */
                                         |                                                                            perf_event_task_sched_in()
                                         '-------------------------------------------------------------------------->   if (static_branch_unlikely(&perf_sched_events)) /* false */
                                                                                                                             /* __perf_event_task_sched_in() won't get called
                                                                                                                             So cpuctx->task_ctx will remains NULL; */
  
                                                                                                                                                                                         close(event3)
                                                                                                                                                                                           perf_release()
                                                                                                                                                                                             perf_event_release_kernel()
                                                                                                                                                                                               perf_remove_from_context()
                                                                                                                                                                                                 event_function_call()
                                                                                                                                                                                                 /* IPI to CPU 3 ---> */
                                                                                                                      /* ---> IPI from CPU 4 */
                                                                                                                      event_function()
                                                                                                                        if (ctx->task) {
                                                                                                                            /* task_ctx is NULL whereas ctx is !NULL */
                                                                                                                            WARN_ON_ONCE(task_ctx != ctx);

                                                                                                                            ^^^^^^^^^^^^

IIUC, event->parent is the only way to detect whether the event is a
child or not, so it makes sense to preserve the ->parent pointer even
after the child is detached.

Thanks,
Ravi

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ