lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date: Tue, 26 Mar 2024 11:50:12 +0530
From: Vishal Chourasia <vishalc@...ux.ibm.com>
To: John Stultz <jstultz@...gle.com>
Cc: LKML <linux-kernel@...r.kernel.org>, Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
        Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Phil Auld <pauld@...hat.com>, Valentin Schneider <vschneid@...hat.com>,
        kernel-team@...roid.com, Zimuzo Ezeozue <zezeozue@...gle.com>
Subject: Re: [RESEND][PATCH v2] sched: Add trace_sched_waking() tracepoint to
 sched_ttwu_pending()

On Wed, Mar 20, 2024 at 03:42:31PM -0700, John Stultz wrote:
> On Tue, Mar 19, 2024 at 10:24 PM Vishal Chourasia <vishalc@...ux.ibm.com> wrote:
> > On Mon, Mar 18, 2024 at 12:28:33PM -0700, John Stultz wrote:
> > > Zimuzo reported seeing occasional cases in perfetto traces where
> > > tasks went from sleeping directly to trace_sched_wakeup()
> > > without always seeing a trace_sched_waking().
> > >
> > > Looking at the code, trace_sched_wakeup() is only called in
> > > ttwu_do_wakeup()
> > >
> > > The call paths that get you to ttwu_do_wakeup() are:
> > > try_to_wake_up() -> ttwu_do_wakeup()
> > > try_to_wake_up() -> ttwu_runnable() -> ttwu_do_wakeup()
> > > try_to_wake_up() -> ttwu_queue() -> ttwu_do_activate() -> ttwu_do_wakeup()
> > > sched_ttwu_pending() -> ttwu_do_activate() -> ttwu_do_wakeup()
> > Notably, sched_ttwu_pending() is invoked for remote wakeups.
> >
> > Given this, I anticipate a scenario similar to the following
> > occurred: When a process (P) is to be awakened on a remote CPU,
> > the scheduler adds P to the remote CPU's wakelist,a per-CPU queue,
> > and sends an IPI to the remote CPU. This action triggers
> > sched_ttwu_pending() on the remote CPU, which then processes the
> > wakelist and wakes up the queued processes.
> 
> Hey! Thanks for taking a look here and sharing this feedback.
> 
> Indeed, looking closer here I do have a hard time seeing how we can
> get to sched_ttwu_pending() without having gone through the paths in
> try_to_wake_up() that would call trace_sched_wakeup():
>   try_to_wake_up()->[ttwu_queue ->]
> ttwu_queue_wakelist()->sched_ttwu_pending()->...
> 
> > In this scenario, the "waking trace" of P, signifying the initiation
> > of the wake-up, is recorded on the CPU where try_to_wake_up was executed.
> > Meanwhile, the "wakeup trace," denoting the completion of the wake-up,
> > is observed on the remote CPU where sched_ttwu_pending() is executed.
> >
> > Is there a possibility that something other than the above occurred
> > in your case?
> 
> I suspect that the case reported may be that the task went to sleep
> right after the trace_sched_waking(). Which could result in the
> transition from sleeping to trace_sched_wakeup() without the
> trace_sched_waking() inbetween.  The added latency from the remote
> wakeup probably increases the chance for this race to occur.

Hi,

# trace-cmd record -e sched:* perf record -a -g -e sched:* ./schbench
This generates perf.data and trace.dat file.

Running a simple check for equality between sched_waking and
sched_wakeup. I see two different results.

# perf script -G | grep -i "sched_waking" | wc -l
35264
# perf script -G | grep -i "sched_wakeup:" | wc -l
35314

perf report suggests missing sched_waking events. Although trace
collected through ftrace (using trace-cmd) suggests equal waking and
wakeup events.

# trace-cmd report | grep -i "sched_waking" | wc -l
35959
# trace-cmd report | grep -i "sched_wakeup:" | wc -l
35959

Below is a stack trace for a sched_wakeup event for which sched_waking
was not recorded in perf.data file,

schbench  307346 [023] 600177.250150:    
         sched:sched_wakeup: perf:307321 [120] CPU:023
         ttwu_do_activate+0x1c0 ([kernel.kallsyms])
         print_fmt_xfs_perag_intents_class+0x0 ([xfs].data)
         sched_ttwu_pending+0x118 ([kernel.kallsyms])
=>       __flush_smp_call_function_queue+0x19c ([kernel.kallsyms])
         smp_ipi_demux_relaxed+0xec ([kernel.kallsyms])
         xive_muxed_ipi_action+0x20 ([kernel.kallsyms])
         __handle_irq_event_percpu+0x8c ([kernel.kallsyms])
         handle_irq_event_percpu+0x2c ([kernel.kallsyms])
         handle_percpu_irq+0x80 ([kernel.kallsyms])
         generic_handle_irq+0x50 ([kernel.kallsyms])
         __do_irq+0xb8 ([kernel.kallsyms])
         __do_IRQ+0x88 ([kernel.kallsyms])
         print_fmt_xfs_perag_intents_class+0x0 ([xfs].data)
         do_IRQ+0x50 ([kernel.kallsyms])
         hardware_interrupt_common_virt+0x28c ([kernel.kallsyms])
         worker_thread+0x268 (/usr/bin/schbench)
         worker_thread+0x4cc (/usr/bin/schbench)
         start_thread+0x170 (/usr/lib64/glibc-hwcaps/power10/libc.so.6)
         __clone+0xa0 (/usr/lib64/glibc-hwcaps/power10/libc.so.6)

Also, it has been observed for cases where sched_waking is missing only
happens when perf task is woken up.

Looking at the code in __flush_smp_call_function_queue,
it invokes all the callbacks currently queue on the given cpu, and if
sched_ttwu_pending is being called, suggesting that csd of type CSD_TYPE_TTWU
was queued. Now, this happens inside try_to_wake_up.

if there are no other way to queue a csd of type CSD_TYPE_TTWU, then
this suggests some events are being missed by perf.

> 
> So I'll withdraw this change for now and sync back up to check if my
> suspicions are correct or not.
> 
> Thanks so much again for pointing this out.
> -john

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ