lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200521170020.GQ10078@localhost.localdomain>
Date:   Thu, 21 May 2020 19:00:20 +0200
From:   Juri Lelli <juri.lelli@...hat.com>
To:     Frederic Weisbecker <frederic@...nel.org>
Cc:     fweisbec@...il.com, tglx@...utronix.de, mingo@...nel.org,
        linux-rt-users@...r.kernel.org, peterz@...radead.org,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] tick/sched: update full_nohz status after SCHED dep
 is cleared

On 21/05/20 02:44, Frederic Weisbecker wrote:
> On Wed, May 20, 2020 at 08:47:10PM +0200, Juri Lelli wrote:
> > On 20/05/20 19:02, Frederic Weisbecker wrote:
> > > On Wed, May 20, 2020 at 06:49:25PM +0200, Juri Lelli wrote:
> > > > On 20/05/20 18:24, Frederic Weisbecker wrote:
> > > > 
> > > > Hummm, so I enabled 'timer:*', anything else you think I should be
> > > > looking at?
> > > 
> > > Are you sure you also enabled timer_expire_entry?
> > > Because:
> > 
> > I run with
> > 
> > trace-cmd record -e sched_switch -e sched_wakeup -e sched_migrate_task \
> > -e 'timer:*' -e 'irq_vectors:*' -e 'irq:softirq*' \
> > -e 'irq:irq_handler*' taskset --cpu-list 4-35 ./sysjitter/sysjitter \
> > --runtime 10 200
> > 
> > And
> > 
> > $ grep ' timer_expire_entry:' trace.txt | wc -l
> > 1173
> > 
> > seems to confirm that the event was indeed enabled, e.g.
> > 
> > ksoftirqd/11-100   [011]   159.270023: timer_expire_entry:   timer=0xffff9807df8a5a60 function=delayed_work_timer_fn now=4294826418 baseclk=4294825984
> 
> That's interesting! I'll see if I can reproduce that with threaded irqs. If not we'll
> have to investigate on your machine. We really don't want to trigger timer softirqs
> when it's not necessary.
> 

So, I managed to get a bit more info on this, please check the following
(which I edited a bit to hopefully make it clearer and apologies for
long lines, but I believe trimming them would mess things up)

---
ksoftirqd/3-26    [003]    99.942485: timer_expire_entry:   timer=0xffffffffa55a9d20 function=clocksource_watchdog now=4294759328 baseclk=4294759328
ksoftirqd/3-26    [003]    99.942489: timer_start:          timer=0xffffffffa55a9d20 function=clocksource_watchdog expires=4294759822 [timeout=494] cpu=4 idx=114 flags=D|P|I
ksoftirqd/3-26    [003]    99.942491: timer_expire_exit:    timer=0xffffffffa55a9d20
  sysjitter-2377  [004]    99.942491: call_function_single_entry: vector=251
  sysjitter-2377  [004]    99.942492: call_function_single_exit: vector=251
ksoftirqd/3-26    [003]    99.942493: softirq_exit:         vec=1 [action=TIMER]
ksoftirqd/3-26    [003]    99.942494: softirq_entry:        vec=7 [action=SCHED]
  sysjitter-2377  [004]    99.942494: hrtimer_start:        hrtimer=0xffff92e9df91fd20 function=tick_sched_timer/0x0 expires=92630457385 softexpires=92630457385
---

So, my understanding is that clock gets reactivated on CPU 4 to run
clocksource watchdog (indeed if I use tsc=nowatchdog this above doesn't
happen, but the potential issue is still there I guess).

---
  sysjitter-2377  [004]   100.438455: local_timer_entry:    vector=236
  sysjitter-2377  [004]   100.438456: hrtimer_cancel:       hrtimer=0xffff92e9df91fd20
  sysjitter-2377  [004]   100.438458: hrtimer_expire_entry: hrtimer=0xffff92e9df91fd20 now=92630461337 function=tick_sched_timer/0x0
  sysjitter-2377  [004]   100.438460: kernel_stack:         <stack trace>
=> __ftrace_trace_stack (ffffffffa37da921)
=> __raise_softirq_irqoff (ffffffffa36daf50)
=> raise_softirq (ffffffffa36db0fe)
=> update_process_times (ffffffffa3768f9d)
=> tick_sched_handle (ffffffffa377aaa2)
=> tick_sched_timer (ffffffffa377ad53)
=> __hrtimer_run_queues (ffffffffa3769cf0)
=> hrtimer_interrupt (ffffffffa376a58a)
=> smp_apic_timer_interrupt (ffffffffa40028f8)
=> apic_timer_interrupt (ffffffffa4001b7f)
  sysjitter-2377  [004]   100.438461: softirq_raise:        vec=1 [action=TIMER]
  sysjitter-2377  [004]   100.438464: kernel_stack:         <stack trace>
=> __ftrace_trace_stack (ffffffffa37da921)
=> __raise_softirq_irqoff (ffffffffa36daf50)
=> raise_softirq (ffffffffa36db0fe)
=> trigger_load_balance (ffffffffa371cb9c)
=> update_process_times (ffffffffa3768fc7)
=> tick_sched_handle (ffffffffa377aaa2)
=> tick_sched_timer (ffffffffa377ad53)
=> __hrtimer_run_queues (ffffffffa3769cf0)
=> hrtimer_interrupt (ffffffffa376a58a)
=> smp_apic_timer_interrupt (ffffffffa40028f8)
=> apic_timer_interrupt (ffffffffa4001b7f)
  sysjitter-2377  [004]   100.438465: softirq_raise:        vec=7 [action=SCHED]
---

Tick fires, the two softirq_raise (which you wondered about) seems to
come from

 1. update_process_times::run_local_timers (TIMER_SOFTIRQ)
 2. update_process_times::scheduler_tick::trigger_load_balance (SCHED_SOFTIRQ)

---
  sysjitter-2377  [004]   100.438468: hrtimer_expire_exit:  hrtimer=0xffff92e9df91fd20
  sysjitter-2377  [004]   100.438470: local_timer_exit:     vector=236
  sysjitter-2377  [004]   100.438473: sched_wakeup:         ksoftirqd/4:31 [120] success=1 CPU:004
  sysjitter-2377  [004]   100.438474: tick_stop:            success=0 dependency=SCHED
  sysjitter-2377  [004]   100.438476: hrtimer_start:        hrtimer=0xffff92e9df91fd20 function=tick_sched_timer/0x0 expires=92630512937 softexpires=92630512937
  sysjitter-2377  [004]   100.438484: irq_work_entry:       vector=246
  sysjitter-2377  [004]   100.438491: irq_work_exit:        vector=246
  sysjitter-2377  [004]   100.438492: tick_stop:            success=0 dependency=SCHED
  sysjitter-2377  [004]   100.438495: sched_switch:         sysjitter:2377 [120] R ==> ksoftirqd/4:31 [120]
ksoftirqd/4-31    [004]   100.438497: softirq_entry:        vec=1 [action=TIMER]
ksoftirqd/4-31    [004]   100.438499: timer_cancel:         timer=0xffffffffa55a9d20
ksoftirqd/4-31    [004]   100.438501: timer_expire_entry:   timer=0xffffffffa55a9d20 function=clocksource_watchdog now=4294759824 baseclk=4294759824
ksoftirqd/4-31    [004]   100.438504: timer_start:          timer=0xffffffffa55a9d20 function=clocksource_watchdog expires=4294760322 [timeout=498] cpu=5 idx=113 flags=D|P|I
---

Queueing clocksource watchdog on CPU5.

---
ksoftirqd/4-31    [004]   100.438506: timer_expire_exit:    timer=0xffffffffa55a9d20
ksoftirqd/4-31    [004]   100.438507: local_timer_entry:    vector=236
---

Back to back timer?

---
ksoftirqd/4-31    [004]   100.438509: hrtimer_cancel:       hrtimer=0xffff92e9df91fd20
ksoftirqd/4-31    [004]   100.438511: hrtimer_expire_entry: hrtimer=0xffff92e9df91fd20 now=92630515022 function=tick_sched_timer/0x0
ksoftirqd/4-31    [004]   100.438515: kernel_stack:         <stack trace>
=> __ftrace_trace_stack (ffffffffa37da921)
=> __raise_softirq_irqoff (ffffffffa36daf50)
=> raise_softirq (ffffffffa36db0fe)
=> rcu_sched_clock_irq (ffffffffa375af4a)
=> update_process_times (ffffffffa3768fa4)
=> tick_sched_handle (ffffffffa377aaa2)
=> tick_sched_timer (ffffffffa377ad53)
=> __hrtimer_run_queues (ffffffffa3769cf0)
=> hrtimer_interrupt (ffffffffa376a58a)
=> smp_apic_timer_interrupt (ffffffffa40028f8)
=> apic_timer_interrupt (ffffffffa4001b7f)
=> filter_pred_32 (ffffffffa37f3357)
=> filter_match_preds (ffffffffa37f3510)
=> trace_event_buffer_commit (ffffffffa37dc7eb)
=> trace_event_raw_event_softirq (ffffffffa36dab77)
=> __do_softirq (ffffffffa420025a)
=> run_ksoftirqd (ffffffffa36dadc6)
=> smpboot_thread_fn (ffffffffa36ffdb8)
=> kthread (ffffffffa36f9fb7)
=> ret_from_fork (ffffffffa4000215)
ksoftirqd/4-31    [004]   100.438516: softirq_raise:        vec=9 [action=RCU]
ksoftirqd/4-31    [004]   100.438520: kernel_stack:         <stack trace>
=> __ftrace_trace_stack (ffffffffa37da921)
=> __raise_softirq_irqoff (ffffffffa36daf50)
=> raise_softirq (ffffffffa36db0fe)
=> trigger_load_balance (ffffffffa371cb9c)
=> update_process_times (ffffffffa3768fc7)
=> tick_sched_handle (ffffffffa377aaa2)
=> tick_sched_timer (ffffffffa377ad53)
=> __hrtimer_run_queues (ffffffffa3769cf0)
=> hrtimer_interrupt (ffffffffa376a58a)
=> smp_apic_timer_interrupt (ffffffffa40028f8)
=> apic_timer_interrupt (ffffffffa4001b7f)
=> filter_pred_32 (ffffffffa37f3357)
=> filter_match_preds (ffffffffa37f3510)
=> trace_event_buffer_commit (ffffffffa37dc7eb)
=> trace_event_raw_event_softirq (ffffffffa36dab77)
=> __do_softirq (ffffffffa420025a)
=> run_ksoftirqd (ffffffffa36dadc6)
=> smpboot_thread_fn (ffffffffa36ffdb8)
=> kthread (ffffffffa36f9fb7)
=> ret_from_fork (ffffffffa4000215)
ksoftirqd/4-31    [004]   100.438520: softirq_raise:        vec=7 [action=SCHED]
ksoftirqd/4-31    [004]   100.438521: hrtimer_expire_exit:  hrtimer=0xffff92e9df91fd20
ksoftirqd/4-31    [004]   100.438523: hrtimer_start:        hrtimer=0xffff92e9df91fd20 function=tick_sched_timer/0x0 expires=92631512937 softexpires=92631512937
ksoftirqd/4-31    [004]   100.438525: local_timer_exit:     vector=236
ksoftirqd/4-31    [004]   100.438527: tick_stop:            success=0 dependency=SCHED
---

At this point (I think) ksoftirqd is already going to sleep, so only
sysjitter will be eventually executing on CPU4, but tick_stop check
still sees nr_running == 2.

---
ksoftirqd/4-31    [004]   100.438530: softirq_exit:         vec=1 [action=TIMER]
ksoftirqd/4-31    [004]   100.438531: softirq_entry:        vec=7 [action=SCHED]
ksoftirqd/4-31    [004]   100.438533: softirq_exit:         vec=7 [action=SCHED]
ksoftirqd/4-31    [004]   100.438534: softirq_entry:        vec=7 [action=SCHED]
ksoftirqd/4-31    [004]   100.438535: softirq_exit:         vec=7 [action=SCHED]
ksoftirqd/4-31    [004]   100.438536: softirq_entry:        vec=9 [action=RCU]
ksoftirqd/4-31    [004]   100.438537: softirq_exit:         vec=9 [action=RCU]
ksoftirqd/4-31    [004]   100.438540: sched_switch:         ksoftirqd/4:31 [120] S ==> sysjitter:2377 [120]
---

And tick again fires below.

---
  sysjitter-2377  [004]   100.439509: local_timer_entry:    vector=236
  sysjitter-2377  [004]   100.439511: hrtimer_cancel:       hrtimer=0xffff92e9df91fd20
  sysjitter-2377  [004]   100.439512: hrtimer_expire_entry: hrtimer=0xffff92e9df91fd20 now=92631515408 function=tick_sched_timer/0x0
  sysjitter-2377  [004]   100.439514: kernel_stack:         <stack trace>
=> __ftrace_trace_stack (ffffffffa37da921)
=> __raise_softirq_irqoff (ffffffffa36daf50)
=> raise_softirq (ffffffffa36db0fe)
=> update_process_times (ffffffffa3768f9d)
=> tick_sched_handle (ffffffffa377aaa2)
=> tick_sched_timer (ffffffffa377ad53)
=> __hrtimer_run_queues (ffffffffa3769cf0)
=> hrtimer_interrupt (ffffffffa376a58a)
=> smp_apic_timer_interrupt (ffffffffa40028f8)
=> apic_timer_interrupt (ffffffffa4001b7f)
sysjitter-2377  [004]   100.439515: softirq_raise:        vec=1 [action=TIMER]
sysjitter-2377  [004]   100.439517: kernel_stack:         <stack trace>
=> __ftrace_trace_stack (ffffffffa37da921)
=> __raise_softirq_irqoff (ffffffffa36daf50)
=> raise_softirq (ffffffffa36db0fe)
=> trigger_load_balance (ffffffffa371cb9c)
=> update_process_times (ffffffffa3768fc7)
=> tick_sched_handle (ffffffffa377aaa2)
=> tick_sched_timer (ffffffffa377ad53)
=> __hrtimer_run_queues (ffffffffa3769cf0)
=> hrtimer_interrupt (ffffffffa376a58a)
=> smp_apic_timer_interrupt (ffffffffa40028f8)
=> apic_timer_interrupt (ffffffffa4001b7f)
  sysjitter-2377  [004]   100.439518: softirq_raise:        vec=7 [action=SCHED]
  sysjitter-2377  [004]   100.439520: hrtimer_expire_exit:  hrtimer=0xffff92e9df91fd20
  sysjitter-2377  [004]   100.439521: hrtimer_start:        hrtimer=0xffff92e9df91fd20 function=tick_sched_timer/0x0 expires=92632512937 softexpires=92632512937
  sysjitter-2377  [004]   100.439522: local_timer_exit:     vector=236
  sysjitter-2377  [004]   100.439525: sched_wakeup:         ksoftirqd/4:31 [120] success=1 CPU:004
  sysjitter-2377  [004]   100.439527: tick_stop:            success=0 dependency=SCHED
  sysjitter-2377  [004]   100.439529: irq_work_entry:       vector=246
  sysjitter-2377  [004]   100.439531: irq_work_exit:        vector=246
  sysjitter-2377  [004]   100.439532: tick_stop:            success=0 dependency=SCHED
  sysjitter-2377  [004]   100.439534: sched_switch:         sysjitter:2377 [120] R ==> ksoftirqd/4:31 [120]
ksoftirqd/4-31    [004]   100.439536: softirq_entry:        vec=1 [action=TIMER]
ksoftirqd/4-31    [004]   100.439537: softirq_exit:         vec=1 [action=TIMER]
ksoftirqd/4-31    [004]   100.439538: softirq_entry:        vec=7 [action=SCHED]
ksoftirqd/4-31    [004]   100.439540: softirq_exit:         vec=7 [action=SCHED]
ksoftirqd/4-31    [004]   100.439541: sched_switch:         ksoftirqd/4:31 [120] S ==> sysjitter:2377 [120]
---

History seems to repeat itself until ~2s later (this is one of the
pathological runs) when an ipi fires on CPU4 and SCHED dependecy is seen
as cleared (as only sysjitter thread was is running at that point).

Does this make sense and help in any way? Suggestions for debugging this
further? :-)

Thanks!

Juri

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ