linux-kernel - Re: [rcu_sched stall] regression/miss-config ?

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Thu, 19 May 2016 16:45:26 -0700
From:	Santosh Shilimkar <santosh.shilimkar@...cle.com>
To:	paulmck@...ux.vnet.ibm.com
Cc:	linux-kernel@...r.kernel.org, Sasha Levin <sasha.levin@...cle.com>
Subject: Re: [rcu_sched stall] regression/miss-config ?

Hi Paul,

On 5/17/2016 12:15 PM, Paul E. McKenney wrote:
> On Tue, May 17, 2016 at 06:46:22AM -0700, santosh.shilimkar@...cle.com wrote:
>> On 5/16/16 5:58 PM, Paul E. McKenney wrote:
>>> On Mon, May 16, 2016 at 12:49:41PM -0700, Santosh Shilimkar wrote:
>>>> On 5/16/2016 10:34 AM, Paul E. McKenney wrote:
>>>>> On Mon, May 16, 2016 at 09:33:57AM -0700, Santosh Shilimkar wrote:
>>
>> [...]
>>
>>>>> Are you running CONFIG_NO_HZ_FULL=y?  If so, the problem might be that
>>>>> you need more housekeeping CPUs than you currently have configured.
>>>>>
>>>> Yes, CONFIG_NO_HZ_FULL=y. Do you mean "CONFIG_NO_HZ_FULL_ALL=y" for
>>>> book keeping. Seems like without that clock-event code will just use
>>>> CPU0 for things like broadcasting which might become bottleneck.
>>>> This could explain connect the hrtimer_interrupt() path getting slowed
>>>> down because of book keeping bottleneck.
>>>>
>>>> $cat .config | grep NO_HZ
>>>> CONFIG_NO_HZ_COMMON=y
>>>> # CONFIG_NO_HZ_IDLE is not set
>>>> CONFIG_NO_HZ_FULL=y
>>>> # CONFIG_NO_HZ_FULL_ALL is not set
>>>> # CONFIG_NO_HZ_FULL_SYSIDLE is not set
>>>> CONFIG_NO_HZ=y
>>>> # CONFIG_RCU_FAST_NO_HZ is not set
>>>
>>> Yes, CONFIG_NO_HZ_FULL_ALL=y would give you only one CPU for all
>>> housekeeping tasks, including the RCU grace-period kthreads.  So you are
>>> booting without any nohz_full boot parameter?  You can end up with the
>>> same problem with CONFIG_NO_HZ_FULL=y and the nohz_full boot parameter
>>> that you can with CONFIG_NO_HZ_FULL_ALL=y.
>>>
>> I see. Yes, the systems are booting without nohz_full boot parameter.
>> Will try to add more CPUs to it & update the thread
>> after the verification since it takes time to reproduce the issue.
>>
>> Thanks for discussion so far Paul. Its very insightful for me.
>
> Please let me know how things go with further testing, especially with
> the priority setting.
>
Sorry for delay. I manage to get information about XEN usecase
custom config as discussed above. To reduce variables, I disabled 
"CONFIG_NO_HZ_FULL" altogether. So the effective setting was:

CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
CONFIG_TREE_RCU_TRACE=y
CONFIG_RCU_KTHREAD_PRIO=1
CONFIG_RCU_CPU_STALL_TIMEOUT=21
CONFIG_RCU_TRACE=y

Unfortunately the XEN test still failed. Log end of
the email. This test(s) is bit peculiar though since
its database running in VM with 1 or 2 CPUs. One of
the suspect is because the database RT processes are
hogging the CPU(s), kernel RCU thread is not getting chance
to run which eventually results in stall. Does it
make sense ?

Please note that its non-preempt kernel using RT processes. ;-)

# cat .config | grep PREEMPT
CONFIG_PREEMPT_NOTIFIERS=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set

Regards,
Santosh
...
....
rcu_sched kthread starved for 399032 jiffies!
INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 0, 
t=462037 jiffies, g=118888, c=118887, q=0)
All QSes seen, last rcu_sched kthread activity 462037 
(4296277632-4295815595), jiffies_till_next_fqs=3, root ->qsmask 0x0
ocssd.bin       R  running task        0 15375      1 0x00000000
  0000000000000000 ffff8800ec003bc8 ffffffff810a8581 ffffffff81abf980
  000000000001d068 ffff8800ec003c28 ffffffff810e9c98 0000000000000000
  0000000000000086 0000000000000000 0000000000000086 0000000000000082
Call Trace:
  <IRQ>  [<ffffffff810a8581>] sched_show_task+0xb1/0x120
  [<ffffffff810e9c98>] print_other_cpu_stall+0x288/0x2d0
  [<ffffffff810e9e60>] __rcu_pending+0x180/0x230
  [<ffffffff810e9fa5>] rcu_check_callbacks+0x95/0x140
  [<ffffffff810edf92>] update_process_times+0x42/0x70
  [<ffffffff810ffc79>] tick_sched_handle+0x39/0x80
  [<ffffffff810ffee2>] tick_sched_timer+0x52/0xa0
  [<ffffffff810f0424>] __run_hrtimer+0x74/0x1d0
  [<ffffffff810ffe90>] ? tick_nohz_handler+0xc0/0xc0
  [<ffffffff810f07b2>] hrtimer_interrupt+0x102/0x240
  [<ffffffff8100aaae>] xen_timer_interrupt+0x2e/0x130
  [<ffffffff8143347a>] ? add_interrupt_randomness+0x3a/0x1f0
  [<ffffffff81384290>] ? store_cursor_blink+0xc0/0xc0
  [<ffffffff810d9de4>] handle_irq_event_percpu+0x54/0x1b0
  [<ffffffff810dd477>] handle_percpu_irq+0x47/0x70
  [<ffffffff810d9d27>] generic_handle_irq+0x27/0x40
  [<ffffffff813e53ca>] evtchn_2l_handle_events+0x25a/0x260
  [<ffffffff81085541>] ? __do_softirq+0x191/0x2f0
  [<ffffffff813e2c2f>] __xen_evtchn_do_upcall+0x4f/0x90
  [<ffffffff813e3404>] xen_evtchn_do_upcall+0x34/0x50
  [<ffffffff816c761e>] xen_hvm_callback_vector+0x6e/0x80
  <EOI>
rcu_sched kthread starved for 462037 jiffies!