linux-kernel - Re: Question concerning RCU


Date:	Sun, 11 Jan 2015 12:26:04 -0800
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	"Stoidner, Christoph" <c.stoidner@...ero.de>
Cc:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: Question concerning RCU

On Sun, Jan 11, 2015 at 11:59:45AM +0000, Stoidner, Christoph wrote:
> 
> Hi Paul,
> 
> many thanks for your fast answer!
> 
> I have now changed my application so that it no longer requires 
> Xenomai/I-Pipe. That means my kernel is now built from 
> mainline source with preempt_rt only, and no Xenomai or I-Pipe. 
> However, the problem is exactly the same. After some runtime (minutes 
> or hours) the kernel freezes, and JTAG debugging shows that it ends up 
> in an endless loop in rcu_print_task_stall (as described before). 
> 
> > First I have seen this.  Were you doing lots of CPU-hotplug operations?
> 
> My system has only one core. So I think there should not be any 
> CPU-hotplugging.

OK, so no point in providing you that set of patches, then.

> > If you have more CPUs than the value of CONFIG_RCU_FANOUT (which
> > defaults to 16), and if your workload offlined a full block of CPUs (full
> > blocks being CPUs 0-15, 16-31, 32-47, and so on for the default value
> > of CONFIG_RCU_FANOUT), then there is a theoretical issue that -might-
> > cause the problem that you are seeing.
> 
> So this could not happen on a single-core system either. Am I right?

Yep, no way this can happen without a lot of CPUs and a lot of CPU
hotplugging.
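
To make the block arithmetic in the quoted paragraph concrete, here is a
minimal Python sketch. It only models the CPU numbering described above
(blocks of CONFIG_RCU_FANOUT CPUs, default 16); it is not the kernel's
rcu_node code. The function names, and the choice to clamp a trailing
partial block to the last CPU, are assumptions made for illustration.

```python
RCU_FANOUT = 16  # default value of CONFIG_RCU_FANOUT cited in the message


def cpu_block(cpu, fanout=RCU_FANOUT):
    """Return (first_cpu, last_cpu) of the block that contains `cpu`."""
    if cpu < 0 or fanout <= 0:
        raise ValueError("cpu must be >= 0 and fanout must be > 0")
    first = (cpu // fanout) * fanout
    return first, first + fanout - 1


def more_cpus_than_fanout(nr_cpus, fanout=RCU_FANOUT):
    """First precondition from the message: more CPUs than the fanout."""
    return nr_cpus > fanout


def fully_offlined_blocks(nr_cpus, offline, fanout=RCU_FANOUT):
    """List the blocks in which every CPU is offline.

    Assumption: a trailing partial block is clamped to the last CPU.
    """
    offline = set(offline)
    blocks = []
    for first in range(0, nr_cpus, fanout):
        last = min(first + fanout - 1, nr_cpus - 1)
        if all(cpu in offline for cpu in range(first, last + 1)):
            blocks.append((first, last))
    return blocks
```

For a single-core system, more_cpus_than_fanout(1) is False and
fully_offlined_blocks(1, []) is empty, which matches the conclusion that
the scenario does not apply here.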

> I have no idea how to find the problem. Do you have any more hints or ideas?

You got stack traces with the stall warnings, correct?  If so, please look
at them and at Documentation/RCU/stallwarn.txt and see if the kernel is
looping somewhere inappropriate.

I am not familiar with the low-level ARM kernel code, but the stack below
leads me to suspect that your kernel is interrupting itself to death or
is improperly handling interrupts.

							Thanx, Paul

> Here is a backtrace when the problem has occurred on the system without Xenomai/I-Pipe:
> 
> #0  rcu_print_task_stall (rnp=0xc0498dc8 <rcu_preempt_state>) at kernel/rcutree_plugin.h:528
> #1  0xc005cabc in print_other_cpu_stall (rsp=0xc0498dc8 <rcu_preempt_state>) at kernel/rcutree.c:885
> #2  check_cpu_stall (rdp=0x80000093, rsp=0xc0498dc8 <rcu_preempt_state>) at kernel/rcutree.c:977
> #3  __rcu_pending (rdp=0x80000093, rsp=0xc0498dc8 <rcu_preempt_state>) at kernel/rcutree.c:2750
> #4  rcu_pending (cpu=<optimized out>) at kernel/rcutree.c:2800
> #5  rcu_check_callbacks (cpu=<optimized out>, user=<optimized out>) at kernel/rcutree.c:2179
> #6  0xc0027648 in update_process_times (user_tick=0) at kernel/timer.c:1427
> #7  0xc004e840 in tick_sched_timer (timer=0xc0498860 <tick_cpu_sched>) at kernel/time/tick-sched.c:1095
> #8  0xc003a0dc in __run_hrtimer (timer=0xc0498860 <tick_cpu_sched>, now=<optimized out>) at kernel/hrtimer.c:1363
> #9  0xc003ab4c in hrtimer_interrupt (dev=<optimized out>) at kernel/hrtimer.c:1582
> #10 0xc02bf7bc in mxs_timer_interrupt (irq=<optimized out>, dev_id=<optimized out>) at drivers/clocksource/mxs_timer.c:132
> #11 0xc0055154 in handle_irq_event_percpu (desc=0xc7804c00, action=0xc04b0520 <mxs_timer_irq>) at kernel/irq/handle.c:144
> #12 0xc0055320 in handle_irq_event (desc=0xc7804c00) at kernel/irq/handle.c:197
> #13 0xc00578b8 in handle_level_irq (irq=<optimized out>, desc=0xc7804c00) at kernel/irq/chip.c:406
> #14 0xc0054aec in generic_handle_irq_desc (desc=<optimized out>, irq=16) at include/linux/irqdesc.h:115
> #15 generic_handle_irq (irq=16) at kernel/irq/irqdesc.c:314
> #16 0xc000f58c in handle_IRQ (irq=16, regs=<optimized out>) at arch/arm/kernel/irq.c:80
> #17 0xc000e360 in __irq_svc () at arch/arm/kernel/entry-armv.S:202
> #18 0xc000e360 in __irq_svc () at arch/arm/kernel/entry-armv.S:202
> #19 0xc000e360 in __irq_svc () at arch/arm/kernel/entry-armv.S:202
> #20 0xc000e360 in __irq_svc () at arch/arm/kernel/entry-armv.S:202
> ...
> 
> Thanks and regards,
> Christoph
> 
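
One way to spot the pattern in the backtrace above (the same __irq_svc
frame repeated at the bottom of the stack) is to scan a saved gdb
backtrace for runs of consecutive identical function names. The sketch
below is a hypothetical helper, not an existing kernel or gdb tool; the
regular expression assumes frames formatted like the ones quoted above.
A repeated frame can also be an unwinder artifact rather than real
nesting, so treat the result as a hint only.

```python
import re

# Matches "#17 0xc000e360 in __irq_svc (" and "#0  rcu_print_task_stall (".
FRAME_RE = re.compile(r"#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?([A-Za-z_]\w*) \(")


def parse_frames(text):
    """Return a list of (frame_number, function_name) from backtrace text."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append((int(m.group(1)), m.group(2)))
    return frames


def longest_repeat(frames):
    """Return (function, count, first_frame_number) for the longest run of
    consecutive frames in the same function, or None if nothing repeats."""
    best = None
    i = 0
    while i < len(frames):
        j = i
        while j + 1 < len(frames) and frames[j + 1][1] == frames[i][1]:
            j += 1
        count = j - i + 1
        if count > 1 and (best is None or count > best[1]):
            best = (frames[i][1], count, frames[i][0])
        i = j + 1
    return best
```

Feeding it the quoted backtrace text (the leading "> " quote markers are
ignored by the pattern) and calling longest_repeat(parse_frames(text))
reports the function that repeats and the frame number where the run starts.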
