lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <547E4C14.6040509@oracle.com>
Date:	Tue, 02 Dec 2014 18:32:36 -0500
From:	Sasha Levin <sasha.levin@...cle.com>
To:	Dave Jones <davej@...hat.com>, Chris Mason <clm@...com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Dâniel Fraga <fragabr@...il.com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: frequent lockups in 3.18rc4

On 12/02/2014 02:32 PM, Dave Jones wrote:
> On Mon, Dec 01, 2014 at 06:08:38PM -0500, Chris Mason wrote:
>  > I'm not sure if this is related, but running trinity here, I noticed it
>  > was stuck at 100% system time on every CPU.  perf report tells me we are
>  > spending all of our time in spin_lock under the sync system call.
>  > 
>  > I think it's coming from contention in the bdi_queue_work() call from
>  > inside sync_inodes_sb, which is spin_lock_bh(). 
>  > 
>  > I wonder if we're just spinning so hard on this one bh lock that we're
>  > starving the watchdog?
>  > 
>  > Dave, do you have spinlock debugging on?  
> 
> That has been a constant, yes. I can try with that disabled some time.

Here's my side of the story: I was observing RCU lockups which went away when
I disabled verbose printing for fault injections. It seems that printing one
line ~10 times a second can cause that...

I've disabled lock debugging to see if anything new will show up, and hit
something that may be related:

[  787.894288] ================================================================================
[  787.897074] UBSan: Undefined behaviour in kernel/sched/fair.c:4541:17
[  787.898981] signed integer overflow:
[  787.900066] 361516561629678 * 101500 cannot be represented in type 'long long int'
[  787.900066] CPU: 18 PID: 12958 Comm: trinity-c103 Not tainted 3.18.0-rc6-next-20141201-sasha-00070-g028060a-dirty #1528
[  787.900066]  0000000000000000 0000000000000000 ffffffff93b0f890 ffff8806e3eff918
[  787.900066]  ffffffff91f1cf26 1ffffffff3c2de73 ffffffff93b0f8a8 ffff8806e3eff938
[  787.900066]  ffffffff91f1fb90 1ffffffff3c2de73 ffffffff93b0f8a8 ffff8806e3eff9f8
[  787.900066] Call Trace:
[  787.900066] dump_stack (lib/dump_stack.c:52)
[  787.900066] ubsan_epilogue (lib/ubsan.c:159)
[  787.900066] handle_overflow (lib/ubsan.c:191)
[  787.900066] ? __do_page_fault (arch/x86/mm/fault.c:1220)
[  787.900066] ? local_clock (kernel/sched/clock.c:392)
[  787.900066] __ubsan_handle_mul_overflow (lib/ubsan.c:218)
[  787.900066] select_task_rq_fair (kernel/sched/fair.c:4541 kernel/sched/fair.c:4755)
[  787.900066] try_to_wake_up (kernel/sched/core.c:1415 kernel/sched/core.c:1724)
[  787.900066] ? trace_hardirqs_on_thunk (arch/x86/lib/thunk_64.S:33)
[  787.900066] default_wake_function (kernel/sched/core.c:2979)
[  787.900066] ? get_parent_ip (kernel/sched/core.c:2559)
[  787.900066] autoremove_wake_function (kernel/sched/wait.c:295)
[  787.900066] ? get_parent_ip (kernel/sched/core.c:2559)
[  787.900066] __wake_up_common (kernel/sched/wait.c:73)
[  787.900066] __wake_up_sync_key (include/linux/spinlock.h:364 kernel/sched/wait.c:146)
[  787.900066] pipe_write (fs/pipe.c:466)
[  787.900066] ? kasan_poison_shadow (mm/kasan/kasan.c:48)
[  787.900066] ? new_sync_read (fs/read_write.c:480)
[  787.900066] do_iter_readv_writev (fs/read_write.c:681)
[  787.900066] compat_do_readv_writev (fs/read_write.c:1029)
[  787.900066] ? wait_for_partner (fs/pipe.c:340)
[  787.900066] ? _raw_spin_unlock (./arch/x86/include/asm/preempt.h:95 include/linux/spinlock_api_smp.h:152 kernel/locking/spinlock.c:183)
[  787.900066] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
[  787.900066] ? syscall_trace_enter_phase1 (include/linux/context_tracking.h:27 arch/x86/kernel/ptrace.c:1486)
[  787.900066] compat_writev (fs/read_write.c:1145)
[  787.900066] compat_SyS_writev (fs/read_write.c:1163 fs/read_write.c:1151)
[  787.900066] ia32_do_call (arch/x86/ia32/ia32entry.S:446)
[  787.900066] ================================================================================

(For Linus asking himself "what the hell is this UBSan thing, I didn't merge that!" - it's an
undefined behaviour sanitizer that works with gcc5.)


Thanks,
Sasha
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ