Message-ID: <20190701190949.GB4336@minyard.net>
Date:   Mon, 1 Jul 2019 14:09:49 -0500
From:   Corey Minyard <minyard@....org>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        linux-rt-users@...r.kernel.org, linux-kernel@...r.kernel.org,
        Peter Zijlstra <peterz@...radead.org>, tglx@...utronix.de,
        Corey Minyard <cminyard@...sta.com>
Subject: Re: [PATCH RT v2] Fix a lockup in wait_for_completion() and friends

On Fri, Jun 28, 2019 at 09:49:03PM -0400, Steven Rostedt wrote:
> On Fri, 10 May 2019 12:33:18 +0200
> Sebastian Andrzej Siewior <bigeasy@...utronix.de> wrote:
> 
> > On 2019-05-09 14:33:20 [-0500], minyard@....org wrote:
> > > From: Corey Minyard <cminyard@...sta.com>
> > > 
> > > The function call do_wait_for_common() has a race condition that
> > > can result in lockups waiting for completions.  Adding the thread
> > > to (and removing the thread from) the wait queue for the completion
> > > is done outside the do loop in that function.  However, if the thread
> > > is woken up, the swake_up_locked() function will delete the entry
> > > from the wait queue.  If that happens and another thread sneaks
> > > in and decrements the done count in the completion to zero, the
> > > loop will go around again, but the thread will no longer be in the
> > > wait queue, so there is no way to wake it up.  
> > 
> > applied, thank you.
> > 
> 
> When I applied this patch to 4.19-rt, I get the following lock up:

I was unable to reproduce this.  I also looked at the code, and I can't
really see a connection between this change and this crash.

Can you reproduce at will?  If so, can you send a testcase?

-corey

> 
> watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [sh:745]
> Modules linked in: floppy i915 drm_kms_helper drm fb_sys_fops sysimgblt sysfillrect syscopyarea iosf_mbi i2c_algo_bit video
> CPU: 2 PID: 745 Comm: sh Not tainted 4.19.56-test-rt23+ #16
> Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
> RIP: 0010:_raw_spin_unlock_irq+0x17/0x4d
> Code: 48 8b 12 0f ba e2 12 73 07 e8 f1 4a 92 ff 31 c0 5b 5d c3 66 66 66 66 90 55 48 89 e5 c6 07 00 e8 de 3d a3 ff fb bf 01 00 00 00 <e8> a7 27 9a ff 65 8b 05 c8 7f 93 7e 85 c0 74 1f a9 ff ff ff 7f 75
> RSP: 0018:ffffc90000c8bbb8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> RAX: 0000000000000000 RBX: ffffc90000c8bd58 RCX: 0000000000000003
> RDX: 0000000000000000 RSI: ffffffff8108ffab RDI: 0000000000000001
> RBP: ffffc90000c8bbb8 R08: ffffffff816dcd76 R09: 0000000000020600
> R10: 0000000000000400 R11: 0000001c0eef1808 R12: ffffc90000c8bbc8
> R13: ffffc90000f13ca0 R14: ffff888074b2d7d8 R15: ffff8880789efe10
> FS:  0000000000000000(0000) GS:ffff88807b300000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000030662001b8 CR3: 00000000376ac000 CR4: 00000000000006e0
> Call Trace:
>  swake_up_all+0xa6/0xde
>  __d_lookup_done+0x7c/0xc7
>  __d_add+0x44/0xf7
>  d_splice_alias+0x208/0x218
>  ext4_lookup+0x1a6/0x1c5
>  path_openat+0x63a/0xb15
>  ? preempt_latency_stop+0x25/0x27
>  do_filp_open+0x51/0xae
>  ? trace_preempt_on+0xde/0xe7
>  ? rt_spin_unlock+0x13/0x24
>  ? __alloc_fd+0x145/0x155
>  do_sys_open+0x81/0x125
>  __x64_sys_open+0x21/0x23
>  do_syscall_64+0x5c/0x6e
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> I haven't really looked too much into it though. I ran out of time :-/
> 
> -- Steve
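
For reference, the race from the quoted patch description can be sketched
as a toy model.  This is plain, single-threaded C that replays the
interleaving step by step; it is not kernel code and not the patch itself.
All names (toy_completion, toy_complete, toy_steal, toy_waiter_reachable,
toy_demo) are hypothetical, and re-adding the waiter inside the loop is
only assumed here as one way to close the window, since the diff is not
part of this message.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: every name in this block is hypothetical. */
struct toy_completion {
    unsigned int done;   /* pending completions, like the done count */
    bool waiter_queued;  /* is our single waiter on the wait queue? */
};

/*
 * Models complete(): bump done and wake the queued waiter, if any.
 * As described for swake_up_locked(), the wakeup itself removes the
 * waiter from the queue.  Returns true if a waiter was woken.
 */
bool toy_complete(struct toy_completion *c)
{
    c->done++;
    if (!c->waiter_queued)
        return false;
    c->waiter_queued = false;
    return true;
}

/* Models another thread consuming the completion before the waiter runs. */
bool toy_steal(struct toy_completion *c)
{
    if (c->done == 0)
        return false;
    c->done--;
    return true;
}

/*
 * Replays the interleaving from the patch description.  Returns true
 * if a later complete() can still wake the waiter.
 */
bool toy_waiter_reachable(bool requeue_in_loop)
{
    struct toy_completion c = { .done = 0, .waiter_queued = false };

    /* Waiter: done == 0, so it queues itself once, before the loop. */
    c.waiter_queued = true;

    /* Completer: wakes the waiter, which takes it off the queue. */
    toy_complete(&c);

    /* Another thread sneaks in and drops done back to zero. */
    toy_steal(&c);

    /* Waiter: runs, sees done == 0, goes around the loop again. */
    if (c.done == 0 && requeue_in_loop)
        c.waiter_queued = true;  /* assumed remedy: queue inside the loop */

    /* The next complete() can only wake a waiter that is queued. */
    return toy_complete(&c);
}

/* Small self-check of both variants. */
void toy_demo(void)
{
    assert(!toy_waiter_reachable(false)); /* queued outside loop: stuck */
    assert(toy_waiter_reachable(true));   /* queued inside loop: woken */
}
```

With the waiter queued only outside the loop, the second toy_complete()
finds nobody on the queue, which is the "no way to wake it up" case from
the description.  The model says nothing about the soft lockup in
swake_up_all() reported above.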
