Message-ID: <AM6PR04MB5639FBD246251DEF694AB02FF1319@AM6PR04MB5639.eurprd04.prod.outlook.com>
Date: Mon, 14 Jun 2021 11:12:03 +0000
From: David Mozes <david.mozes@...k.us>
To: Matthew Wilcox <willy@...radead.org>
CC: "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Darren Hart <dvhart@...radead.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: futex/call -to plist_for_each_entry_safe with head=NULL
Thx Matthew
1) You are probably correct about where the actual crash happened, unless something happened in between.
But that is what gdb told us. In addition, RDI holds the value 0x246:
Jun 10 20:49:40 c-node04 kernel: [97562.144463] BUG: unable to handle kernel NULL pointer dereference at 0000000000000246
Jun 10 20:49:40 c-node04 kernel: [97562.145450] PGD 2012ee4067 P4D 2012ee4067 PUD 20135a0067 PMD 0
Jun 10 20:49:40 c-node04 kernel: [97562.145450] Oops: 0000 [#1] SMP
Jun 10 20:49:40 c-node04 kernel: [97562.145450] CPU: 36 PID: 12668 Comm: STAR4BLKS0_WORK Kdump: loaded Tainted: G W OE 4.19.149-KM6 #1
Jun 10 20:49:40 c-node04 kram: rpoll(0x7fe624135b90, 85, 50) returning 0 times: 0, 0, 0, 2203, 0 ccount 42
Jun 10 20:49:40 c-node04 kernel: [97562.145450] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018
Jun 10 20:49:40 c-node04 kernel: [97562.145450] RIP: 0010:do_futex+0xdf/0xa90
Jun 10 20:49:40 c-node04 kernel: [97562.145450] Code: 08 4c 8d 6d 08 48 8b 3a 48 8d 72 e8 49 39 d5 4c 8d 67 e8 0f 84 89 00 00 00 31 c0 44 89 3c 24 41 89 df 44 89 f3 41 89 c6 eb 16 <49> 8b 7c 24 18 49 8d 54 24 18 4c 89 e6 4c 39 ea 4c 8d 67 e8 74 58
Jun 10 20:49:40 c-node04 kernel: [97562.145450] RSP: 0018:ffff97f6ea8bbdf0 EFLAGS: 00010283
Jun 10 20:49:40 c-node04 kernel: [97562.145450] RAX: 00007f6db1a5d000 RBX: 0000000000000001 RCX: ffffa5530c5f0140
Jun 10 20:49:40 c-node04 kernel: [97562.145450] RDX: ffff97f6e4287d58 RSI: ffff97f6e4287d40 RDI: 0000000000000246
Jun 10 20:49:40 c-node04 kram: rpoll(0x7fe62414a860, 2, 50) returning 0 times: 0, 0, 0, 2191, 0 ccount 277
Jun 10 20:49:40 c-node04 kernel: [97562.145450] RBP: ffffa5530c5bd580 R08: 00007f6db1a5d9c0 R09: 0000000000000001
2) In addition, we got a second crash in the same function, a few lines above the previous one:
Jun 12 11:20:43 c-node06 kernel: [91837.319613] ? pointer+0x137/0x350
Jun 12 11:20:43 c-node06 kernel: [91837.319613] printk+0x58/0x6f
Jun 12 11:20:43 c-node06 kernel: [91837.319613] panic+0xce/0x238
Jun 12 11:20:43 c-node06 kernel: [91837.319613] ? do_futex+0xa3d/0xa90
Jun 12 11:20:43 c-node06 kernel: [91837.319613] __stack_chk_fail+0x15/0x20
Jun 12 11:20:43 c-node06 kernel: [91837.319613] do_futex+0xa3d/0xa90
Jun 12 11:20:43 c-node06 kernel: [91837.319613] ? plist_add+0xc1/0xf0
Jun 12 11:20:43 c-node06 kernel: [91837.319613] ? plist_add+0xc1/0xf0
Jun 12 11:20:43 c-node06 kernel: [91837.319613] ? plist_del+0x5f/0xb0
Jun 12 11:20:43 c-node06 kernel: [91837.319613] __schedule+0x243/0x830
Jun 12 11:20:43 c-node06 kernel: [91837.319613] ? schedule+0x28/0x80
Jun 12 11:20:43 c-node06 kernel: [91837.319613] ? exit_to_usermode_loop+0x57/0xe0
Jun 12 11:20:43 c-node06 kernel: [91837.319613] ? prepare_exit_to_usermode+0x70/0x90
Jun 12 11:20:43 c-node06 kernel: [91837.319613] ? retint_user+0x8/0x8
(gdb) l *do_futex+0xa3d
0xffffffff8113985d is in do_futex (kernel/futex.c:1742).
1737 if (!(flags & FLAGS_SHARED)) {
1738 cond_resched();
1739 goto retry_private;
1740 }
1741
1742 put_futex_key(&key2);
1743 put_futex_key(&key1);
1744 cond_resched();
1745 goto retry;
1746 }
(gdb)
This is closer to the double_lock_hb(hb1, hb2) call you mentioned.
Regarding running without the proprietary modules: we did not manage to reproduce the problem, but without them we only reach about half of the I/O load under which the problem appears.
Thx
David
-----Original Message-----
From: Matthew Wilcox <willy@...radead.org>
Sent: Sunday, June 13, 2021 11:04 PM
To: David Mozes <david.mozes@...k.us>
Cc: linux-fsdevel@...r.kernel.org; Thomas Gleixner <tglx@...utronix.de>; Ingo Molnar <mingo@...hat.com>; Peter Zijlstra <peterz@...radead.org>; Darren Hart <dvhart@...radead.org>; linux-kernel@...r.kernel.org
Subject: Re: futex/call -to plist_for_each_entry_safe with head=NULL
On Sun, Jun 13, 2021 at 12:24:52PM +0000, David Mozes wrote:
> Hi *,
> Under a very high I/O load, we got the BUG trace below.
> We can see that:
> plist_for_each_entry_safe(this, next, &hb1->chain, list) {
> if (match_futex (&this->key, &key1))
>
> was called with hb1 = NULL in the futex_wake_op() function,
> and nothing in the code protects against such a scenario.
>
> The NULL could come from:
> hb1 = hash_futex(&key1);
>
> How can we protect against such a situation?
Can you reproduce it without loading proprietary modules?
Your analysis doesn't quite make sense:
hb1 = hash_futex(&key1);
hb2 = hash_futex(&key2);
retry_private:
double_lock_hb(hb1, hb2);
If hb1 were NULL, then the oops would come earlier, in double_lock_hb().
> RIP: 0010:do_futex+0xdf/0xa90
>
> 0xffffffff81138eff is in do_futex (kernel/futex.c:1748).
> 1743 put_futex_key(&key1);
> 1744 cond_resched();
> 1745 goto retry;
> 1746 }
> 1747
> 1748 plist_for_each_entry_safe(this, next, &hb1->chain, list) {
> 1749 if (match_futex (&this->key, &key1)) {
> 1750 if (this->pi_state || this->rt_waiter) {
> 1751 ret = -EINVAL;
> 1752 goto out_unlock;
> (gdb)
> This happened on kernel 4.19.149 running on an Azure VM.
>
>
> Thx
> David