Date: Tue, 7 Mar 2017 13:44:59 -0500
From: Paul Moore <paul@...l-moore.com>
To: Richard Guy Briggs <rgb@...hat.com>
Cc: Cong Wang <xiyou.wangcong@...il.com>,
    Herbert Xu <herbert@...dor.apana.org.au>,
    netdev <netdev@...r.kernel.org>, Alexei Starovoitov <ast@...nel.org>,
    LKML <linux-kernel@...r.kernel.org>,
    Eric Dumazet <edumazet@...gle.com>,
    syzkaller <syzkaller@...glegroups.com>, linux-audit@...hat.com,
    David Miller <davem@...emloft.net>, Dmitry Vyukov <dvyukov@...gle.com>
Subject: Re: netlink: GPF in netlink_unicast

On Tue, Mar 7, 2017 at 10:55 AM, Richard Guy Briggs <rgb@...hat.com> wrote:
> On 2017-03-07 09:29, Paul Moore wrote:
>> On Mon, Mar 6, 2017 at 11:03 PM, Richard Guy Briggs <rgb@...hat.com> wrote:
>> > On 2017-03-06 10:10, Cong Wang wrote:
>> >> On Mon, Mar 6, 2017 at 2:54 AM, Dmitry Vyukov <dvyukov@...gle.com> wrote:
>> >> > Hello,
>> >> >
>> >> > I've got the following crash while running syzkaller fuzzer on
>> >> > net-next/8d70eeb84ab277377c017af6a21d0a337025dede:
>> >> >
>> >> > kasan: GPF could be caused by NULL-ptr deref or user memory access
>> >> > general protection fault: 0000 [#1] SMP KASAN
>> >> > Dumping ftrace buffer:
>> >> > (ftrace buffer empty)
>> >> > Modules linked in:
>> >> > CPU: 0 PID: 883 Comm: kauditd Not tainted 4.10.0+ #6
>> >> > Hardware name: Google Google Compute Engine/Google Compute Engine,
>> >> > BIOS Google 01/01/2011
>> >> > task: ffff8801d79f0240 task.stack: ffff8801d7a20000
>> >> > RIP: 0010:sock_sndtimeo include/net/sock.h:2162 [inline]
>> >> > RIP: 0010:netlink_unicast+0xdd/0x730 net/netlink/af_netlink.c:1249
>> >> > RSP: 0018:ffff8801d7a27c38 EFLAGS: 00010206
>> >> > RAX: 0000000000000056 RBX: ffff8801d7a27cd0 RCX: 0000000000000000
>> >> > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000002b0
>> >> > RBP: ffff8801d7a27cf8 R08: ffffed00385cf286 R09: ffffed00385cf286
>> >> > R10: 0000000000000006 R11: ffffed00385cf285 R12: 0000000000000000
>> >> > R13: dffffc0000000000 R14: ffff8801c2fc3c80 R15: 00000000014000c0
>> >> > FS: 0000000000000000(0000) GS:ffff8801dbe00000(0000) knlGS:0000000000000000
>> >> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> >> > CR2: 0000000020cfd000 CR3: 00000001c758f000 CR4: 00000000001406f0
>> >> > Call Trace:
>> >> > kauditd_send_unicast_skb+0x3c/0x70 kernel/audit.c:482
>> >> > kauditd_thread+0x174/0xb00 kernel/audit.c:599
>> >> > kthread+0x326/0x3f0 kernel/kthread.c:229
>> >> > ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
>> >> > Code: 44 89 fe e8 56 15 ff ff 8b 8d 70 ff ff ff 49 89 c6 31 c0 85 c9
>> >> > 75 27 e8 b2 b2 f4 fd 49 8d bc 24 b0 02 00 00 48 89 f8 48 c1 e8 03 <42>
>> >> > 80 3c 28 00 0f 85 37 06 00 00 49 8b 84 24 b0 02 00 00 4c 8d
>> >> > RIP: sock_sndtimeo include/net/sock.h:2162 [inline] RSP: ffff8801d7a27c38
>> >> > RIP: netlink_unicast+0xdd/0x730 net/netlink/af_netlink.c:1249 RSP:
>> >> > ffff8801d7a27c38
>> >> > ---[ end trace ad1bba9d457430b6 ]---
>> >> > Kernel panic - not syncing: Fatal exception
>> >> >
>> >> >
>> >> > This is not reproducible and seems to be caused by an elusive race.
>> >> > However, looking at the code I don't see any proper protection of
>> >> > audit_sock (other than the if (!audit_pid) which is obviously not
>> >> > enough to protect against races).
>> >>
>> >> audit_cmd_mutex is supposed to protect it, I think.
>> >> But kauditd_send_unicast_skb() seems not holding this mutex.
>> >
>> > Hmmmm, I wonder if it makes sense to wrap most of the contents of the
>> > outer while loop in kauditd_thread in the audit_cmd_mutex, or around the
>> > first two inner while loops and the "if (auditd)" condition after the
>> > "quick_loop:" label. The condition on auditd is supposed to catch that
>> > case. We don't want it locked while playing with the scheduler at the
>> > bottom of that function.
>>
>> Let me look into this and play around with a few things. I suspected
>> there might be a problem here, so I've got thoughts on how we might
>> resolve it; I just need to code them up and see what option sucks
>> the least.
>>
>> FWIW Richard, yes wrapping most of kauditd_thread *should* resolve
>> this but it's pretty heavy handed and not my first choice.
>
> That's why the inner loops made a bit more sense since it wasn't really
> necessary and ran afoul of the scheduler anyways.

One of my preferred options was to get us away from protecting
everything with the audit_cmd_mutex by creating a new locking approach
for the auditd connection state (using RCU/spinlocks since it rarely
changes in practice) and leaving the audit_cmd_mutex for its
traditional role. This should minimize the performance impact of the
lock and clean things up a bit. I'm also moving all the auditd
connection state into a single struct (instead of several variables
associated only by convention) which moves us oh so slightly closer to
allowing multiple auditd connections (hey, it's something).

It's taking a bit longer than expected as I'm dealing with a bit of a
head cold (or something) and my mind is far less than 100% at the
moment ...

-- 
paul moore
www.paul-moore.com