Message-ID: <CACT4Y+bOd17CLKm742_t06p9yweh1x2rZswc_ghtmnuE7+Zb5w@mail.gmail.com>
Date: Wed, 27 Dec 2017 21:14:21 +0100
From: Dmitry Vyukov <dvyukov@...gle.com>
To: Ozgur <ozgur@...sey.org>
Cc: Tom Herbert <tom@...bertland.com>,
    John Fastabend <john.fastabend@...il.com>,
    syzbot <bot+c91c53af67f9ebe599a337d2e70950366153b295@...kaller.appspotmail.com>,
    "David S. Miller" <davem@...emloft.net>,
    Eric Biggers <ebiggers@...gle.com>,
    LKML <linux-kernel@...r.kernel.org>,
    Linux Kernel Network Developers <netdev@...r.kernel.org>,
    "syzkaller-bugs@...glegroups.com" <syzkaller-bugs@...glegroups.com>,
    Tom Herbert <tom@...ntonium.net>,
    Cong Wang <xiyou.wangcong@...il.com>
Subject: Re: WARNING in strp_data_ready

On Wed, Dec 27, 2017 at 9:08 PM, Ozgur <ozgur@...sey.org> wrote:
>
>
> 27.12.2017, 22:21, "Dmitry Vyukov" <dvyukov@...gle.com>:
>> On Wed, Dec 27, 2017 at 8:09 PM, Tom Herbert <tom@...bertland.com> wrote:
>>> Did you try the patch I posted?
>>
>> Hi Tom,
>
> Hello Dmitry,
>
>> No. And I didn't know I need to. Why?
>> If you think the patch needs additional testing, you can ask syzbot to
>> test it. See https://github.com/google/syzkaller/blob/master/docs/syzbot.md#communication-with-syzbot
>> Otherwise proceed with committing it. Or what are we waiting for?
>>
>> Thanks
>
> I think we need to fixed patch for crash, in fact check to patch code and test solve the bug.
> How do test it because there is no patch in the following bug?

Hi Ozgur,

I am not sure I completely understand what you mean. But the
reproducer for this bug (which one can use for testing) is here:
https://groups.google.com/forum/#!topic/syzkaller-bugs/Kxs05ziCpgY
Tom also mentions there is some patch for this, but I don't know where
it is, it doesn't seem to be referenced from this thread.

> The fix patch should be for this net/kcm/kcmsock.c file and lock functions must be added calling sk_data_ready ().
> Regards
>
> Ozgur
>
>>> On Wed, Dec 27, 2017 at 10:25 AM, Dmitry Vyukov <dvyukov@...gle.com> wrote:
>>>> On Wed, Dec 6, 2017 at 4:44 PM, Dmitry Vyukov <dvyukov@...gle.com> wrote:
>>>>>> <john.fastabend@...il.com> wrote:
>>>>>>> On 10/24/2017 08:20 AM, syzbot wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> syzkaller hit the following crash on 73d3393ada4f70fa3df5639c8d438f2f034c0ecb
>>>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
>>>>>>>> compiler: gcc (GCC) 7.1.1 20170620
>>>>>>>> .config is attached
>>>>>>>> Raw console output is attached.
>>>>>>>> C reproducer is attached
>>>>>>>> syzkaller reproducer is attached. See https://goo.gl/kgGztJ
>>>>>>>> for information about syzkaller reproducers
>>>>>>>>
>>>>>>>> WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 sock_owned_by_me include/net/sock.h:1505 [inline]
>>>>>>>> WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 sock_owned_by_user include/net/sock.h:1511 [inline]
>>>>>>>> WARNING: CPU: 0 PID: 2996 at ./include/net/sock.h:1505 strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404
>>>>>>>> Kernel panic - not syncing: panic_on_warn set ...
>>>>>>>>
>>>>>>>> CPU: 0 PID: 2996 Comm: syzkaller142210 Not tainted 4.14.0-rc5+ #138
>>>>>>>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>>>>>>>> Call Trace:
>>>>>>>> <IRQ>
>>>>>>>> __dump_stack lib/dump_stack.c:16 [inline]
>>>>>>>> dump_stack+0x194/0x257 lib/dump_stack.c:52
>>>>>>>> panic+0x1e4/0x417 kernel/panic.c:181
>>>>>>>> __warn+0x1c4/0x1d9 kernel/panic.c:542
>>>>>>>> report_bug+0x211/0x2d0 lib/bug.c:183
>>>>>>>> fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:178
>>>>>>>> do_trap_no_signal arch/x86/kernel/traps.c:212 [inline]
>>>>>>>> do_trap+0x260/0x390 arch/x86/kernel/traps.c:261
>>>>>>>> do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:298
>>>>>>>> do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:311
>>>>>>>> invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905
>>>>>>>> RIP: 0010:sock_owned_by_me include/net/sock.h:1505 [inline]
>>>>>>>> RIP: 0010:sock_owned_by_user include/net/sock.h:1511 [inline]
>>>>>>>> RIP: 0010:strp_data_ready+0x2b7/0x390 net/strparser/strparser.c:404
>>>>>>>> RSP: 0018:ffff8801db206b18 EFLAGS: 00010206
>>>>>>>> RAX: ffff8801d1e02080 RBX: ffff8801dad74c48 RCX: 0000000000000000
>>>>>>>> RDX: 0000000000000100 RSI: ffff8801d29fa0a0 RDI: ffffffff85cbede0
>>>>>>>> RBP: ffff8801db206b38 R08: 0000000000000005 R09: 1ffffffff0ce0bcd
>>>>>>>> R10: ffff8801db206a00 R11: dffffc0000000000 R12: ffff8801d29fa000
>>>>>>>> R13: ffff8801dad74c50 R14: ffff8801d4350a92 R15: 0000000000000001
>>>>>>>> psock_data_ready+0x56/0x70 net/kcm/kcmsock.c:353
>>>>>>>
>>>>>>> Looks like KCM is calling sk_data_ready() without first taking the
>>>>>>> sock lock.
>>>>>>>
>>>>>>> /* Called with lower sock held */
>>>>>>> static void kcm_rcv_strparser(struct strparser *strp, struct sk_buff *skb)
>>>>>>> {
>>>>>>> [...]
>>>>>>> if (kcm_queue_rcv_skb(&kcm->sk, skb)) {
>>>>>>>
>>>>>>> In this case kcm->sk is not the same lock the comment is referring to.
>>>>>>> And kcm_queue_rcv_skb() will eventually call sk_data_ready().
>>>>>>>
>>>>>>> @Tom, how about wrapping the sk_data_ready call in {lock|release}_sock?
>>>>>>> I don't have anything better in mind immediately.
>>>>>> The sock locks are taken in reverse order in the send path so so
>>>>>> grabbing kcm sock lock with lower lock held to call sk_data_ready may
>>>>>> lead to deadlock like I think.
>>>>>>
>>>>>> It might be possible to change the order in the send path to do this.
>>>>>> Something like:
>>>>>>
>>>>>> trylock on lower socket lock
>>>>>> -if trylock fails
>>>>>> - release kcm sock lock
>>>>>> - lock lower sock
>>>>>> - lock kcm sock
>>>>>> - call sendpage locked function
>>>>>>
>>>>>> I admit that dealing with two levels of socket locks in the data path
>>>>>> is quite a pain :-)
>>>>>
>>>>> up
>>>>>
>>>>> still happening and we've lost 50K+ test VMs on this
>>>>
>>>> up
>>>>
>>>> Still happens and number of crashes crossed 60K, can we do something
>>>> with this please?
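
For readers following the locking discussion: the send-path reordering
outlined in the quoted steps (trylock the lower lock, and on failure
drop the KCM lock and retake both in the other order) can be sketched
in user space. This is only an illustration with pthread mutexes, not
kernel code and not the patch mentioned in this thread. The names
struct two_level, lock_lower_from_upper() and send_path() are
hypothetical, and the two mutexes merely stand in for the KCM sock lock
and the lower sock lock.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two socket locks in the thread. */
struct two_level {
	pthread_mutex_t upper;	/* plays the role of the KCM sock lock */
	pthread_mutex_t lower;	/* plays the role of the lower sock lock */
	int sent;		/* dummy state touched under both locks */
};

/*
 * Called with t->upper held; returns with both locks held.
 * Returns true if it had to back off, i.e. t->upper was dropped
 * and retaken, so anything read under it earlier may be stale.
 */
static bool lock_lower_from_upper(struct two_level *t)
{
	/* Fast path: the lower lock is free, take it without blocking. */
	if (pthread_mutex_trylock(&t->lower) == 0)
		return false;

	/*
	 * Slow path: never block on lower while holding upper.
	 * Drop upper, then take the locks in lower -> upper order,
	 * the same order the receive side would use.
	 */
	pthread_mutex_unlock(&t->upper);
	pthread_mutex_lock(&t->lower);
	pthread_mutex_lock(&t->upper);
	return true;
}

/* Sketch of a send path; returns whether the back-off path was taken. */
static bool send_path(struct two_level *t)
{
	bool backed_off;

	pthread_mutex_lock(&t->upper);
	backed_off = lock_lower_from_upper(t);

	/* A real caller would revalidate its state here if backed_off. */
	t->sent++;	/* stands in for the "sendpage locked" call */

	pthread_mutex_unlock(&t->upper);
	pthread_mutex_unlock(&t->lower);
	return backed_off;
}
```

The point of the back-off is that no thread ever sleeps on the lower
lock while holding the upper one, so a receive path that takes lower
first and then upper cannot form a cycle with it. The cost is that the
upper lock is briefly released, so the sender has to recheck whatever
it decided before the back-off.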