linux-kernel - Re: KMSAN: uninit-value in memcmp (2)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACT4Y+a8_4=dmZavdmZjPOraqYAtjBB228TjurbwLKSgJ9+J7Q@mail.gmail.com>
Date:   Mon, 24 Sep 2018 11:39:08 +0200
From:   Dmitry Vyukov <dvyukov@...gle.com>
To:     Alexander Potapenko <glider@...gle.com>
Cc:     Vladis Dronov <vdronov@...hat.com>,
        syzbot <syzbot+d3402c47f680ff24b29c@...kaller.appspotmail.com>,
        syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
        David Miller <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Networking <netdev@...r.kernel.org>,
        sunlianwen <sunlw.fnst@...fujitsu.com>
Subject: Re: KMSAN: uninit-value in memcmp (2)

On Mon, Sep 24, 2018 at 8:53 AM, Alexander Potapenko <glider@...gle.com> wrote:
> On Mon, Sep 24, 2018 at 12:09 AM Vladis Dronov <vdronov@...hat.com> wrote:
>>
>> Hello, Dmirty,
>>
>> Thank you for the reply. Can we please, discuss this further?
> Hi Vladis,
>> > You can see on dashboard that the last crash
>> > for the second version (2) happened just few days ago. So this is a
>> > different bug.
> FWIW I've just double-checked that the reproducer provided by
> syzkaller in the original message still triggers the report from the
> original message in the latest KMSAN tree (which already contains the
> __hw_addr_add_ex() fix from April).
>> Well... yes and no. When I was looking at this bug (bug?id=088efeac32fd) I was looking
>> at the report at "2018/05/09 18:55" (https://syzkaller.appspot.com/text?tag=CrashReport&x=141b707b800000),
>> since it was the only report with a reproducer. This was my error.
>>
>> The error and the call trace in this report are:
>>
>> >>>
>> BUG: KMSAN: uninit-value in memcmp+0x119/0x180 lib/string.c:861
>> CPU: 0 PID: 38 Comm: kworker/0:1 Not tainted 4.17.0-rc3+ #88
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>> Workqueue: ipv6_addrconf addrconf_dad_work
>> Call Trace:
>>  __dump_stack lib/dump_stack.c:77 [inline]
>>  dump_stack+0x185/0x1d0 lib/dump_stack.c:113
>>  kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
>>  __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:683
>>  memcmp+0x119/0x180 lib/string.c:861
>>  __hw_addr_add_ex net/core/dev_addr_lists.c:61 [inline]
>>  __dev_mc_add+0x1fc/0x900 net/core/dev_addr_lists.c:670
>>  dev_mc_add+0x6d/0x80 net/core/dev_addr_lists.c:687
>>  igmp6_group_added+0x2db/0xa00 net/ipv6/mcast.c:662
>>  ipv6_dev_mc_inc+0xe9e/0x1130 net/ipv6/mcast.c:914
>>  addrconf_join_solict net/ipv6/addrconf.c:2103 [inline]
>>  addrconf_dad_begin net/ipv6/addrconf.c:3853 [inline]
>>  addrconf_dad_work+0x462/0x2a20 net/ipv6/addrconf.c:3979
>>  process_one_work+0x12c6/0x1f60 kernel/workqueue.c:2145
>>  worker_thread+0x113c/0x24f0 kernel/workqueue.c:2279
>>  kthread+0x539/0x720 kernel/kthread.c:239
>>  ret_from_fork+0x35/0x40 arch/x86/entry/entry_64.S:412
>>
>> Local variable description: ----buf@...p6_group_added
>> Variable was created at:
>>  igmp6_group_added+0x4a/0xa00 net/ipv6/mcast.c:650
>>  ipv6_dev_mc_inc+0xe9e/0x1130 net/ipv6/mcast.c:914
>> <<<
>>
>> It is the same like in bug?id=3887c0d99aecb27d085180c5222d245d08a30806
>> which, after some more test, made me believe these bugs are duplicate
>> and are fixed by the same commit.
>>
>> But let's look at another report at "2018/09/12 21:00"
>> (https://syzkaller.appspot.com/text?tag=CrashReport&x=14f99b71400000)
>> at the bug (bug?id=088efeac32fd), the one you've mentioned as
>> "the last crash for the second version (2) happened just few days ago".
>>
>> Its error and the call trace are completely different:
>>
>> >>>
>> BUG: KMSAN: uninit-value in memcmp+0x11d/0x180 lib/string.c:863
>> CPU: 0 PID: 6107 Comm: syz-executor4 Not tainted 4.19.0-rc3+ #45
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>> Call Trace:
>>  __dump_stack lib/dump_stack.c:77 [inline]
>>  dump_stack+0x14b/0x190 lib/dump_stack.c:113
>>  kmsan_report+0x183/0x2b0 mm/kmsan/kmsan.c:956
>>  __msan_warning+0x70/0xc0 mm/kmsan/kmsan_instr.c:645
>>  memcmp+0x11d/0x180 lib/string.c:863
>>  dev_uc_add_excl+0x165/0x7b0 net/core/dev_addr_lists.c:464
>>  ndo_dflt_fdb_add net/core/rtnetlink.c:3463 [inline]
>>  rtnl_fdb_add+0x1081/0x1270 net/core/rtnetlink.c:3558
>>  rtnetlink_rcv_msg+0xa0b/0x1530 net/core/rtnetlink.c:4715
>>  netlink_rcv_skb+0x36e/0x5f0 net/netlink/af_netlink.c:2454
>>  rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4733
>>  netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
>>  netlink_unicast+0x1638/0x1720 net/netlink/af_netlink.c:1343
>>  netlink_sendmsg+0x1205/0x1290 net/netlink/af_netlink.c:1908
>>  sock_sendmsg_nosec net/socket.c:621 [inline]
>>  sock_sendmsg net/socket.c:631 [inline]
>> ...
>> Uninit was created at:
>> ...
>>  slab_post_alloc_hook mm/slab.h:446 [inline]
>>  slab_alloc_node mm/slub.c:2718 [inline]
>>  __kmalloc_node_track_caller+0x9e7/0x1160 mm/slub.c:4351
>>  __kmalloc_reserve net/core/skbuff.c:138 [inline]
>>  __alloc_skb+0x2f5/0x9e0 net/core/skbuff.c:206
>>  alloc_skb include/linux/skbuff.h:996 [inline]
>>  netlink_alloc_large_skb net/netlink/af_netlink.c:1189 [inline]
>>  netlink_sendmsg+0xb49/0x1290 net/netlink/af_netlink.c:1883
>>  sock_sendmsg_nosec net/socket.c:621 [inline]
>>  sock_sendmsg net/socket.c:631 [inline]
>>  ___sys_sendmsg+0xe70/0x1290 net/socket.c:2114
>> <<<
>>
>> This is a different bug. How come these 2 different reports for 2 different
>> bugs have ended in the same syzkaller report (bug?id=088efeac32fd) ?
>
> I suspect this is because syzbot used the top stack frame as the
> report signature.
> There's a mechanism to ignore frames like memcmp() in the reports, not
> sure why didn't it work in this case (maybe it just wasn't in place at
> the time the report happened).
>> One bug is fixed by the "net: fix uninit-value in __hw_addr_add_ex()" commit,
>> the second one is not, but they are still in the same syzkaller report.
>>
>> This was the reason of my confusion. I'm not sure how to fix this. If it is possible,
>> probably we need to cancel/revoke "#syz fix: net: fix uninit-value in __hw_addr_add_ex()"
>> for this syzkaller report (bug?id=088efeac32fd). And then "split" it into 2 or
>> more different reports, but I'm not sure if this is possible.
>>
>> Probably, syzkaller needs to look deeper into the KMSAN reports to differentiate
>> KMSAN errors happening because of different reasons.
>>
>> > On Sun, Sep 23, 2018 at 6:02 PM, Vladis Dronov <vdronov@...hat.com> wrote:
>> > > #syz fix: net: fix uninit-value in __hw_addr_add_ex()
>> >
>> > Hi Vladis,
>> >
>> > This can be fixed with "net: fix uninit-value in __hw_addr_add_ex()".
>> > That commit landed in April, syzbot waited till the commit reached all
>> > tested trees, and then closed the bug.
>> > But the similar bug continued to happen, so syzbot created second
>> > version of this bug (2). You can see on dashboard that the last crash
>> > for the second version (2) happened just few days ago. So this is a
>> > different bug.


Precisely discriminating bugs (root causes) bases on crash text is
generally undecidable problem, even for humans. We even can have
literally equal crash texts, which are still different bugs. And we
can have significantly differently looking crash texts, which are
actually caused by the same root cause. syzbot extracts some
"identity" string for each crash and than uses that string to
discriminate crashes and sort them into bins. This identity string is
what you see in email subject and bug title on dashboard. This method
can have both false positives and false negatives, but works
reasonably well in most cases and looks like the best practical
option.

For this exact instance (memcmp) we actually improved the analysis
logic recently:
https://github.com/google/syzkaller/commit/0e29942f77715486995d996f80f82742812d75a2#diff-abe1515f011fad2659ff218f9eea9ae1
But this crash was analyzed and reported before the change. So if this
crash happens again it should be reported as "in __hw_addr_add_ex"
now.

Re __hw_addr_add_ex bug, as Alex noted the crash was detected _after_
the fixing commit went in. So it's something new and different and
can't be fixed by the older commit.

There are no general, single guideline as to what to do when several
different bugs glued together into a single bug. Fixing at least one
of them (any) in the context of the bug is good, fixing both is good
too. When/if a bug is closed, new occurrences of similar crashes (the
same identity string) will lead to creation of a new bug. So if we fix
only one and close the bug, eventually the second one will lead to a
new bug (won't be lost), now dedicated to this second crash.

Now syzbot thinks that this bug is fixed/closed:
https://syzkaller.appspot.com/bug?extid=d3402c47f680ff24b29c
There is specifically no "undo" functionality, because it's inherently
racy with creation of a new version of this bugs by new crashes. So if
of these crashes will happen again, syzbot will open new bugs (now
with better discriminated titles). We can wait for that. Or we can
submit new fixes without waiting for new syzbot bugs (adding
Reported-by to new commits referencing this bug should not do any
harm).

Hope this clarifies things a bit.

Thanks