lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <874kwg2cka.fsf@unikie.com>
Date:   Tue, 28 Jan 2020 10:46:29 +0200
From:   jouni.hogander@...kie.com (Jouni Högander)
To:     Lukas Bulwahn <lukas.bulwahn@...il.com>
Cc:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        open list <linux-kernel@...r.kernel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Ben Hutchings <ben.hutchings@...ethink.co.uk>,
        linux- stable <stable@...r.kernel.org>,
        Netdev <netdev@...r.kernel.org>,
        Al Viro <viro@...iv.linux.org.uk>,
        linux-fsdevel@...r.kernel.org, Eric Dumazet <edumazet@...gle.com>,
        "David S. Miller" <davem@...emloft.net>, syzkaller@...glegroups.com
Subject: Re: [PATCH 4.19 000/306] 4.19.87-stable review

Lukas Bulwahn <lukas.bulwahn@...il.com> writes:

> On Mon, 27 Jan 2020, Jouni Högander wrote:
>
>> Lukas Bulwahn <lukas.bulwahn@...il.com> writes:
>> 
>> > On Wed, 22 Jan 2020, Jouni Högander wrote:
>> >
>> >> Greg Kroah-Hartman <gregkh@...uxfoundation.org> writes:
>> >> >> > Now queued up, I'll push out -rc2 versions with this fix.
>> >> >> >
>> >> >> > greg k-h
>> >> >> 
>> >> >> We have also been informed about another regression these two commits
>> >> >> are causing:
>> >> >> 
>> >> >> https://lore.kernel.org/lkml/ace19af4-7cae-babd-bac5-cd3505dcd874@I-love.SAKURA.ne.jp/
>> >> >> 
>> >> >> I suggest to drop these two patches from this queue, and give us a
>> >> >> week to shake out the regressions of the change, and once ready, we
>> >> >> can include the complete set of fixes to stable (probably in a week or
>> >> >> two).
>> >> >
>> >> > Ok, thanks for the information, I've now dropped them from all of the
>> >> > queues that had them in them.
>> >> >
>> >> > greg k-h
>> >> 
>> >> I have now run more extensive Syzkaller testing on following patches:
>> >> 
>> >> cb626bf566eb net-sysfs: Fix reference count leak
>> >> ddd9b5e3e765 net-sysfs: Call dev_hold always in rx_queue_add_kobject
>> >> e0b60903b434 net-sysfs: Call dev_hold always in netdev_queue_add_kobje
>> >> 48a322b6f996 net-sysfs: fix netdev_queue_add_kobject() breakage
>> >> b8eb718348b8 net-sysfs: Fix reference count leak in rx|netdev_queue_add_kobject
>> >> 
>> >> These patches are fixing couple of memory leaks including this one found
>> >> by Syzbot: https://syzkaller.appspot.com/bug?extid=ad8ca40ecd77896d51e2
>> >> 
>> >> I can reproduce these memory leaks in following stable branches: 4.14,
>> >> 4.19, and 5.4.
>> >> 
>> >> These are all now merged into net/master tree and based on my testing
>> >> they are ready to be taken into stable branches as well.
>> >>
>> >
>> > + syzkaller list
>> > Jouni et. al, please drop Linus in further responses; Linus, it was wrong 
>> > to add you to this thread in the first place (reason is explained below)
>> >
>> > Jouni, thanks for investigating.
>> >
>> > It raises the following questions and comments:
>> >
>> > - Does the memory leak NOT appear on 4.9 and earlier LTS branches (or did 
>> > you not check that)? If it does not appear, can you bisect it with the 
>> > reproducer to the commit between 4.14 and 4.9?
>> 
>> I tested and these memory leaks are not reproucible in 4.9 and earlier.
>> 
>> >
>> > - Do the reproducers you found with your syzkaller testing show the same 
>> > behaviour (same bisection) as the reproducers from syzbot?
>> 
>> Yes, they are same.
>> 
>> >
>> > - I fear syzbot's automatic bisection on is wrong, and Linus' commit 
>> > 0e034f5c4bc4 ("iwlwifi: fix mis-merge that breaks the driver") is not to 
>> > blame here; that commit did not cause the memory leak, but fixed some 
>> > unrelated issue that simply confuses syzbot's automatic bisection.
>> >
>> > Just FYI: Dmitry Vyukov's evaluation of the syzbot bisection shows that 
>> > about 50% are wrong, e.g., due to multiple bugs being triggered with one 
>> > reproducer and the difficulty of automatically identifying them of being 
>> > different due to different root causes (despite the smart heuristics of 
>> > syzkaller & syzbot). So, to identify the actual commit on which the memory 
>> > leak first appeared, you need to bisect manually with your own judgement 
>> > if the reported bug stack trace fits to the issue you investigating. Or 
>> > you use syzbot's automatic bisection but then with a reduced kernel config 
>> > that cannot be confused by other issues. You might possibly also hit a 
>> > "beginning of time" in your bisection, where KASAN was simply not 
>> > supported, then the initially causing commit can simply not determined by 
>> > bisection with the reproducer and needs some code inspection and 
>> > archaeology with git. Can you go ahead try to identify the correct commit 
>> > for this issue?
>> 
>> These two commits (that are not in 4.9 and earlier) are intorducing these leaks:
>> 
>> commit e331c9066901dfe40bea4647521b86e9fb9901bb
>> Author: YueHaibing <yuehaibing@...wei.com>
>> Date:   Tue Mar 19 10:16:53 2019 +0800
>> 
>>     net-sysfs: call dev_hold if kobject_init_and_add success
>>     
>>     [ Upstream commit a3e23f719f5c4a38ffb3d30c8d7632a4ed8ccd9e ]
>>     
>>     In netdev_queue_add_kobject and rx_queue_add_kobject,
>>     if sysfs_create_group failed, kobject_put will call
>>     netdev_queue_release to decrease dev refcont, however
>>     dev_hold has not be called. So we will see this while
>>     unregistering dev:
>>     
>>     unregister_netdevice: waiting for bcsh0 to become free. Usage count = -1
>>     
>>     Reported-by: Hulk Robot <hulkci@...wei.com>
>>     Fixes: d0d668371679 ("net: don't decrement kobj reference count on init fail
>> ure")
>>     Signed-off-by: YueHaibing <yuehaibing@...wei.com>
>>     Signed-off-by: David S. Miller <davem@...emloft.net>
>>     Signed-off-by: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
>> 
>> commit d0d6683716791b2a2761a1bb025c613eb73da6c3
>> Author: stephen hemminger <stephen@...workplumber.org>
>> Date:   Fri Aug 18 13:46:19 2017 -0700
>> 
>>     net: don't decrement kobj reference count on init failure
>>     
>>     If kobject_init_and_add failed, then the failure path would
>>     decrement the reference count of the queue kobject whose reference
>>     count was already zero.
>>     
>>     Fixes: 114cf5802165 ("bql: Byte queue limits")
>>     Signed-off-by: Stephen Hemminger <sthemmin@...rosoft.com>
>>     Signed-off-by: David S. Miller <davem@...emloft.net>
>> 
>
> But, it seems that we now have just a long sequences of fix patches.
>
> This commit from 2011 seems to be the initial buggy one:
>
> commit 114cf5802165ee93e3ab461c9c505cd94a08b800
> Author: Tom Herbert <therbert@...gle.com>
> Date:   Mon Nov 28 16:33:09 2011 +0000
>
>     bql: Byte queue limits
>
> And then we just have fixes over fixes:
>
> 114cf5802165ee93e3ab461c9c505cd94a08b800
> fixed by d0d6683716791b2a2761a1bb025c613eb73da6c3
> fixed by a3e23f719f5c4a38ffb3d30c8d7632a4ed8ccd9e
> fixed by the sequence of your five patches, mentioned above
>
>
> If that is right, we should be able to find a reproducer with syzkaller on 
> the versions before d0d668371679 ("net: don't decrement kobj reference 
> count on init failure") with fault injection enabled or some manually 
> injected fault by modifying the source code to always fail on init to 
> really trigger the init failure, and see the reference count go below 
> zero.
>
> All further issues should also have reproducers found with syzkaller.
> If we have a good feeling on the reproducers and this series of fixes 
> really fixed the issue now here for all cases, we should suggest to 
> backport all of the fixes to 4.4 and 4.9.
>
> We should NOT just have Greg pick up a subset of the patches and backport 
> them to 4.4 and 4.9, that will likely break more than it fixes.

Yes, this is the case.

>
> Jouni, did you see Greg's bot inform you that he would pick up your latest 
> patch for 4.4 and 4.9? Please respond to those emails to make sure a 
> complete set of patches is picked up, which we tested with all those 
> intermediate reproducers and an extensive syzkaller run hitting the 
> net-sysfs interface (e.g., by configuring the corpus and check
> coverage).

I already responded to not pick these patches into 4.4 and 4.9. 

>
> If you cannot do this testing for 4.4 and 4.9 now quickly (you 
> potentially have less than 24 hours), we should hold those new patches 
> back for 4.4 and 4.9, as none of the fixes seem to be applied at all right 
> now and the users have not complained yet on 4.4 and 4.9.
> Once testing of the whole fix sequence is done, we request to backport all 
> patches at once for 4.4 and 4.9.

If we want to pick whole set including older patches I think I need more
time for identifying which older patches (apart from these two I
identified causing the memory leak) should be taken in and for testing.

>
> Lukas

BR,

Jouni Högander

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ