[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20191119075812.GB1858193@kroah.com>
Date: Tue, 19 Nov 2019 08:58:12 +0100
From: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To: Jouni Högander <jouni.hogander@...kie.com>
Cc: netdev@...r.kernel.org, David Miller <davem@...emloft.net>,
Lukas Bulwahn <lukas.bulwahn@...il.com>
Subject: Re: [PATCH v3] net-sysfs: Fix reference count leak
On Tue, Nov 19, 2019 at 09:45:02AM +0200, Jouni Högander wrote:
> Greg Kroah-Hartman <gregkh@...uxfoundation.org> writes:
>
> > On Mon, Nov 18, 2019 at 02:32:46PM +0200, jouni.hogander@...kie.com wrote:
> >> From: Jouni Hogander <jouni.hogander@...kie.com>
> >>
> >> Netdev_register_kobject is calling device_initialize. In case of error
> >> reference taken by device_initialize is not given up.
> >>
> >> Drivers are supposed to call free_netdev in case of error. In non-error
> >> case the last reference is given up there and device release sequence
> >> is triggered. In error case this reference is kept and the release
> >> sequence is never started.
> >>
> >> Fix this reference count leak by allowing giving up the reference also
> >> in error case in free_netdev.
> >>
> >> Also replace BUG_ON with WARN_ON in free_netdev and in netdev_release.
> >>
> >> This is the rootcause for couple of memory leaks reported by Syzkaller:
> >>
> >> BUG: memory leak unreferenced object 0xffff8880675ca008 (size 256):
> >> comm "netdev_register", pid 281, jiffies 4294696663 (age 6.808s)
> >> hex dump (first 32 bytes):
> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> >> backtrace:
> >> [<0000000058ca4711>] kmem_cache_alloc_trace+0x167/0x280
> >> [<000000002340019b>] device_add+0x882/0x1750
> >> [<000000001d588c3a>] netdev_register_kobject+0x128/0x380
> >> [<0000000011ef5535>] register_netdevice+0xa1b/0xf00
> >> [<000000007fcf1c99>] __tun_chr_ioctl+0x20d5/0x3dd0
> >> [<000000006a5b7b2b>] tun_chr_ioctl+0x2f/0x40
> >> [<00000000f30f834a>] do_vfs_ioctl+0x1c7/0x1510
> >> [<00000000fba062ea>] ksys_ioctl+0x99/0xb0
> >> [<00000000b1c1b8d2>] __x64_sys_ioctl+0x78/0xb0
> >> [<00000000984cabb9>] do_syscall_64+0x16f/0x580
> >> [<000000000bde033d>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >> [<00000000e6ca2d9f>] 0xffffffffffffffff
> >>
> >> BUG: memory leak
> >> unreferenced object 0xffff8880668ba588 (size 8):
> >> comm "kobject_set_nam", pid 286, jiffies 4294725297 (age 9.871s)
> >> hex dump (first 8 bytes):
> >> 6e 72 30 00 cc be df 2b nr0....+
> >> backtrace:
> >> [<00000000a322332a>] __kmalloc_track_caller+0x16e/0x290
> >> [<00000000236fd26b>] kstrdup+0x3e/0x70
> >> [<00000000dd4a2815>] kstrdup_const+0x3e/0x50
> >> [<0000000049a377fc>] kvasprintf_const+0x10e/0x160
> >> [<00000000627fc711>] kobject_set_name_vargs+0x5b/0x140
> >> [<0000000019eeab06>] dev_set_name+0xc0/0xf0
> >> [<0000000069cb12bc>] netdev_register_kobject+0xc8/0x320
> >> [<00000000f2e83732>] register_netdevice+0xa1b/0xf00
> >> [<000000009e1f57cc>] __tun_chr_ioctl+0x20d5/0x3dd0
> >> [<000000009c560784>] tun_chr_ioctl+0x2f/0x40
> >> [<000000000d759e02>] do_vfs_ioctl+0x1c7/0x1510
> >> [<00000000351d7c31>] ksys_ioctl+0x99/0xb0
> >> [<000000008390040a>] __x64_sys_ioctl+0x78/0xb0
> >> [<0000000052d196b7>] do_syscall_64+0x16f/0x580
> >> [<0000000019af9236>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >> [<00000000bc384531>] 0xffffffffffffffff
> >>
> >> Reported-by: syzbot+ad8ca40ecd77896d51e2@...kaller.appspotmail.com
> >> Cc: David Miller <davem@...emloft.net>
> >> Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
> >> Cc: Lukas Bulwahn <lukas.bulwahn@...il.com>
> >> Signed-off-by: Jouni Hogander <jouni.hogander@...kie.com>
> >> ---
> >> v2 -> v3:
> >> * Replaced BUG_ON with WARN_ON in free_netdev and netdev_release
> >> v1 -> v2:
> >> * Relying on driver calling free_netdev rather than calling
> >> put_device directly in error path
> >> ---
> >> net/core/dev.c | 14 +++++++-------
> >> net/core/net-sysfs.c | 6 +++++-
> >> 2 files changed, 12 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/net/core/dev.c b/net/core/dev.c
> >> index 99ac84ff398f..1d6c0bfb5ec5 100644
> >> --- a/net/core/dev.c
> >> +++ b/net/core/dev.c
> >> @@ -9603,14 +9603,14 @@ void free_netdev(struct net_device *dev)
> >>
> >> netdev_unregister_lockdep_key(dev);
> >>
> >> - /* Compatibility with error handling in drivers */
> >> - if (dev->reg_state == NETREG_UNINITIALIZED) {
> >> - netdev_freemem(dev);
> >> - return;
> >> - }
> >> + /* reg_state is NETREG_UNINITIALIZED if there is an error in
> >> + * registration.
> >> + */
> >> + WARN_ON(dev->reg_state != NETREG_UNREGISTERED &&
> >> + dev->reg_state != NETREG_UNINITIALIZED);
> >
> > You "warn" about this, but do not actually handle the problem. So what
> > is this helping with?
> >
>
> Now as I have replaced BUG_ON with WARN_ON different reg_states are
> actually handled. In free_netdev reference is given up and release
> sequence is expected to start no matter what is the dev->reg_state. In
> netdev_release memories are freed. There are just extra warnings in the
> log from the functions.
>
> > Systems with panic-on-warn just rebooted, and a "normal" system saw a
> > traceback yet no error handling happened so why would the code even test
> > this?
>
> One use for panic-on-warn is the stabilation phase before "product"
> quality. In this case you are interested in everything that is
> unexpected. Having panic-on-warn set is easy way to make each warning in
> the system visible to user and force to react on it.
>
> Another example is my Syzkaller excercise. I gathered all the crashes
> from the system and these include warnings as well. I will probably next
> look at these warnings Syzkaller found and try to understand why they
> are and do they need fixes.
I know many "production" systems that run panic-on-warn so that if
something "odd" happens, the system gets rebooted and restarted all
automatically. It makes sense in a large cluster/cloud system.
> > I'm not trying to be picky here, just to think about what you are doing
> > with these checks please.
>
> This state was not expected being anything else than
> NETREG_UNREGISTERED. Probably error cases were not taken into account or
> maybe error cases were not expected to end up to device release
> sequence. Especially as cleaning up relies on normal device release
> sequence on error cases this assumption is wrong.
>
> Do you think better choise would be to remove BUG_ON completely rather
> than replacing it with WARN_ON?
I don't know this code, and I'm not the maintainer of it, so I don't
have a strong opinion. Other than if this was code I was maintaining, I
would not want it there :)
Anyway, your original fix looks sane to me, and that's probably a good
thing to just do now and then you can work on these other things.
thanks,
greg k-h
Powered by blists - more mailing lists