lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 19 Nov 2019 08:58:12 +0100
From:   Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To:     Jouni Högander <jouni.hogander@...kie.com>
Cc:     netdev@...r.kernel.org, David Miller <davem@...emloft.net>,
        Lukas Bulwahn <lukas.bulwahn@...il.com>
Subject: Re: [PATCH v3] net-sysfs: Fix reference count leak

On Tue, Nov 19, 2019 at 09:45:02AM +0200, Jouni Högander wrote:
> Greg Kroah-Hartman <gregkh@...uxfoundation.org> writes:
> 
> > On Mon, Nov 18, 2019 at 02:32:46PM +0200, jouni.hogander@...kie.com wrote:
> >> From: Jouni Hogander <jouni.hogander@...kie.com>
> >> 
> >> Netdev_register_kobject is calling device_initialize. In case of error
> >> reference taken by device_initialize is not given up.
> >> 
> >> Drivers are supposed to call free_netdev in case of error. In non-error
> >> case the last reference is given up there and device release sequence
> >> is triggered. In error case this reference is kept and the release
> >> sequence is never started.
> >> 
> >> Fix this reference count leak by allowing giving up the reference also
> >> in error case in free_netdev.
> >> 
> >> Also replace BUG_ON with WARN_ON in free_netdev and in netdev_release.
> >> 
> >> This is the rootcause for couple of memory leaks reported by Syzkaller:
> >> 
> >> BUG: memory leak unreferenced object 0xffff8880675ca008 (size 256):
> >>   comm "netdev_register", pid 281, jiffies 4294696663 (age 6.808s)
> >>   hex dump (first 32 bytes):
> >>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> >>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> >>   backtrace:
> >>     [<0000000058ca4711>] kmem_cache_alloc_trace+0x167/0x280
> >>     [<000000002340019b>] device_add+0x882/0x1750
> >>     [<000000001d588c3a>] netdev_register_kobject+0x128/0x380
> >>     [<0000000011ef5535>] register_netdevice+0xa1b/0xf00
> >>     [<000000007fcf1c99>] __tun_chr_ioctl+0x20d5/0x3dd0
> >>     [<000000006a5b7b2b>] tun_chr_ioctl+0x2f/0x40
> >>     [<00000000f30f834a>] do_vfs_ioctl+0x1c7/0x1510
> >>     [<00000000fba062ea>] ksys_ioctl+0x99/0xb0
> >>     [<00000000b1c1b8d2>] __x64_sys_ioctl+0x78/0xb0
> >>     [<00000000984cabb9>] do_syscall_64+0x16f/0x580
> >>     [<000000000bde033d>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>     [<00000000e6ca2d9f>] 0xffffffffffffffff
> >> 
> >> BUG: memory leak
> >> unreferenced object 0xffff8880668ba588 (size 8):
> >>   comm "kobject_set_nam", pid 286, jiffies 4294725297 (age 9.871s)
> >>   hex dump (first 8 bytes):
> >>     6e 72 30 00 cc be df 2b                          nr0....+
> >>   backtrace:
> >>     [<00000000a322332a>] __kmalloc_track_caller+0x16e/0x290
> >>     [<00000000236fd26b>] kstrdup+0x3e/0x70
> >>     [<00000000dd4a2815>] kstrdup_const+0x3e/0x50
> >>     [<0000000049a377fc>] kvasprintf_const+0x10e/0x160
> >>     [<00000000627fc711>] kobject_set_name_vargs+0x5b/0x140
> >>     [<0000000019eeab06>] dev_set_name+0xc0/0xf0
> >>     [<0000000069cb12bc>] netdev_register_kobject+0xc8/0x320
> >>     [<00000000f2e83732>] register_netdevice+0xa1b/0xf00
> >>     [<000000009e1f57cc>] __tun_chr_ioctl+0x20d5/0x3dd0
> >>     [<000000009c560784>] tun_chr_ioctl+0x2f/0x40
> >>     [<000000000d759e02>] do_vfs_ioctl+0x1c7/0x1510
> >>     [<00000000351d7c31>] ksys_ioctl+0x99/0xb0
> >>     [<000000008390040a>] __x64_sys_ioctl+0x78/0xb0
> >>     [<0000000052d196b7>] do_syscall_64+0x16f/0x580
> >>     [<0000000019af9236>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>     [<00000000bc384531>] 0xffffffffffffffff
> >> 
> >> Reported-by: syzbot+ad8ca40ecd77896d51e2@...kaller.appspotmail.com
> >> Cc: David Miller <davem@...emloft.net>
> >> Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
> >> Cc: Lukas Bulwahn <lukas.bulwahn@...il.com>
> >> Signed-off-by: Jouni Hogander <jouni.hogander@...kie.com>
> >> ---
> >> v2 -> v3:
> >> * Replaced BUG_ON with WARN_ON in free_netdev and netdev_release
> >> v1 -> v2:
> >> * Relying on driver calling free_netdev rather than calling
> >>   put_device directly in error path
> >> ---
> >>  net/core/dev.c       | 14 +++++++-------
> >>  net/core/net-sysfs.c |  6 +++++-
> >>  2 files changed, 12 insertions(+), 8 deletions(-)
> >> 
> >> diff --git a/net/core/dev.c b/net/core/dev.c
> >> index 99ac84ff398f..1d6c0bfb5ec5 100644
> >> --- a/net/core/dev.c
> >> +++ b/net/core/dev.c
> >> @@ -9603,14 +9603,14 @@ void free_netdev(struct net_device *dev)
> >>  
> >>  	netdev_unregister_lockdep_key(dev);
> >>  
> >> -	/*  Compatibility with error handling in drivers */
> >> -	if (dev->reg_state == NETREG_UNINITIALIZED) {
> >> -		netdev_freemem(dev);
> >> -		return;
> >> -	}
> >> +	/* reg_state is NETREG_UNINITIALIZED if there is an error in
> >> +	 * registration.
> >> +	 */
> >> +	WARN_ON(dev->reg_state != NETREG_UNREGISTERED &&
> >> +		dev->reg_state != NETREG_UNINITIALIZED);
> >
> > You "warn" about this, but do not actually handle the problem.  So what
> > is this helping with?
> >
> 
> Now as I have replaced BUG_ON with WARN_ON different reg_states are
> actually handled. In free_netdev reference is given up and release
> sequence is expected to start no matter what is the dev->reg_state. In
> netdev_release memories are freed. There are just extra warnings in the
> log from the functions.
> 
> > Systems with panic-on-warn just rebooted, and a "normal" system saw a
> > traceback yet no error handling happened so why would the code even test
> > this?
> 
> One use for panic-on-warn is the stabilation phase before "product"
> quality. In this case you are interested in everything that is
> unexpected. Having panic-on-warn set is easy way to make each warning in
> the system visible to user and force to react on it.
> 
> Another example is my Syzkaller excercise. I gathered all the crashes
> from the system and these include warnings as well. I will probably next
> look at these warnings Syzkaller found and try to understand why they
> are and do they need fixes.

I know many "production" systems that run panic-on-warn so that if
something "odd" happens, the system gets rebooted and restarted all
automatically.  It makes sense in a large cluster/cloud system.

> > I'm not trying to be picky here, just to think about what you are doing
> > with these checks please.
> 
> This state was not expected being anything else than
> NETREG_UNREGISTERED. Probably error cases were not taken into account or
> maybe error cases were not expected to end up to device release
> sequence. Especially as cleaning up relies on normal device release
> sequence on error cases this assumption is wrong.
> 
> Do you think better choise would be to remove BUG_ON completely rather
> than replacing it with WARN_ON?

I don't know this code, and I'm not the maintainer of it, so I don't
have a strong opinion.  Other than if this was code I was maintaining, I
would not want it there :)

Anyway, your original fix looks sane to me, and that's probably a good
thing to just do now and then you can work on these other things.

thanks,

greg k-h

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ