lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 19 Nov 2019 09:45:02 +0200
From:   jouni.hogander@...kie.com (Jouni Högander)
To:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc:     netdev@...r.kernel.org, David Miller <davem@...emloft.net>,
        Lukas Bulwahn <lukas.bulwahn@...il.com>
Subject: Re: [PATCH v3] net-sysfs: Fix reference count leak

Greg Kroah-Hartman <gregkh@...uxfoundation.org> writes:

> On Mon, Nov 18, 2019 at 02:32:46PM +0200, jouni.hogander@...kie.com wrote:
>> From: Jouni Hogander <jouni.hogander@...kie.com>
>> 
>> Netdev_register_kobject is calling device_initialize. In case of error
>> reference taken by device_initialize is not given up.
>> 
>> Drivers are supposed to call free_netdev in case of error. In non-error
>> case the last reference is given up there and device release sequence
>> is triggered. In error case this reference is kept and the release
>> sequence is never started.
>> 
>> Fix this reference count leak by allowing giving up the reference also
>> in error case in free_netdev.
>> 
>> Also replace BUG_ON with WARN_ON in free_netdev and in netdev_release.
>> 
>> This is the rootcause for couple of memory leaks reported by Syzkaller:
>> 
>> BUG: memory leak unreferenced object 0xffff8880675ca008 (size 256):
>>   comm "netdev_register", pid 281, jiffies 4294696663 (age 6.808s)
>>   hex dump (first 32 bytes):
>>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
>>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
>>   backtrace:
>>     [<0000000058ca4711>] kmem_cache_alloc_trace+0x167/0x280
>>     [<000000002340019b>] device_add+0x882/0x1750
>>     [<000000001d588c3a>] netdev_register_kobject+0x128/0x380
>>     [<0000000011ef5535>] register_netdevice+0xa1b/0xf00
>>     [<000000007fcf1c99>] __tun_chr_ioctl+0x20d5/0x3dd0
>>     [<000000006a5b7b2b>] tun_chr_ioctl+0x2f/0x40
>>     [<00000000f30f834a>] do_vfs_ioctl+0x1c7/0x1510
>>     [<00000000fba062ea>] ksys_ioctl+0x99/0xb0
>>     [<00000000b1c1b8d2>] __x64_sys_ioctl+0x78/0xb0
>>     [<00000000984cabb9>] do_syscall_64+0x16f/0x580
>>     [<000000000bde033d>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>     [<00000000e6ca2d9f>] 0xffffffffffffffff
>> 
>> BUG: memory leak
>> unreferenced object 0xffff8880668ba588 (size 8):
>>   comm "kobject_set_nam", pid 286, jiffies 4294725297 (age 9.871s)
>>   hex dump (first 8 bytes):
>>     6e 72 30 00 cc be df 2b                          nr0....+
>>   backtrace:
>>     [<00000000a322332a>] __kmalloc_track_caller+0x16e/0x290
>>     [<00000000236fd26b>] kstrdup+0x3e/0x70
>>     [<00000000dd4a2815>] kstrdup_const+0x3e/0x50
>>     [<0000000049a377fc>] kvasprintf_const+0x10e/0x160
>>     [<00000000627fc711>] kobject_set_name_vargs+0x5b/0x140
>>     [<0000000019eeab06>] dev_set_name+0xc0/0xf0
>>     [<0000000069cb12bc>] netdev_register_kobject+0xc8/0x320
>>     [<00000000f2e83732>] register_netdevice+0xa1b/0xf00
>>     [<000000009e1f57cc>] __tun_chr_ioctl+0x20d5/0x3dd0
>>     [<000000009c560784>] tun_chr_ioctl+0x2f/0x40
>>     [<000000000d759e02>] do_vfs_ioctl+0x1c7/0x1510
>>     [<00000000351d7c31>] ksys_ioctl+0x99/0xb0
>>     [<000000008390040a>] __x64_sys_ioctl+0x78/0xb0
>>     [<0000000052d196b7>] do_syscall_64+0x16f/0x580
>>     [<0000000019af9236>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>     [<00000000bc384531>] 0xffffffffffffffff
>> 
>> Reported-by: syzbot+ad8ca40ecd77896d51e2@...kaller.appspotmail.com
>> Cc: David Miller <davem@...emloft.net>
>> Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
>> Cc: Lukas Bulwahn <lukas.bulwahn@...il.com>
>> Signed-off-by: Jouni Hogander <jouni.hogander@...kie.com>
>> ---
>> v2 -> v3:
>> * Replaced BUG_ON with WARN_ON in free_netdev and netdev_release
>> v1 -> v2:
>> * Relying on driver calling free_netdev rather than calling
>>   put_device directly in error path
>> ---
>>  net/core/dev.c       | 14 +++++++-------
>>  net/core/net-sysfs.c |  6 +++++-
>>  2 files changed, 12 insertions(+), 8 deletions(-)
>> 
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 99ac84ff398f..1d6c0bfb5ec5 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -9603,14 +9603,14 @@ void free_netdev(struct net_device *dev)
>>  
>>  	netdev_unregister_lockdep_key(dev);
>>  
>> -	/*  Compatibility with error handling in drivers */
>> -	if (dev->reg_state == NETREG_UNINITIALIZED) {
>> -		netdev_freemem(dev);
>> -		return;
>> -	}
>> +	/* reg_state is NETREG_UNINITIALIZED if there is an error in
>> +	 * registration.
>> +	 */
>> +	WARN_ON(dev->reg_state != NETREG_UNREGISTERED &&
>> +		dev->reg_state != NETREG_UNINITIALIZED);
>
> You "warn" about this, but do not actually handle the problem.  So what
> is this helping with?
>

Now as I have replaced BUG_ON with WARN_ON different reg_states are
actually handled. In free_netdev reference is given up and release
sequence is expected to start no matter what is the dev->reg_state. In
netdev_release memories are freed. There are just extra warnings in the
log from the functions.

> Systems with panic-on-warn just rebooted, and a "normal" system saw a
> traceback yet no error handling happened so why would the code even test
> this?

One use for panic-on-warn is the stabilation phase before "product"
quality. In this case you are interested in everything that is
unexpected. Having panic-on-warn set is easy way to make each warning in
the system visible to user and force to react on it.

Another example is my Syzkaller excercise. I gathered all the crashes
from the system and these include warnings as well. I will probably next
look at these warnings Syzkaller found and try to understand why they
are and do they need fixes.

>
> I'm not trying to be picky here, just to think about what you are doing
> with these checks please.

This state was not expected being anything else than
NETREG_UNREGISTERED. Probably error cases were not taken into account or
maybe error cases were not expected to end up to device release
sequence. Especially as cleaning up relies on normal device release
sequence on error cases this assumption is wrong.

Do you think better choise would be to remove BUG_ON completely rather
than replacing it with WARN_ON?

BR,

Jouni Högander

Powered by blists - more mailing lists