netdev - Re: [RFC net-next 1/2] net: net-porcfs: Reduce rcu lock critical section

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Wed, 11 Apr 2018 15:30:10 -0700
From:   Eric Dumazet <eric.dumazet@...il.com>
To:     Saeed Mahameed <saeedm@...lanox.com>,
        "davem@...emloft.net" <davem@...emloft.net>
Cc:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: [RFC net-next 1/2] net: net-porcfs: Reduce rcu lock critical
 section



On 04/11/2018 11:59 AM, Saeed Mahameed wrote:
> On Tue, 2018-04-10 at 13:35 -0700, Eric Dumazet wrote:
>>
>> On 04/10/2018 10:16 AM, David Miller wrote:
>>>
>>> The tradeoff here is that now you are doing two unnecessary atomic
>>> operations per stats dump.
>>>
>>> That is what the RCU lock allows us to avoid.
>>>
>>
>> dev_hold() and dev_put() are actually per cpu increment and
>> decrements,
>> pretty cheap these days.
>>
> 
> Yes, i am adding only 2 cpu instructions here.
> I think the trade-off here is too small and the price to finally have
> get_stats64 called from non atomic context is really worth it.

Oh... but you have not mentioned this reason in your changelog.

What about bonding stats ?

By sending partial patches like that, others have to take care of the details
and this is not really acceptable.

> 
> It  looks really odd to me that the device chain locks are held for
> such long periods, while we already have the means to avoid this, same
> goes for rtnl_lock, same trick can work here for many use cases and
> many ndos, we are just over protective for no reasons.
> 
> 
>> Problem here is that any preemption of the process holding device
>> reference
>> might trigger warnings in device unregister.
>>
> 
> This is true for any other place dev_hold is used,
> as explained in the commit message dev_hold is used for a very brief
> moment before calling the stats ndo and released directly after.

Not really.

Other places usually have notifiers to remove the refcount when needed.

We worked quite hard in the past to remove extra dev_hold()
(before we finally converted it to percpu counter)

> 
> looking at 
> 
> netdev_wait_allrefs(...)
> [...]
> 	msleep(250);
> 
> 	refcnt = netdev_refcnt_read(dev);
> 
> 	if (time_after(jiffies, warning_time + 10 * HZ)) {
> 		pr_emerg("unregister_netdevice: waiting for %s to
> become free. Usage count = %d\n",
> 			 dev->name, refcnt);
> 		warning_time = jiffies;
> 	}
> 
> The holder will get enough time to release the netdev way before the
> warning is triggered.
> 
> The warning is triggered only if someone holds the dev for more than 10
> seconds which is impossible for the stats ndo to take more than this,
> in fact i just did a quick measurement and it seems that in average
> get_stats64 ndo takes 0.5us !

Average is nice, but what about max time ?

Sleeping more than 10 seconds to satify GFP_KERNEL memory allocation can definitely
happen in the real world, or simply if you get preempted by some RT/high prio tasks.

Just say no to 'might sleep in ndo_get_stats()', and you will save a lot of issues.