Message-ID: <56897862.6080803@gmail.com>
Date: Sun, 03 Jan 2016 11:37:06 -0800
From: John Fastabend <john.fastabend@...il.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
CC: daniel@...earbox.net, eric.dumazet@...il.com, jhs@...atatu.com,
aduyck@...antis.com, brouer@...hat.com, davem@...emloft.net,
john.r.fastabend@...el.com, netdev@...r.kernel.org
Subject: Re: [RFC PATCH 06/12] net: sched: support qdisc_reset on NOLOCK qdisc
On 15-12-31 06:30 PM, Alexei Starovoitov wrote:
> On Wed, Dec 30, 2015 at 09:53:13AM -0800, John Fastabend wrote:
>> The qdisc_reset operation currently depends on the qdisc lock
>> to halt any additions to gso_skb and statistics while the list
>> is freed and the stats are zeroed.
>>
>> Without the qdisc lock we cannot guarantee another cpu is not
>> in the process of adding a skb to one of the "cells". Here are
>> the two cases we have to handle.
>>
>> case 1: qdisc_graft operation. In this case a "new" qdisc is attached
>> and the 'qdisc_destroy' operation is called on the old qdisc.
>> The destroy operation will wait an rcu grace period and call
>> qdisc_rcu_free(), at which point gso_cpu_skb is freed along
>> with all stats, so there is no need to zero stats and
>> gso_cpu_skb from the reset operation itself.
>>
>> Because we cannot call qdisc_reset until an rcu grace period
>> has detached the qdisc from all cpus, simply do not call
>> qdisc_reset() at all and let the qdisc_destroy operation clean
>> up the qdisc. Note, a refcnt greater than 1 would cause the
>> destroy operation to be aborted; however, if that ever
>> happened the reference to the qdisc would be lost and we
>> would have a memory leak.
>>
>> case 2: dev_deactivate sequence. This can come from a user bringing
>> the interface down, which causes the gso_skb list to be flushed
>> and the qlen zeroed. At the moment this is protected by the
>> qdisc lock, so while we clear the qlen/gso_skb fields we are
>> guaranteed no new skbs are added. For the lockless case,
>> though, this is not true. To resolve this, move the qdisc_reset
>> call to after the new qdisc is assigned and a grace period has
>> elapsed, ensuring no new skbs can be enqueued. Further, the
>> RTNL lock is held, so we cannot get another call to activate
>> the qdisc while the skb lists are being freed.
>>
>> Finally, fix qdisc_reset to handle the per cpu stats and
>> skb lists.
>>
>> Signed-off-by: John Fastabend <john.r.fastabend@...el.com>
> ...
>> - /* Prune old scheduler */
>> - if (oqdisc && atomic_read(&oqdisc->refcnt) <= 1)
>> - qdisc_reset(oqdisc);
>> -
> ...
>> - sync_needed |= !dev->dismantle;
>> + sync_needed = true;
>
> I think killing above <=1 check and forcing synchronize_net() will
> make qdisc destruction more reliable than it's right now.
I'll probably do that in a follow-up series. It's a bit tricky, and
actually I'm not convinced we aren't already leaking memory, even
before this series, when we transition between per-txq qdiscs and the
more traditional single big qdisc in a few cases. I'll take a look
tomorrow with kmemleak and tools. My guess is that in practice people
load a qdisc and don't change it much, and the better setup for
multiqueue devices is to load your favorite qdisc under mq or mqprio.
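For reference, the setup I mean looks something like this (device
name, handles, and fq_codel here are just examples):

```shell
# mq creates one class per tx queue; hang a real qdisc under each
tc qdisc add dev eth0 root handle 1: mq
tc qdisc add dev eth0 parent 1:1 handle 10: fq_codel
tc qdisc add dev eth0 parent 1:2 handle 20: fq_codel
```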
> Your commit log sounds too pessimistic :)
>
> btw, sync_needed variable can be removed as well.
Yep, thanks, good catch.
>
> All other patches look good. Great stuff overall!
>
Thanks, I'm going to add mqprio and multiq support, do some testing
and some perf work, then submit it. The alf_queue bits can be
optimized further later.
Also, once we have lockless multiq support, we can run filters on the
egress side without the qdisc lock overhead.
.John