lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 09 Sep 2014 09:35:23 -0700
From:	John Fastabend <john.fastabend@...il.com>
To:	Eric Dumazet <eric.dumazet@...il.com>
CC:	xiyou.wangcong@...il.com, davem@...emloft.net, jhs@...atatu.com,
	netdev@...r.kernel.org, paulmck@...ux.vnet.ibm.com,
	brouer@...hat.com
Subject: Re: [net-next PATCH v3 09/15] net: sched: make cls_u32 lockless

On 09/09/2014 06:20 AM, Eric Dumazet wrote:
> On Mon, 2014-09-08 at 22:58 -0700, John Fastabend wrote:
>> Make cls_u32 classifier safe to run without holding lock. This patch
>> converts statistics that are kept in read section u32_classify into
>> per cpu counters.
>>
>> This patch was tested with a tight u32 filter add/delete loop while
>> generating traffic with pktgen. By running pktgen on vlan devices
>> created on top of a physical device we can hit the qdisc layer
>> correctly. For ingress qdisc's a loopback cable was used.
>>
>> for i in {1..100}; do
>>          q=`echo $i%8|bc`;
>>          echo -n "u32 tos: iteration $i on queue $q";
>>          tc filter add dev p3p2 parent $p prio $i u32 match ip tos 0x10 0xff \
>>                    action skbedit queue_mapping $q;
>>          sleep 1;
>>          tc filter del dev p3p2 prio $i;
>>
>>          echo -n "u32 tos hash table: iteration $i on queue $q";
>>          tc filter add dev p3p2 parent $p protocol ip prio $i handle 628: u32 divisor 1
>>          tc filter add dev p3p2 parent $p protocol ip prio $i u32 \
>>                  match ip protocol 17 0xff link 628: offset at 0 mask 0xf00 shift 6 plus 0
>>          tc filter add dev p3p2 parent $p protocol ip prio $i u32 \
>>                  ht 628:0 match ip tos 0x10 0xff action skbedit queue_mapping $q
>>          sleep 2;
>>          tc filter del dev p3p2 prio $i
>>          sleep 1;
>> done
>>
>
>
> Note it might be easier to split this patch in 2 parts.
>
> (The percpu stuff could be done in a first step, then rcu conversion)

Sure, I'll split it up.

>
>> Signed-off-by: John Fastabend <john.r.fastabend@...el.com>
>> ---

[...]

>>
>>   static inline unsigned int u32_hash_fold(__be32 key,
>> @@ -96,7 +103,7 @@ static int u32_classify(struct sk_buff *skb, const struct tcf_proto *tp, struct
>>   		unsigned int	  off;
>>   	} stack[TC_U32_MAXDEPTH];
>>
>> -	struct tc_u_hnode *ht = tp->root;
>> +	struct tc_u_hnode *ht = rcu_dereference_bh(tp->root);
>>   	unsigned int off = skb_network_offset(skb);
>>   	struct tc_u_knode *n;
>>   	int sdepth = 0;
>> @@ -108,23 +115,23 @@ static int u32_classify(struct sk_buff *skb, const struct tcf_proto *tp, struct
>>   	int i, r;
>>
>>   next_ht:
>> -	n = ht->ht[sel];
>> +	n = rcu_dereference_bh(ht->ht[sel]);
>>
>>   next_knode:
>>   	if (n) {
>>   		struct tc_u32_key *key = n->sel.keys;
>>
>>   #ifdef CONFIG_CLS_U32_PERF
>> -		n->pf->rcnt += 1;
>> +		this_cpu_inc(n->pf->rcnt);
>
> As we run in BH, we are not preemptable, we can use instead :
> __this_cpu_inc()  (on all occurrences)
>
> Using a macro would also reduce the #ifdef mess of this file
>

I'll go ahead and do the macro conversion.

>>   		j = 0;
>>   #endif
>>
>>   #ifdef CONFIG_CLS_U32_MARK
>> -		if ((skb->mark & n->mark.mask) != n->mark.val) {
>> -			n = n->next;
>> +		if ((skb->mark & n->mask) != n->val) {
>> +			n = rcu_dereference_bh(n->next);
>>   			goto next_knode;
>>   		} else {
>> -			n->mark.success++;
>> +			this_cpu_inc(*n->pcpu_success);
>>   		}
>>   #endif
>>

[...]


>> @@ -326,11 +342,11 @@ static int u32_init(struct tcf_proto *tp)
>>   	}
>>
>>   	tp_c->refcnt++;
>> -	root_ht->next = tp_c->hlist;
>> -	tp_c->hlist = root_ht;
>> +	rcu_assign_pointer(root_ht->next, tp_c->hlist);
>
> 	RCU_INIT_POINTER() or root_ht->next = tp_c->hlist;
>

I'll fix all these as well.

>> +	rcu_assign_pointer(tp_c->hlist, root_ht);
>>   	root_ht->tp_c = tp_c;
>>
>> -	tp->root = root_ht;
>> +	rcu_assign_pointer(tp->root, root_ht);
>>   	tp->data = tp_c;
>>   	return 0;
>>   }
>> @@ -342,25 +358,33 @@ static int u32_destroy_key(struct tcf_proto *tp, struct tc_u_knode *n)
>>   	if (n->ht_down)
>>   		n->ht_down->refcnt--;
>>   #ifdef CONFIG_CLS_U32_PERF
>> -	kfree(n->pf);
>> +	free_percpu(n->pf);
>>   #endif
>>   	kfree(n);
>>   	return 0;
>>   }
>>
>> +void __u32_delete_key(struct rcu_head *rcu)
>
> Can we consistently use _rcu, as in u32_delete_key_rcu() ?

Yes, I'll rename these calls.

Thanks,
John

-- 
John Fastabend         Intel Corporation
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ