Date:   Fri, 16 Nov 2018 12:28:54 +0100
From:   Davide Caratti <dcaratti@...hat.com>
To:     Eric Dumazet <eric.dumazet@...il.com>,
        Jamal Hadi Salim <jhs@...atatu.com>,
        Cong Wang <xiyou.wangcong@...il.com>,
        Jiri Pirko <jiri@...nulli.us>,
        "David S. Miller" <davem@...emloft.net>
Cc:     netdev@...r.kernel.org
Subject: Re: [PATCH net-next 2/2] net/sched: act_police: don't use spinlock
 in the data path

On Thu, 2018-11-15 at 05:53 -0800, Eric Dumazet wrote:
> 
> On 11/15/2018 03:43 AM, Davide Caratti wrote:
> > On Wed, 2018-11-14 at 22:46 -0800, Eric Dumazet wrote:
> > > On 09/13/2018 10:29 AM, Davide Caratti wrote:
> > > > use RCU instead of spinlocks, to protect concurrent read/write on
> > > > act_police configuration. This reduces the effects of contention in the
> > > > data path, in case multiple readers are present.
> > > > 
> > > > Signed-off-by: Davide Caratti <dcaratti@...hat.com>
> > > > ---
> > > >  net/sched/act_police.c | 156 ++++++++++++++++++++++++-----------------
> > > >  1 file changed, 92 insertions(+), 64 deletions(-)
> > > > 
> > > 
> > > I must be missing something obvious with this patch.
> > 
> > hello Eric,
> > 
> > Quite the opposite: I missed something obvious when I wrote that patch:
> > there is a race condition on tcfp_toks, tcfp_ptoks and tcfp_t_c. Thank
> > you for noticing it.
> > 
> > These variables still need to be protected with a spinlock. I will do a
> > patch and evaluate whether 'act_police' is still faster than a version
> > where 2d550dbad83c ("net/sched: .... ") is reverted, and share the
> > results in the next few hours.
> > 
> > Ok?
> > 
> 
> SGTM, thanks.

hello,
I just finished the comparison of act_police in the following two cases:

a) revert the RCU-ification (i.e. revert commit 2d550dbad83c ("net/sched:
act_police: don't use spinlock in the data path")), and keep the per-CPU
counters used by the rate estimator

b) keep the RCU-ified configuration parameters, and protect read/update of
tcfp_toks, tcfp_ptoks and tcfp_t_c with a spinlock (code at the bottom of
this message).
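
If anyone wants to reproduce the two variants, they can be obtained roughly
as follows on a net-next tree (the .diff file name is just an example, and
(a) may need trivial conflict fixes):

# (a) plain revert of the RCU-ification, per-CPU counters stay in place
git revert 2d550dbad83c

# (b) apply the patch at the bottom of this message
git apply act_police_spinlock.diff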

## Test setup:

$DEV is a 'dummy' device with a clsact qdisc; the following two commands,

# test police with 'rate'
$TC filter add dev $DEV egress matchall \
 action police rate 2gbit burst 100k conform-exceed pass/pass index 100

# test police with 'avrate'
$TC filter add dev $DEV egress estimator 1s 8s matchall \
        action police avrate 2gbit conform-exceed pass/pass index 100

are tested with the following loop:

for c in 1 2 4 8 16; do
./pktgen/pktgen_bench_xmit_mode_queue_xmit.sh -v -s 64  -t $c -n 5000000 -i $DEV
done
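
For reference, $DEV was set up roughly like this (standard boilerplate, only
the 'dummy' type and the clsact qdisc matter here):

ip link add dev $DEV type dummy
ip link set dev $DEV up
$TC qdisc add dev $DEV clsact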


## Test results (pps, as reported by pktgen):

using  rate  | reverted       | patched
        $c   | act_police (a) | act_police (b)
-------------+----------------+---------------
         1   |       3364442  |      3345580  
         2   |       2703030  |      2721919  
         4   |       1130146  |      1253555
         8   |        664238  |       658777
        16   |        154026  |       155259


using avrate | reverted       | patched
        $c   | act_police (a) | act_police (b)
-------------+----------------+---------------
         1   |       3621796  |      3658472 
         2   |       3075589  |      3548135  
         4   |       2313314  |      3343717
         8   |        768458  |      3260480
        16   |        177776  |      3254128


So, 'avrate' still gets a significant improvement, because in this case the
'conform/exceed' decision doesn't need the spinlock. The estimation is
probably less accurate, because it uses per-CPU variables: if this is not
acceptable, then we also need to revert 93be42f9173b ("net/sched: act_police:
use per-cpu counters").


## patch code:

-- >8 --
diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index 052855d..42db852 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -27,10 +27,7 @@ struct tcf_police_params {
 	u32			tcfp_ewma_rate;
 	s64			tcfp_burst;
 	u32			tcfp_mtu;
-	s64			tcfp_toks;
-	s64			tcfp_ptoks;
 	s64			tcfp_mtu_ptoks;
-	s64			tcfp_t_c;
 	struct psched_ratecfg	rate;
 	bool			rate_present;
 	struct psched_ratecfg	peak;
@@ -40,6 +37,9 @@ struct tcf_police_params {
 
 struct tcf_police {
 	struct tc_action	common;
+	s64			tcfp_toks;
+	s64			tcfp_ptoks;
+	s64			tcfp_t_c;
 	struct tcf_police_params __rcu *params;
 };
 
@@ -186,12 +186,6 @@ static int tcf_police_init(struct net *net, struct nlattr *nla,
 	}
 
 	new->tcfp_burst = PSCHED_TICKS2NS(parm->burst);
-	new->tcfp_toks = new->tcfp_burst;
-	if (new->peak_present) {
-		new->tcfp_mtu_ptoks = (s64)psched_l2t_ns(&new->peak,
-							 new->tcfp_mtu);
-		new->tcfp_ptoks = new->tcfp_mtu_ptoks;
-	}
 
 	if (tb[TCA_POLICE_AVRATE])
 		new->tcfp_ewma_rate = nla_get_u32(tb[TCA_POLICE_AVRATE]);
@@ -207,7 +201,14 @@ static int tcf_police_init(struct net *net, struct nlattr *nla,
 	}
 
 	spin_lock_bh(&police->tcf_lock);
-	new->tcfp_t_c = ktime_get_ns();
+	police->tcfp_t_c = ktime_get_ns();
+	police->tcfp_toks = new->tcfp_burst;
+	if (new->peak_present) {
+		new->tcfp_mtu_ptoks = (s64)psched_l2t_ns(&new->peak,
+							 new->tcfp_mtu);
+		police->tcfp_ptoks = new->tcfp_mtu_ptoks;
+	}
+
 	police->tcf_action = parm->action;
 	rcu_swap_protected(police->params,
 			   new,
@@ -255,27 +256,29 @@ static int tcf_police_act(struct sk_buff *skb, const struct tc_action *a,
 			ret = p->tcfp_result;
 			goto end;
 		}
-
+		spin_lock_bh(&police->tcf_lock);
 		now = ktime_get_ns();
-		toks = min_t(s64, now - p->tcfp_t_c, p->tcfp_burst);
+		toks = min_t(s64, now - police->tcfp_t_c, p->tcfp_burst);
 		if (p->peak_present) {
-			ptoks = toks + p->tcfp_ptoks;
+			ptoks = toks + police->tcfp_ptoks;
 			if (ptoks > p->tcfp_mtu_ptoks)
 				ptoks = p->tcfp_mtu_ptoks;
 			ptoks -= (s64)psched_l2t_ns(&p->peak,
 						    qdisc_pkt_len(skb));
 		}
-		toks += p->tcfp_toks;
+		toks += police->tcfp_toks;
 		if (toks > p->tcfp_burst)
 			toks = p->tcfp_burst;
 		toks -= (s64)psched_l2t_ns(&p->rate, qdisc_pkt_len(skb));
 		if ((toks|ptoks) >= 0) {
-			p->tcfp_t_c = now;
-			p->tcfp_toks = toks;
-			p->tcfp_ptoks = ptoks;
+			police->tcfp_t_c = now;
+			police->tcfp_toks = toks;
+			police->tcfp_ptoks = ptoks;
+			spin_unlock_bh(&police->tcf_lock);
 			ret = p->tcfp_result;
 			goto inc_drops;
 		}
+		spin_unlock_bh(&police->tcf_lock);
 	}
 
 inc_overlimits:
-- >8 --

any feedback is appreciated.
thanks!
-- 
davide
