Message-ID: <d0b4ca2e-d58e-f611-219b-a8aff6c5fc75@nvidia.com>
Date: Mon, 5 Jul 2021 13:48:10 +0300
From: Paul Blakey <paulb@...dia.com>
To: Paolo Abeni <pabeni@...hat.com>
CC: <netdev@...r.kernel.org>, Eric Dumazet <eric.dumazet@...il.com>,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
"Alexei Starovoitov" <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
"John Fastabend" <john.fastabend@...il.com>,
Saeed Mahameed <saeedm@...dia.com>,
"Oz Shlomo" <ozsh@...dia.com>, Roi Dayan <roid@...dia.com>,
Vlad Buslov <vladbu@...dia.com>
Subject: Re: [PATCH net v2] skbuff: Release nfct refcount on napi stolen or
re-used skbs
On Mon, 5 Jul 2021, Paolo Abeni wrote:
> On Mon, 2021-07-05 at 10:49 +0300, Paul Blakey wrote:
> > When multiple SKBs are merged into a new skb under napi GRO,
> > or an SKB is re-used by napi, if nfct was set for them in the
> > driver, it will not be released while freeing their stolen
> > head state or on re-use.
> >
> > Release nfct on napi's stolen or re-used SKBs, and in
> > gro_list_prepare, check the conntrack metadata diff.
> >
> > Fixes: 5c6b94604744 ("net/mlx5e: CT: Handle misses after executing CT action")
> > Reviewed-by: Roi Dayan <roid@...dia.com>
> > Signed-off-by: Paul Blakey <paulb@...dia.com>
> > ---
> > Changelog:
> > v1->v2:
> > Check for different flows based on CT and chain metadata in gro_list_prepare
> >
> >  net/core/dev.c    | 13 +++++++++++++
> >  net/core/skbuff.c |  1 +
> >  2 files changed, 14 insertions(+)
> >
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 439faadab0c2..bf62cb2ec6da 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -5981,6 +5981,18 @@ static void gro_list_prepare(const struct list_head *head,
> >                         diffs = memcmp(skb_mac_header(p),
> >                                        skb_mac_header(skb),
> >                                        maclen);
> > +
> > +               diffs |= skb_get_nfct(p) ^ skb_get_nfct(skb);
> > +
> > +               if (!diffs) {
> > +                       struct tc_skb_ext *skb_ext = skb_ext_find(skb, TC_SKB_EXT);
> > +                       struct tc_skb_ext *p_ext = skb_ext_find(p, TC_SKB_EXT);
> > +
> > +                       diffs |= (!!p_ext) ^ (!!skb_ext);
> > +                       if (!diffs && unlikely(skb_ext))
> > +                               diffs |= p_ext->chain ^ skb_ext->chain;
> > +               }
>
> I'm wondering... if 2 skbs are merged, and have the same L2/L3/L4
> headers - except len and csum - can they have different dst/TC_EXT?
Yes, and the same tunnel header metadata... so even the tunnels are the same.
So probably not; I had trouble thinking of a case where it can happen as well.
But a user might have some weird tc policy where it will happen, especially
with header rewrite.
To test this, I ran two TCP streams that do some hops in hardware (tc
goto chain), and for one of the two streams I used tc pedit rules to
rewrite the ip/mac so the two streams end up with the same 5-tuple (macs,
ips, ports), just on different chains and in different connection tracking
zones.
For the last hop, where both streams were the same, I used the tc flower
skip_hw flag so the packets miss to software and we get here with
same_flow = false.
This might be too much for email (so I won't bother formatting it :)) but
here is the setup I used:
echo "add arp rules"
tc_filter add dev $REP ingress protocol arp prio 888 flower skip_hw
$tc_verbose \
action mirred egress mirror dev $VETH_REP1 pipe \
action mirred egress redirect dev $VETH_REP2
tc_filter add dev $VETH_REP2 ingress protocol arp prio 888 flower
skip_hw $tc_verbose \
action mirred egress redirect dev $REP
tc_filter add dev $VETH_REP1 ingress protocol arp prio 888 flower
skip_hw $tc_verbose \
action mirred egress redirect dev $REP
echo "add ct rules"
flag=""
#ORIG:
#chain 0, REP->VETH[12], send to diff chains based on mac,
#for zone 4 first do hw rewrite of fake ip so we have same tuple
tc_filter add dev $REP ingress protocol ip chain 0 prio 1 flower
ip_proto $proto dst_ip $ip_remote1 $tc_verbose $flag \
dst_mac $remote_mac ct_state -trk \
action ct zone 3 action goto chain 3
#chain 2, continuation of header rewrite, send to chain 4 & zone 4
tc_filter add dev $REP ingress protocol ip chain 2 prio 1 flower
ip_proto $proto $tc_verbose \
action ct zone 4 action goto chain 4
#chain 3, REP->VETH, zone 3
tc_filter add dev $REP ingress protocol ip chain 3 prio 1 flower
ip_proto $proto $tc_verbose \
ct_state +trk+new ct_zone 3 \
action ct zone 3 commit \
action mirred egress redirect dev $VETH_REP1
tc_filter add dev $REP ingress protocol ip chain 3 prio 1 flower
ip_proto $proto $tc_verbose \
ct_state +trk+est ct_zone 3 \
action mirred egress redirect dev $VETH_REP1
#others...
tc_filter add dev $REP ingress protocol ip chain 3 prio 5 flower
ip_proto $proto skip_hw $tc_verbose \
ct_zone 3 \
action drop
tc_filter add dev $REP ingress protocol ip chain 3 prio 6 flower
ip_proto $proto skip_hw $tc_verbose \
action drop
#chain 4, REP->VETH2, zone 4, +new/+est, rewrite back mac/ip and fwd
tc_filter add dev $REP ingress protocol ip chain 4 prio 5 flower
ip_proto $proto skip_hw $tc_verbose \
ct_zone 4 \
action drop
tc_filter add dev $REP ingress protocol ip chain 4 prio 6 flower
ip_proto $proto skip_hw $tc_verbose \
action drop
#catch wrong packets
tc_filter add dev $REP ingress protocol ip chain 4 prio 2 flower
ip_proto $proto $tc_verbose skip_hw \
ct_zone 3 \
action drop
tc_filter add dev $REP ingress protocol ip chain 3 prio 2 flower
ip_proto $proto $tc_verbose skip_hw \
ct_zone 4 \
action drop
#REPLY:
#chain 0, VETH->REP, send to zone 3, then chain 6 for fwd
tc_filter add dev $VETH_REP1 ingress protocol ip chain 0 prio 1 flower
ip_proto $proto $tc_verbose $flag \
ct_state -trk \
action ct zone 3 action goto chain 6
#chain 6, VETH->REP, zone 3, +est, fwd
tc_filter add dev $VETH_REP1 ingress protocol ip chain 6 prio 1 flower
ip_proto $proto $tc_verbose \
ct_state +trk+est \
action mirred egress redirect dev $REP
#chain 0, VETH->REP, send to zone 4, fake ip for ct, then chain 6 for
fwd
#chain 6, VETH2->REP, zone 3, +est, revert fake ip and fwd to dev
port1=6000
port2=7000
echo port1 $port1 port2 $port2
tc_filter add dev $REP ingress protocol ip chain 0 prio 1 flower \
    ip_proto $proto dst_ip $ip_remote2 $tc_verbose $flag \
    dst_mac $remote_mac2 ct_state -trk src_port $port2 \
    action pedit ex \
        munge eth dst set $remote_mac \
        munge ip dst set $ip_remote1 \
        munge $proto sport set $port1 \
        pipe \
    action csum $proto ip pipe \
    action goto chain 2
tc_filter add dev $REP ingress protocol ip chain 0 prio 2 flower \
    ip_proto $proto dst_ip $ip_remote2 $tc_verbose $flag \
    dst_mac $remote_mac2 \
    action mirred egress redirect dev $VETH_REP2
tc_filter add dev $REP ingress protocol ip chain 4 prio 1 flower \
    ip_proto $proto $tc_verbose \
    ct_state +trk+new ct_zone 4 src_port $port1 \
    action ct zone 4 commit pipe \
    action pedit ex \
        munge eth dst set $remote_mac2 \
        munge ip dst set $ip_remote2 \
        munge $proto sport set $port2 \
        pipe \
    action csum $proto ip pipe \
    action mirred egress redirect dev $VETH_REP2
tc_filter add dev $REP ingress protocol ip chain 4 prio 1 flower \
    ip_proto $proto $tc_verbose \
    ct_state +trk+est ct_zone 4 src_port $port1 \
    action pedit ex \
        munge eth dst set $remote_mac2 \
        munge ip dst set $ip_remote2 \
        munge $proto sport set $port2 \
        pipe \
    action csum $proto ip pipe \
    action mirred egress redirect dev $VETH_REP2
tc_filter add dev $VETH_REP2 ingress protocol ip chain 0 prio 1 flower \
    ip_proto $proto $tc_verbose $flag \
    ct_state -trk dst_port $port2 \
    action pedit ex \
        munge ip src set $ip_remote1 \
        munge $proto dport set $port1 \
        pipe \
    action csum $proto ip pipe \
    action ct zone 4 action goto chain 6
tc_filter add dev $VETH_REP2 ingress protocol ip chain 0 prio 2 flower \
    ip_proto $proto $tc_verbose $flag \
    action mirred egress redirect dev $REP
tc_filter add dev $VETH_REP2 ingress protocol ip chain 6 prio 1 flower \
    ip_proto $proto $tc_verbose \
    ct_state +trk+est dst_port $port1 \
    action pedit ex \
        munge ip src set $ip_remote2 \
        munge $proto dport set $port2 \
        pipe \
    action csum $proto ip pipe \
    action mirred egress redirect dev $REP
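
To drive the streams, something like the sketch below works (iperf3, the
server port, and the netns names are illustrative assumptions, not from
the setup above; the important part is pinning the client source ports,
since the rules above match on src_port/dst_port):

# servers, one per remote ip, same listening port so the two streams
# differ only in src port / dst ip before the rewrite:
ip netns exec remote1 iperf3 -s -B $ip_remote1 -p 5201 &
ip netns exec remote2 iperf3 -s -B $ip_remote2 -p 5201 &

# stream 1: source port $port1 -> $ip_remote1:5201
iperf3 -c $ip_remote1 -p 5201 --cport $port1 -t 60 &
# stream 2: source port $port2 -> $ip_remote2:5201; the chain 0 pedit
# rule above rewrites sport -> $port1 and dst -> $ip_remote1/$remote_mac,
# so from that point on both streams carry the same 5-tuple, just on
# different chains and ct zones
iperf3 -c $ip_remote2 -p 5201 --cport $port2 -t 60 &
wait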
>
> @Eric: I'm sorry for the very dumb and late question. You reported that
> v1 of this patch would make "GRO slow as hell"; could you please
> elaborate a bit more? I thought most skbs (with no ct attached) would
> see little difference?
>
> Cheers,
>
> Paolo
>
>
>