Message-ID: <d0b4ca2e-d58e-f611-219b-a8aff6c5fc75@nvidia.com>
Date:   Mon, 5 Jul 2021 13:48:10 +0300
From:   Paul Blakey <paulb@...dia.com>
To:     Paolo Abeni <pabeni@...hat.com>
CC:     <netdev@...r.kernel.org>, Eric Dumazet <eric.dumazet@...il.com>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        "Alexei Starovoitov" <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        "John Fastabend" <john.fastabend@...il.com>,
        Saeed Mahameed <saeedm@...dia.com>,
        "Oz Shlomo" <ozsh@...dia.com>, Roi Dayan <roid@...dia.com>,
        Vlad Buslov <vladbu@...dia.com>
Subject: Re: [PATCH net v2] skbuff: Release nfct refcount on napi stolen or
 re-used skbs



On Mon, 5 Jul 2021, Paolo Abeni wrote:

> On Mon, 2021-07-05 at 10:49 +0300, Paul Blakey wrote:
> > When multiple SKBs are merged to a new skb under napi GRO,
> > or SKB is re-used by napi, if nfct was set for them in the
> > driver, it will not be released while freeing their stolen
> > head state or on re-use.
> > 
> > Release nfct on napi's stolen or re-used SKBs, and
> > in gro_list_prepare, check conntrack metadata diff.
> > 
> > Fixes: 5c6b94604744 ("net/mlx5e: CT: Handle misses after executing CT action")
> > Reviewed-by: Roi Dayan <roid@...dia.com>
> > Signed-off-by: Paul Blakey <paulb@...dia.com>
> > ---
> > Changelog:
> > 	v1->v2:
> > 	 Check for different flows based on CT and chain metadata in gro_list_prepare
> > 
> >  net/core/dev.c    | 13 +++++++++++++
> >  net/core/skbuff.c |  1 +
> >  2 files changed, 14 insertions(+)
> > 
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 439faadab0c2..bf62cb2ec6da 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -5981,6 +5981,18 @@ static void gro_list_prepare(const struct list_head *head,
> >  			diffs = memcmp(skb_mac_header(p),
> >  				       skb_mac_header(skb),
> >  				       maclen);
> > +
> > +		diffs |= skb_get_nfct(p) ^ skb_get_nfct(skb);
> > +
> > +		if (!diffs) {
> > +			struct tc_skb_ext *skb_ext = skb_ext_find(skb, TC_SKB_EXT);
> > +			struct tc_skb_ext *p_ext = skb_ext_find(p, TC_SKB_EXT);
> > +
> > +			diffs |= (!!p_ext) ^ (!!skb_ext);
> > +			if (!diffs && unlikely(skb_ext))
> > +				diffs |= p_ext->chain ^ skb_ext->chain;
> > +		}
> 
> I'm wondering... if 2 skbs are merged, and have the same L2/L3/L4
> headers - except len and csum - can they have different dst/TC_EXT?

Right, and they also have the same tunnel header metadata, so even the
tunnels are the same.

So probably not; I had trouble thinking of a case where it can happen as
well. But a user might have some unusual tc policy where it does happen,
especially with header rewrite.

To test this, I ran two tcp streams that do some hops in hardware (tc
goto chain), and for one of the two streams I used tc pedit rules to
rewrite an ip/mac so that the two streams have the same 5-tuple (macs,
ips, ports), just on different chains and in different connection tracking
zones.

For the last hop, where both streams were the same, I used the tc flower
skip_hw flag so it will miss to software and we get here with
same_flow = false.
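
As a rough illustration of the check discussed above (not the kernel code
itself), the metadata comparison from the quoted hunk can be modelled in
userspace C. The names `model_skb`, `model_tc_ext`, `model_meta_diffs` and
`model_self_check` are hypothetical stand-ins: the `nfct` field is assumed to
behave like the value returned by skb_get_nfct(), and `ext` like the result of
skb_ext_find(skb, TC_SKB_EXT). It only shows why two packets with identical
headers still end up as different flows when their conntrack or chain metadata
differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel structures. */
struct model_tc_ext {
	uint32_t chain;			/* tc chain the packet missed on */
};

struct model_skb {
	unsigned long nfct;		/* assumed: conntrack pointer plus ctinfo bits */
	const struct model_tc_ext *ext;	/* NULL when no tc extension is attached */
};

/*
 * Mirrors the order of the checks in the quoted hunk:
 * 1. differing conntrack metadata,
 * 2. one packet has a tc extension and the other does not,
 * 3. both have one but the chains differ.
 * Returns non-zero when the packets must not be treated as the same flow.
 */
static unsigned long model_meta_diffs(const struct model_skb *p,
				      const struct model_skb *skb)
{
	unsigned long diffs = 0;

	diffs |= p->nfct ^ skb->nfct;

	if (!diffs) {
		diffs |= (unsigned long)((!!p->ext) ^ (!!skb->ext));
		/* Reaching here with skb->ext set implies p->ext is set too. */
		if (!diffs && skb->ext)
			diffs |= p->ext->chain ^ skb->ext->chain;
	}

	return diffs;
}

/* Small usage example: same headers are assumed, only metadata varies. */
static void model_self_check(void)
{
	const struct model_tc_ext chain3 = { .chain = 3 };
	const struct model_tc_ext chain4 = { .chain = 4 };
	struct model_skb a = { .nfct = 0x1000, .ext = &chain3 };
	struct model_skb b = { .nfct = 0x1000, .ext = &chain3 };
	struct model_skb c = { .nfct = 0x1000, .ext = &chain4 };
	struct model_skb d = { .nfct = 0x2000, .ext = &chain3 };
	struct model_skb e = { .nfct = 0x1000, .ext = NULL };

	assert(model_meta_diffs(&a, &b) == 0);	/* same ct, same chain */
	assert(model_meta_diffs(&a, &c) != 0);	/* same ct, different chain */
	assert(model_meta_diffs(&a, &d) != 0);	/* different ct */
	assert(model_meta_diffs(&a, &e) != 0);	/* extension on one side only */
}
```

In the test described above, the two streams would correspond to cases like
`a` versus `c` or `a` versus `d`: identical 5-tuple, but different chain or
conntrack zone.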



This might be too much for email (so I won't bother formatting it :)) but 
here is the setup I used:


    echo "add arp rules"
    tc_filter add dev $REP ingress protocol arp prio 888 flower skip_hw $tc_verbose \
        action mirred egress mirror dev $VETH_REP1 pipe \
        action mirred egress redirect dev $VETH_REP2
    tc_filter add dev $VETH_REP2 ingress protocol arp prio 888 flower skip_hw $tc_verbose \
        action mirred egress redirect dev $REP
    tc_filter add dev $VETH_REP1 ingress protocol arp prio 888 flower skip_hw $tc_verbose \
        action mirred egress redirect dev $REP


    echo "add ct rules"

    flag=""

    #ORIG:
    #chain 0, REP->VETH[12], send to diff chains based on mac,
    #for zone 4 first do hw rewrite of fake ip so we have same tuple
    tc_filter add dev $REP ingress protocol ip chain 0 prio 1 flower ip_proto $proto dst_ip $ip_remote1 $tc_verbose $flag \
        dst_mac $remote_mac ct_state -trk \
        action ct zone 3 action goto chain 3
    #chain 2, continuation of header rewrite, send to chain 4 & zone 4
    tc_filter add dev $REP ingress protocol ip chain 2 prio 1 flower ip_proto $proto $tc_verbose \
        action ct zone 4 action goto chain 4

    #chain 3, REP->VETH, zone 3
    tc_filter add dev $REP ingress protocol ip chain 3 prio 1 flower ip_proto $proto $tc_verbose \
        ct_state +trk+new ct_zone 3 \
        action ct zone 3 commit \
        action mirred egress redirect dev $VETH_REP1
    tc_filter add dev $REP ingress protocol ip chain 3 prio 1 flower ip_proto $proto $tc_verbose \
        ct_state +trk+est ct_zone 3 \
        action mirred egress redirect dev $VETH_REP1

    #others...
    tc_filter add dev $REP ingress protocol ip chain 3 prio 5 flower ip_proto $proto skip_hw $tc_verbose \
        ct_zone 3 \
        action drop
    tc_filter add dev $REP ingress protocol ip chain 3 prio 6 flower ip_proto $proto skip_hw $tc_verbose \
        action drop

    #chain 4, REP->VETH2, zone 4, +new/+est, rewrite back mac/ip and fwd
    tc_filter add dev $REP ingress protocol ip chain 4 prio 5 flower ip_proto $proto skip_hw $tc_verbose \
        ct_zone 4 \
        action drop
    tc_filter add dev $REP ingress protocol ip chain 4 prio 6 flower ip_proto $proto skip_hw $tc_verbose \
        action drop

    #catch wrong packets
    tc_filter add dev $REP ingress protocol ip chain 4 prio 2 flower ip_proto $proto $tc_verbose skip_hw \
        ct_zone 3 \
        action drop
    tc_filter add dev $REP ingress protocol ip chain 3 prio 2 flower ip_proto $proto $tc_verbose skip_hw \
        ct_zone 4 \
        action drop


    #REPLY:
    #chain 0, VETH->REP, send to zone 3, then chain 6 for fwd
    tc_filter add dev $VETH_REP1 ingress protocol ip chain 0 prio 1 flower ip_proto $proto $tc_verbose $flag \
        ct_state -trk \
        action ct zone 3 action goto chain 6

    #chain 6, VETH->REP, zone 3, +est, fwd
    tc_filter add dev $VETH_REP1 ingress protocol ip chain 6 prio 1 flower ip_proto $proto $tc_verbose \
        ct_state +trk+est \
        action mirred egress redirect dev $REP

    #chain 0, VETH->REP, send to zone 4, fake ip for ct, then chain 6 for fwd


    #chain 6, VETH2->REP, zone 3, +est, revert fake ip and fwd to dev



    port1=6000
    port2=7000

    echo port1 $port1 port2 $port2

    tc_filter add dev $REP ingress protocol ip chain 0 prio 1 flower ip_proto $proto dst_ip $ip_remote2 $tc_verbose $flag \
        dst_mac $remote_mac2 ct_state -trk src_port $port2 \
        action pedit ex \
           munge eth dst set $remote_mac \
           munge ip dst set $ip_remote1 \
           munge $proto sport set $port1 \
           pipe \
        action csum $proto ip pipe \
        action goto chain 2
    tc_filter add dev $REP ingress protocol ip chain 0 prio 2 flower ip_proto $proto dst_ip $ip_remote2 $tc_verbose $flag \
        dst_mac $remote_mac2 \
        action mirred egress redirect dev $VETH_REP2
    tc_filter add dev $REP ingress protocol ip chain 4 prio 1 flower ip_proto $proto $tc_verbose \
        ct_state +trk+new ct_zone 4 src_port $port1 \
        action ct zone 4 commit pipe \
        action pedit ex \
           munge eth dst set $remote_mac2 \
           munge ip dst set $ip_remote2 \
           munge $proto sport set $port2 \
           pipe \
        action csum $proto ip pipe \
        action mirred egress redirect dev $VETH_REP2
    tc_filter add dev $REP ingress protocol ip chain 4 prio 1 flower ip_proto $proto $tc_verbose \
        ct_state +trk+est ct_zone 4 src_port $port1 \
        action pedit ex \
           munge eth dst set $remote_mac2 \
           munge ip dst set $ip_remote2 \
           munge $proto sport set $port2 \
           pipe \
        action csum $proto ip pipe \
        action mirred egress redirect dev $VETH_REP2

    tc_filter add dev $VETH_REP2 ingress protocol ip chain 0 prio 1 flower ip_proto $proto $tc_verbose $flag \
        ct_state -trk dst_port $port2 \
        action pedit ex \
           munge ip src set $ip_remote1 \
           munge $proto dport set $port1 \
           pipe \
        action csum $proto ip pipe \
        action ct zone 4 action goto chain 6
    tc_filter add dev $VETH_REP2 ingress protocol ip chain 0 prio 2 flower ip_proto $proto $tc_verbose $flag \
        action mirred egress redirect dev $REP
    tc_filter add dev $VETH_REP2 ingress protocol ip chain 6 prio 1 flower ip_proto $proto $tc_verbose \
        ct_state +trk+est dst_port $port1 \
        action pedit ex \
           munge ip src set $ip_remote2 \
           munge $proto dport set $port2 \
           pipe \
        action csum $proto ip pipe \
        action mirred egress redirect dev $REP







> 
> @Eric: I'm sorry for the very dumb and late question. You reported v1
> of this patch would make "GRO slow as hell", could you please elaborate
> a bit more? I thought most skbs (with no ct attached) would see little
> difference???
> 
> Cheers,
> 
> Paolo
> 
> 
> 
