netdev - Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6d2bd45a-a8a0-846d-5934-5e246522cab8@mojatatu.com>
Date:   Sun, 25 Sep 2016 09:05:08 -0400
From:   Jamal Hadi Salim <jhs@...atatu.com>
To:     Shmulik Ladkani <shmulik.ladkani@...il.com>
Cc:     "David S. Miller" <davem@...emloft.net>,
        WANG Cong <xiyou.wangcong@...il.com>,
        Eric Dumazet <edumazet@...gle.com>, netdev@...r.kernel.org,
        Florian Westphal <fw@...len.de>
Subject: Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress
 actions

On 16-09-23 11:40 AM, Shmulik Ladkani wrote:
> On Fri, 23 Sep 2016 08:48:33 -0400 Jamal Hadi Salim <jhs@...atatu.com> wrote:
>>> Even today, one may create loops using existing 'egress redirect',
>>> e.g. this rediculously errorneous construct:
>>>
>>>  # ip l add v0 type veth peer name v0p
>>>  # tc filter add dev v0p parent ffff: basic \
>>>     action mirred egress redirect dev v0
>>
>> I think we actually recover from this one by eventually
>> dropping (theres a ttl field).
>
> [off topic]
>
> Don't know about that :) cpu fan got very noisy, 3 of 4 cores at 100%,
> and after one second I got:
>
> # ip -s l show type veth
> 16: v0p@v0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
>     link/ether a2:64:ff:10:dd:85 brd ff:ff:ff:ff:ff:ff
>     RX: bytes  packets  errors  dropped overrun mcast
>     71660305923 469890864 0       0       0       0
>     TX: bytes  packets  errors  dropped carrier collsns
>     3509       24       0       0       0       0
> 17: v0@v0p: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
>     link/ether 52:a2:34:f6:7c:ec brd ff:ff:ff:ff:ff:ff
>     RX: bytes  packets  errors  dropped overrun mcast
>     3509       24       0       0       0       0
>     TX: bytes  packets  errors  dropped carrier collsns
>     71660713017 469893555 0       0       0       0
>

I think this is still on topic!

Now I realize that code we took out around 4.2.x is still useful
for such a use case (I wasnt thinking about veth when Florian was
slimming the skb);
+Cc Florian W.

This snippet from 4.2:
-------------
3525 static int ing_filter(struct sk_buff *skb, struct netdev_queue *rxq)
3526 {
3527         struct net_device *dev = skb->dev;
3528         u32 ttl = G_TC_RTTL(skb->tc_verd);
3529         int result = TC_ACT_OK;
3530         struct Qdisc *q;
3531
3532         if (unlikely(MAX_RED_LOOP < ttl++)) {
3533                 net_warn_ratelimited("Redir loop detected Dropping 
packet (%d->%d)\n",
3534                                      skb->skb_iif, dev->ifindex);
3535                 return TC_ACT_SHOT;
3536         }
3537
3538         skb->tc_verd = SET_TC_RTTL(skb->tc_verd, ttl);
3539         skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_INGRESS);
3540
3541         q = rcu_dereference(rxq->qdisc);
3542         if (q != &noop_qdisc) {
3543                 spin_lock(qdisc_lock(q));
3544                 if (likely(!test_bit(__QDISC_STATE_DEACTIVATED, 
&q->state)))
3545                         result = qdisc_enqueue_root(skb, q);
3546                 spin_unlock(qdisc_lock(q));
3547         }
3548
3549         return result;
3550 }
--------------------

MAX_RED_LOOP (stands for "Maximum Redirect loop") still exists in
current code. The idea above was that we would increment the rttl
counter  once and if we saw it again upto MAX_RED_LOOP we would assume
a loop and drop the packet (at the time i didnt think it was wise to
let the actions be in charge of setting the RTTL; it had to be central
core code - but it may not be neccessary)

Florian, when we discussed I said it was fine to reclaim those 3 bits
on tc verdict for RTTL at the time because i had taken out the
feature and never added it back. Your comment at the time was we can
add it back when someone shows up with the feature.
Shmulik is looking to add it.

> Similarly to all constructs injecting skbs to device rx (bond/team,
> vlan, macvlan, tunnels, ifb, __dev_forward_skb callers, etc..), we are
> obligated to assign 'skb2->dev' as the new rx device.
>
> Regarding 'skb2->skb_iif', original act_mirred code already has:
>
>  	skb2->skb_iif = skb->dev->ifindex;   <--- THIS IS ORIG DEV IIF
>  	skb2->dev = dev;                     <--- THIS IS TARGET DEV
> 	err = dev_queue_xmit(skb2);
>
> I'm preserving this; OTOH the suggested modification in the patch is
>
> -	err = dev_queue_xmit(skb2);
> +	if (tcf_mirred_act_direction(m->tcfm_eaction) & AT_EGRESS)
> +		err = dev_queue_xmit(skb2);
> +	else
> +		netif_receive_skb(skb2);
>
> now, the call to 'netif_receive_skb' will eventually override skb_iif to
> the target RX dev's index, upon entry to __netif_receive_skb_core.
>
> I think this IS the expected behavior - as done by other "rx injection"
> constructs.
>

Sounds fine.
I am wondering if we can have a tracing feature to show the lifetime of
the packet as it is being cycled around the kernel? It would help
debugging if some policy misbehaves.

> My doubts were around whether we should call 'dev_forward_skb' instead
> of 'netif_receive_skb'.
> The former does some things I assumed we're not interested of, like
> testing 'is_skb_forwardable' and re-running 'eth_type_trans'.
> OTOH, it DOES scrub the skb.
> Maybe we should scrub it as well prior the netif_receive_skb call?
>

Scrubbing the skb could be a bad idea if it gets rid of global state
like the RTTL if you add it back.

cheers,
jamal