[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <57E7FAC7.6090904@iogearbox.net>
Date: Sun, 25 Sep 2016 18:26:47 +0200
From: Daniel Borkmann <daniel@...earbox.net>
To: Jamal Hadi Salim <jhs@...atatu.com>,
Shmulik Ladkani <shmulik.ladkani@...il.com>
CC: "David S. Miller" <davem@...emloft.net>,
WANG Cong <xiyou.wangcong@...il.com>,
Eric Dumazet <edumazet@...gle.com>, netdev@...r.kernel.org,
Florian Westphal <fw@...len.de>
Subject: Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress
actions
On 09/25/2016 03:05 PM, Jamal Hadi Salim wrote:
> On 16-09-23 11:40 AM, Shmulik Ladkani wrote:
>> On Fri, 23 Sep 2016 08:48:33 -0400 Jamal Hadi Salim <jhs@...atatu.com> wrote:
>>>> Even today, one may create loops using existing 'egress redirect',
>>>> e.g. this rediculously errorneous construct:
>>>>
>>>> # ip l add v0 type veth peer name v0p
>>>> # tc filter add dev v0p parent ffff: basic \
>>>> action mirred egress redirect dev v0
>>>
>>> I think we actually recover from this one by eventually
>>> dropping (theres a ttl field).
>>
>> [off topic]
>>
>> Don't know about that :) cpu fan got very noisy, 3 of 4 cores at 100%,
>> and after one second I got:
>>
>> # ip -s l show type veth
>> 16: v0p@v0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
>> link/ether a2:64:ff:10:dd:85 brd ff:ff:ff:ff:ff:ff
>> RX: bytes packets errors dropped overrun mcast
>> 71660305923 469890864 0 0 0 0
>> TX: bytes packets errors dropped carrier collsns
>> 3509 24 0 0 0 0
>> 17: v0@v0p: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
>> link/ether 52:a2:34:f6:7c:ec brd ff:ff:ff:ff:ff:ff
>> RX: bytes packets errors dropped overrun mcast
>> 3509 24 0 0 0 0
>> TX: bytes packets errors dropped carrier collsns
>> 71660713017 469893555 0 0 0 0
>>
>
> I think this is still on topic!
>
> Now I realize that code we took out around 4.2.x is still useful
> for such a use case (I wasnt thinking about veth when Florian was
> slimming the skb);
> +Cc Florian W.
>
> This snippet from 4.2:
> -------------
> 3525 static int ing_filter(struct sk_buff *skb, struct netdev_queue *rxq)
> 3526 {
> 3527 struct net_device *dev = skb->dev;
> 3528 u32 ttl = G_TC_RTTL(skb->tc_verd);
> 3529 int result = TC_ACT_OK;
> 3530 struct Qdisc *q;
> 3531
> 3532 if (unlikely(MAX_RED_LOOP < ttl++)) {
> 3533 net_warn_ratelimited("Redir loop detected Dropping packet (%d->%d)\n",
> 3534 skb->skb_iif, dev->ifindex);
> 3535 return TC_ACT_SHOT;
> 3536 }
> 3537
> 3538 skb->tc_verd = SET_TC_RTTL(skb->tc_verd, ttl);
> 3539 skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_INGRESS);
> 3540
> 3541 q = rcu_dereference(rxq->qdisc);
> 3542 if (q != &noop_qdisc) {
> 3543 spin_lock(qdisc_lock(q));
> 3544 if (likely(!test_bit(__QDISC_STATE_DEACTIVATED, &q->state)))
> 3545 result = qdisc_enqueue_root(skb, q);
> 3546 spin_unlock(qdisc_lock(q));
> 3547 }
> 3548
> 3549 return result;
> 3550 }
> --------------------
>
> MAX_RED_LOOP (stands for "Maximum Redirect loop") still exists in
> current code. The idea above was that we would increment the rttl
> counter once and if we saw it again upto MAX_RED_LOOP we would assume
> a loop and drop the packet (at the time i didnt think it was wise to
> let the actions be in charge of setting the RTTL; it had to be central
> core code - but it may not be neccessary)
>
> Florian, when we discussed I said it was fine to reclaim those 3 bits
> on tc verdict for RTTL at the time because i had taken out the
> feature and never added it back. Your comment at the time was we can
> add it back when someone shows up with the feature.
> Shmulik is looking to add it.
Why not just reuse xmit_recursion, which is what we did in tc cls_bpf
programs f.e. see __bpf_tx_skb()? Would be a pity to waste 3 bits on
this in the skb.
>> Similarly to all constructs injecting skbs to device rx (bond/team,
>> vlan, macvlan, tunnels, ifb, __dev_forward_skb callers, etc..), we are
>> obligated to assign 'skb2->dev' as the new rx device.
>>
>> Regarding 'skb2->skb_iif', original act_mirred code already has:
>>
>> skb2->skb_iif = skb->dev->ifindex; <--- THIS IS ORIG DEV IIF
>> skb2->dev = dev; <--- THIS IS TARGET DEV
>> err = dev_queue_xmit(skb2);
>>
>> I'm preserving this; OTOH the suggested modification in the patch is
>>
>> - err = dev_queue_xmit(skb2);
>> + if (tcf_mirred_act_direction(m->tcfm_eaction) & AT_EGRESS)
>> + err = dev_queue_xmit(skb2);
>> + else
>> + netif_receive_skb(skb2);
>>
>> now, the call to 'netif_receive_skb' will eventually override skb_iif to
>> the target RX dev's index, upon entry to __netif_receive_skb_core.
>>
>> I think this IS the expected behavior - as done by other "rx injection"
>> constructs.
>
> Sounds fine.
> I am wondering if we can have a tracing feature to show the lifetime of
> the packet as it is being cycled around the kernel? It would help
> debugging if some policy misbehaves.
>
>> My doubts were around whether we should call 'dev_forward_skb' instead
>> of 'netif_receive_skb'.
>> The former does some things I assumed we're not interested of, like
>> testing 'is_skb_forwardable' and re-running 'eth_type_trans'.
>> OTOH, it DOES scrub the skb.
>> Maybe we should scrub it as well prior the netif_receive_skb call?
>
> Scrubbing the skb could be a bad idea if it gets rid of global state
> like the RTTL if you add it back.
>
> cheers,
> jamal
>
Powered by blists - more mailing lists