[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAAFAkD9nb1uRypAViV+OQ+M1NiFfO4DVozQb9U4UVD_K88OXBQ@mail.gmail.com>
Date: Mon, 18 Dec 2023 10:35:08 -0500
From: Jamal Hadi Salim <hadi@...atatu.com>
To: Davide Caratti <dcaratti@...hat.com>
Cc: Victor Nogueira <victor@...atatu.com>, jhs@...atatu.com, davem@...emloft.net,
edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com,
xiyou.wangcong@...il.com, jiri@...nulli.us, mleitner@...hat.com,
pctammela@...atatu.com, netdev@...r.kernel.org, kernel@...atatu.com,
Florian Westphal <fw@...len.de>
Subject: Re: [PATCH RFC net-next] net: sched: act_mirred: Extend the cpu
mirred nest guard with an explicit loop ttl
On Mon, Dec 18, 2023 at 3:49 AM Davide Caratti <dcaratti@...hat.com> wrote:
>
> hello Victor, thanks for the patch!
>
> On Fri, Dec 15, 2023 at 03:08:27PM -0300, Victor Nogueira wrote:
> > As pointed out by Jamal in:
> > https://lore.kernel.org/netdev/CAM0EoMn4C-zwrTCGzKzuRYukxoqBa8tyHyFDwUSZYwkMOUJ4Lw@mail.gmail.com/
> >
> > Mirred is allowing for infinite loops in certain use cases, such as the
> > following:
> >
> > ----
> > sudo ip netns add p4node
> > sudo ip link add p4port0 address 10:00:00:01:AA:BB type veth peer \
> > port0 address 10:00:00:02:AA:BB
> >
> > sudo ip link set dev port0 netns p4node
> > sudo ip a add 10.0.0.1/24 dev p4port0
> > sudo ip neigh add 10.0.0.2 dev p4port0 lladdr 10:00:00:02:aa:bb
> > sudo ip netns exec p4node ip a add 10.0.0.2/24 dev port0
> > sudo ip netns exec p4node ip l set dev port0 up
> > sudo ip l set dev p4port0 up
> > sudo ip netns exec p4node tc qdisc add dev port0 clsact
> > sudo ip netns exec p4node tc filter add dev port0 ingress protocol ip \
> > prio 10 matchall action mirred ingress redirect dev port0
> >
> > ping -I p4port0 10.0.0.2 -c 1
> > -----
> >
> > To solve this, we reintroduced a ttl variable attached to the skb (in
> > struct tc_skb_cb) which will prevent infinite loops for use cases such as
> > the one described above.
> >
> > The nest per cpu variable (tcf_mirred_nest_level) is now only used for
> > detecting whether we should call netif_rx or netif_receive_skb when
> > sending the packet to ingress.
>
> looks good to me. Do you think it's worth setting an initial value (0, AFAIU)
> for tc_skb_cb(skb)->ttl inside tc_run() ?
>
Good point but I am afraid that will reset the loop counter (imagine
ingress->ingress, egress->ingress etc). So it wont work. Unfortunately
we've hit a snag with cb because it is shared across multiple layers.
I am afraid we cant ignore it.
If the packet came downward from some upper layer (or driver, buggy
mostly) using the same ttl spot in the cb, then the ttl field will be
either 0 or > 0.
1) 0 < ttl < 4 then we will interpret it as "packet has looped before"
and we wont drop it, so we are good here and we will end up dropping
it later in one or more loop.
2) If the retrieved ttl is >=4 we will immediately drop it. This is
catastrophic because we cant stop ipv4 or esp from using this field to
their pleasure and they certainly dont reset these fields.
I dont see a way out unless we extend the skb->tc_at_ingress to be to
2 bits as it was originally. In the original code it was called we had
SET_TC_AT() which set two bits to in the skb->verdict to say "this
packet was last seen at ingress/egress/elsewhere". Elsewhere was a 0
and this got changed to a boolean which translates to "this packet was
last seen at ingress/egress". So if it came from a freshly allocated
skb eg from driver or if it came from the stack it will be 0 (since
the field is reserved for tc).
+Cc Florian.
cheers,
jamal
> other than this,
>
> Acked-by: Davide Caratti <dcaratti@...hat.com>
>
Powered by blists - more mailing lists