[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAK+XE=mtkJV2vYT0kH2Rp1x0EObSSbJD2gAppW=dTxi82WLC+Q@mail.gmail.com>
Date: Tue, 25 Jun 2019 14:26:55 +0100
From: John Hurley <john.hurley@...ronome.com>
To: Jamal Hadi Salim <jhs@...atatu.com>
Cc: Linux Netdev List <netdev@...r.kernel.org>,
David Miller <davem@...emloft.net>,
Florian Westphal <fw@...len.de>,
Simon Horman <simon.horman@...ronome.com>,
Jakub Kicinski <jakub.kicinski@...ronome.com>,
oss-drivers@...ronome.com
Subject: Re: [PATCH net-next 0/2] Track recursive calls in TC act_mirred
On Tue, Jun 25, 2019 at 12:18 PM Jamal Hadi Salim <jhs@...atatu.com> wrote:
>
> On 2019-06-24 6:13 p.m., John Hurley wrote:
> > These patches aim to prevent act_mirred causing stack overflow events from
> > recursively calling packet xmit or receive functions. Such events can
> > occur with poor TC configuration that causes packets to travel in loops
> > within the system.
> >
> > Florian Westphal advises that a recursion crash and packets looping are
> > separate issues and should be treated as such. David Miller futher points
> > out that pcpu counters cannot track the precise skb context required to
> > detect loops. Hence these patches are not aimed at detecting packet loops,
> > rather, preventing stack flows arising from such loops.
>
> Sigh. So we are still trying to save 2 bits?
> John, you said ovs has introduced a similar loop handling code;
> Is their approach similar to this? Bigger question: Is this approach
> "good enough"?
>
Hi Jamal.
Yes, OvS implements a similar approach to prevent recursion:
https://elixir.bootlin.com/linux/v5.2-rc6/source/net/openvswitch/actions.c#L1530
It was discussed on a previous thread that there are really 2 issues
here (even if it is the same thing that triggers them).
Firstly, the infinite looping of packets caused by poor config and,
secondly, the kernel panic caused by a stack overflow due to the
recursion in use.
These patches target the latter.
I think this approach is good enough to deal with the crashes as it
tracks a packets recursive calls (through act_mirred) on the network
stack.
If the packet is scheduled and releases the CPU then the counter is reset.
The packet may still loop but it will not cause stack overflows.
> Not to put more work for you, but one suggestion is to use skb metadata
> approach which is performance unfriendly (could be argued to more
> correct).
>
> Also: Please consider submitting a test case or two for tdc.
>
> cheers,
> jamal
>
Powered by blists - more mailing lists