lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Tue, 5 Dec 2023 10:12:45 -0500
From: Jamal Hadi Salim <jhs@...atatu.com>
To: Davide Caratti <dcaratti@...hat.com>
Cc: Jiri Pirko <jiri@...nulli.us>, Xin Long <lucien.xin@...il.com>, 
	Marcelo Ricardo Leitner <marcelo.leitner@...il.com>, 
	Linux Kernel Network Developers <netdev@...r.kernel.org>, Paolo Abeni <pabeni@...hat.com>, wizhao@...hat.com, 
	Cong Wang <xiyou.wangcong@...il.com>, Florian Westphal <fw@...len.de>
Subject: Re: Mirred broken WAS(Re: [PATCH net-next 2/2] act_mirred: use the
 backlog for nested calls to mirred ingress

On Tue, Dec 5, 2023 at 5:54 AM Davide Caratti <dcaratti@...hat.com> wrote:
>
> hello Jamal, thanks for looking at this!
>
> On Mon, Dec 4, 2023 at 9:24 PM Jamal Hadi Salim <jhs@...atatu.com> wrote:
> >
> > On Fri, Jan 20, 2023 at 12:02 PM Davide Caratti <dcaratti@...hat.com> wrote:
> > >
> > > William reports kernel soft-lockups on some OVS topologies when TC mirred
> > > egress->ingress action is hit by local TCP traffic [1].
> > > The same can also be reproduced with SCTP (thanks Xin for verifying), when
> > > client and server reach themselves through mirred egress to ingress, and
> > > one of the two peers sends a "heartbeat" packet (from within a timer).
>
> [...]
>
> > I am afraid this broke things. Here's a simple use case which causes
> > an infinite loop (that we found while testing blockcasting but
> > simplified to demonstrate the issue):
>
> [...]
>
> > sudo ip netns exec p4node tc qdisc add dev port0 clsact
> > sudo ip netns exec p4node tc filter add dev port0 ingress protocol ip
> > prio 10 matchall action mirred ingress redirect dev port0
>
> the above rule is taking packets from port0 ingress and putting it
> again in the mirred ingress of the same device, hence the loop.

Right - that was intentional to show the loop. We are worrying about
extending mirred now to also broadcast (see the blockcast discussion)
to more ports making the loop even worse. The loop should terminate at
some point - in this case it does not...

> I don't see it much different than what we can obtain with bridges:
>
> # ip link add name one type veth peer name two
> # ip link add name three type veth peer name four
> # for n in even odd; do ip link add name $n type bridge; done
> # for n in one two three four even odd; do ip link set dev $n up; done
> # for n in one three; do ip link set dev $n master odd; done
> # for n in two four; do ip link set dev $n master even; done
>

Sure that is another way to reproduce.

> there is a practical difference: with bridges we have protocols (like
> STP) that can detect and act-upon loops - while TC mirred needs some
> facility on top (not 100% sure, but the same might also apply to
> similar tools, such as users of bpf_redirect() helper)
>

I dont think we can run something equivalent inside the kernel. The
ttl worked fine. BTW, the example shown breaks even when you have
everything running on a single cpu (and packets being queued on the
backlog)

> > reverting the patch fixes things and it gets caught by the nested
> > recursion check.
>
> the price of that revert is: we'll see those soft-lockups again with
> L4 protocols when peers communicate through mirred egress -> ingress.
>
> And even if it would fix mirred egress->ingress loops, we would still
> suffer from soft-lockups (on the qdisc root lock) when the same rule
> is done with mirred egress (see an example at
> https://github.com/multipath-tcp/mptcp_net-next/issues/451#issuecomment-1782690200)
> [1]

Yes, we need to make sure those are fixed with whatever replacement..
The loops will happen even on egress->egress (the example only showed
ingress-ingress).

We will try restoring the ttl and see if it continues to work with
your patch intact... unless there are other ideas.

> > Frankly, I believe we should restore a proper ttl from what was removed here:
> > https://lore.kernel.org/all/1430765318-13788-1-git-send-email-fw@strlen.de/
> > The headaches(and time consumed) trying to save the 3-4 bits removing
> > the ttl field is not worth it imo.
>
> TTL would protect us against loops when they are on the same node:
> what do you think about inserting a rule that detects BPDU before the
> mirred ingress rule?

I dont think running STP will save us from this, unless i am mistaken.
This happens within the kernel before the packet hits the "wire".
Besides this happens without using any bridging.

> [1] by the way: the POC patch at
> https://github.com/multipath-tcp/mptcp_net-next/issues/451#issuecomment-1782654075
> silcences lockdep false warnings, and it preserves the splat when the
> real deadlock happens with TC marred egress. If you agree I will send
> it soon to this ML for review.

please do.

cheers,
jamal

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ