lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAM0EoMmvjwxLmdT5pQJ-hXVMA2OJUfy8TJKDxZ=vf+Thzza0=Q@mail.gmail.com>
Date: Thu, 7 Dec 2023 09:10:13 -0500
From: Jamal Hadi Salim <jhs@...atatu.com>
To: Davide Caratti <dcaratti@...hat.com>
Cc: Jiri Pirko <jiri@...nulli.us>, Xin Long <lucien.xin@...il.com>, 
	Marcelo Ricardo Leitner <marcelo.leitner@...il.com>, 
	Linux Kernel Network Developers <netdev@...r.kernel.org>, Paolo Abeni <pabeni@...hat.com>, wizhao@...hat.com, 
	Cong Wang <xiyou.wangcong@...il.com>, Florian Westphal <fw@...len.de>
Subject: Re: Mirred broken WAS(Re: [PATCH net-next 2/2] act_mirred: use the
 backlog for nested calls to mirred ingress

Hi Davide,


On Tue, Dec 5, 2023 at 10:12 AM Jamal Hadi Salim <jhs@...atatu.com> wrote:
>
> On Tue, Dec 5, 2023 at 5:54 AM Davide Caratti <dcaratti@...hat.com> wrote:
> >
> > hello Jamal, thanks for looking at this!
> >
> > On Mon, Dec 4, 2023 at 9:24 PM Jamal Hadi Salim <jhs@...atatu.com> wrote:
> > >
> > > On Fri, Jan 20, 2023 at 12:02 PM Davide Caratti <dcaratti@...hat.com> wrote:
> > > >
> > > > William reports kernel soft-lockups on some OVS topologies when TC mirred
> > > > egress->ingress action is hit by local TCP traffic [1].
> > > > The same can also be reproduced with SCTP (thanks Xin for verifying), when
> > > > client and server reach themselves through mirred egress to ingress, and
> > > > one of the two peers sends a "heartbeat" packet (from within a timer).
> >
> > [...]
> >
> > > I am afraid this broke things. Here's a simple use case which causes
> > > an infinite loop (that we found while testing blockcasting but
> > > simplified to demonstrate the issue):
> >
> > [...]
> >
> > > sudo ip netns exec p4node tc qdisc add dev port0 clsact
> > > sudo ip netns exec p4node tc filter add dev port0 ingress protocol ip
> > > prio 10 matchall action mirred ingress redirect dev port0
> >
> > the above rule is taking packets from port0 ingress and putting it
> > again in the mirred ingress of the same device, hence the loop.
>
> Right - that was intentional to show the loop. We are worrying about
> extending mirred now to also broadcast (see the blockcast discussion)
> to more ports making the loop even worse. The loop should terminate at
> some point - in this case it does not...
>
> > I don't see it much different than what we can obtain with bridges:
> >
> > # ip link add name one type veth peer name two
> > # ip link add name three type veth peer name four
> > # for n in even odd; do ip link add name $n type bridge; done
> > # for n in one two three four even odd; do ip link set dev $n up; done
> > # for n in one three; do ip link set dev $n master odd; done
> > # for n in two four; do ip link set dev $n master even; done
> >
>
> Sure that is another way to reproduce.

Ok, so i can verify that re-introduction of the ttl field in the
skb[1] fixes the issue. But restoring that patch may cause too much
bikeshedding. Victor will work on a better approach using the cb
struct instead - there may. Are you able to test with/out your patch
and see if this same patch fixes it?

cheers,
jamal

[1]https://lore.kernel.org/all/1430765318-13788-1-git-send-email-fw@strlen.de/

> > there is a practical difference: with bridges we have protocols (like
> > STP) that can detect and act-upon loops - while TC mirred needs some
> > facility on top (not 100% sure, but the same might also apply to
> > similar tools, such as users of bpf_redirect() helper)
> >
>
> I dont think we can run something equivalent inside the kernel. The
> ttl worked fine. BTW, the example shown breaks even when you have
> everything running on a single cpu (and packets being queued on the
> backlog)
>
> > > reverting the patch fixes things and it gets caught by the nested
> > > recursion check.
> >
> > the price of that revert is: we'll see those soft-lockups again with
> > L4 protocols when peers communicate through mirred egress -> ingress.
> >
> > And even if it would fix mirred egress->ingress loops, we would still
> > suffer from soft-lockups (on the qdisc root lock) when the same rule
> > is done with mirred egress (see an example at
> > https://github.com/multipath-tcp/mptcp_net-next/issues/451#issuecomment-1782690200)
> > [1]
>
> Yes, we need to make sure those are fixed with whatever replacement..
> The loops will happen even on egress->egress (the example only showed
> ingress-ingress).
>
> We will try restoring the ttl and see if it continues to work with
> your patch intact... unless there are other ideas.
>
> > > Frankly, I believe we should restore a proper ttl from what was removed here:
> > > https://lore.kernel.org/all/1430765318-13788-1-git-send-email-fw@strlen.de/
> > > The headaches(and time consumed) trying to save the 3-4 bits removing
> > > the ttl field is not worth it imo.
> >
> > TTL would protect us against loops when they are on the same node:
> > what do you think about inserting a rule that detects BPDU before the
> > mirred ingress rule?
>
> I dont think running STP will save us from this, unless i am mistaken.
> This happens within the kernel before the packet hits the "wire".
> Besides this happens without using any bridging.
>
> > [1] by the way: the POC patch at
> > https://github.com/multipath-tcp/mptcp_net-next/issues/451#issuecomment-1782654075
> > silcences lockdep false warnings, and it preserves the splat when the
> > real deadlock happens with TC marred egress. If you agree I will send
> > it soon to this ML for review.
>
> please do.
>
> cheers,
> jamal

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ