[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210114215052.GB2676@horizon.localdomain>
Date: Thu, 14 Jan 2021 18:50:52 -0300
From: Marcelo Ricardo Leitner <marcelo.leitner@...il.com>
To: Oz Shlomo <ozsh@...dia.com>
Cc: Roi Dayan <roid@...dia.com>, Saeed Mahameed <saeed@...nel.org>,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>, netdev@...r.kernel.org,
Paul Blakey <paulb@...dia.com>,
Saeed Mahameed <saeedm@...dia.com>
Subject: Re: [net-next 08/15] net/mlx5e: CT: Preparation for offloading
+trk+new ct rules
On Thu, Jan 14, 2021 at 04:03:43PM +0200, Oz Shlomo wrote:
>
>
> On 1/14/2021 3:02 PM, Marcelo Ricardo Leitner wrote:
> > On Tue, Jan 12, 2021 at 11:27:04AM +0200, Oz Shlomo wrote:
> > >
> > >
> > > On 1/12/2021 1:51 AM, Marcelo Ricardo Leitner wrote:
> > > > On Sun, Jan 10, 2021 at 09:52:55AM +0200, Roi Dayan wrote:
> > > > >
> > > > >
> > > > > On 2021-01-10 9:45 AM, Roi Dayan wrote:
> > > > > >
> > > > > >
> > > > > > On 2021-01-08 11:48 PM, Marcelo Ricardo Leitner wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > On Thu, Jan 07, 2021 at 09:30:47PM -0800, Saeed Mahameed wrote:
> > > > > > > > From: Roi Dayan <roid@...dia.com>
> > > > > > > >
> > > > > > > > Connection tracking associates the connection state per packet. The
> > > > > > > > first packet of a connection is assigned with the +trk+new state. The
> > > > > > > > connection enters the established state once a packet is seen on the
> > > > > > > > other direction.
> > > > > > > >
> > > > > > > > Currently we offload only the established flows. However, UDP traffic
> > > > > > > > using source port entropy (e.g. vxlan, RoCE) will never enter the
> > > > > > > > established state. Such protocols do not require stateful processing,
> > > > > > > > and therefore could be offloaded.
> > > > > > >
> > > > > > > If it doesn't require stateful processing, please enlight me on why
> > > > > > > conntrack is being used in the first place. What's the use case here?
> > > > > > >
> > > > > >
> > > > > > The use case for example is when we have vxlan traffic but we do
> > > > > > conntrack on the inner packet (rules on the physical port) so
> > > > > > we never get established but on miss we can still offload as normal
> > > > > > vxlan traffic.
> > > > > >
> > > > >
> > > > > my mistake about "inner packet". we do CT on the underlay network, i.e.
> > > > > the outer header.
> > > >
> > > > I miss why the CT match is being used there then. Isn't it a config
> > > > issue/waste of resources? What is CT adding to the matches/actions
> > > > being done on these flows?
> > > >
> > >
> > > Consider a use case where the network port receives both east-west
> > > encapsulated traffic and north-south non-encapsulated traffic that requires
> > > NAT.
> > >
> > > One possible configuration is to first apply the CT-NAT action.
> > > Established north-south connections will successfully execute the nat action
> > > and will set the +est ct state.
> > > However, the +new state may apply either for valid east-west traffic (e.g.
> > > vxlan) due to source port entropy, or to insecure north-south traffic that
> > > the fw should block. The user may distinguish between the two cases, for
> > > example, by matching on the dest udp port.
> >
> > Sorry but I still don't see the big picture. :-]
> >
> > What do you consider as east-west and north-south traffic? My initial
> > understanding of east-west is traffic between VFs and north-south
> > would be in and out to the wire. You mentioned that north-south is
> > insecure, it would match, but then, non-encapsulated?
> >
> > So it seems you referred to the datacenter. East-west is traffic
> > between hosts on the same datacenter, and north-south is traffic that
> > goes out of it. This seems to match.
>
> Right.
>
> >
> > Assuming it's the latter, then it seems that the idea is to work
> > around a config simplification that was done by the user. As
> > mentioned on the changelog, such protocols do not require stateful
> > processing, and AFAICU this patch twists conntrack so that the user
> > can have simplified rules. Why can't the user have specific rules for
> > the tunnels, and other for dealing with north-south traffic? The fw
> > would still be able to block unwanted traffic.
>
> We cannot control what the user is doing.
Right, but we can educate and point them towards better configs. With
non-optimal configs it's fair to expect non-optimal effects.
> This is a valid tc configuration and would work using tc software datapath.
> However, in such configurations vxlan packets would not be processed in
> hardware because they are marked as new connections.
Makes sense.
>
> >
> > My main problems with this is this, that it is making conntrack do
> > stuff that the user may not be expecting it to do, and that packets
> > may get matched (maybe even unintentionally) and the system won't have
> > visibility on them. Maybe I'm just missing something?
> >
>
> This is why we restricted this feature to udp protocols that will never
> enter established state due to source port entropy.
> Do you see a problematic use case that can arise?
For use case, the only one I see is if someone wants to use this
feature for another application/dstport. It's hardcoded to tunnels
ones.
It feels that the problem is not being solved at the right place. It
will work well for hardware processing, while for software it will
work while having a ton of conntrack entries. Different behaviors that
can lead to people wasting time. Like, trying to debug on why srcport
is not getting randomized when offloaded, while in fact they are, it's
just masked.
As this is a fallback (iow, search is done in 2 levels at least), I
wonder what other approaches were considered. I'm thinking two for
now. One is to add a flag to conntrack entries that allow them to be
this generic. Finding the right conntrack entry probably gets harder,
but when the user dumps /proc/net/nf_conntrack, it says something. On
how/when to add this flag, maybe act_ct can do it if dstport matches
something and/or a sysctl specifying a port list.
The other one may sound an overkill, but is to work with conntrack
expectations somehow.
The first one is closer to the current proposal. It basically makes
the port list configurable and move the "do it" decision to outside
the driver, where the admin can have more control. If conntrack itself
can also leverage it and avoid having tons of entries, even better, as
then we have both behaviors in sync.
Thoughts?
>
> > >
> > >
> > > > >
> > > > > > > >
> > > > > > > > The change in the model is that a miss on the CT table will be forwarded
> > > > > > > > to a new +trk+new ct table and a miss there will be forwarded to
> > > > > > > > the slow
> > > > > > > > path table.
> > > > > > >
> > > > > > > AFAICU this new +trk+new ct table is a wildcard match on sport with
> > > > > > > specific dports. Also AFAICU, such entries will not be visible to the
> > > > > > > userspace then. Is this right?
> > > > > > >
> > > > > > > Marcelo
> > > > > > >
> > > > > >
> > > > > > right.
> > > >
> > > > Thanks,
> > > > Marcelo
> > > >
> > >
>
Powered by blists - more mailing lists