netdev - Re: [PATCH net-next 0/5] mlxsw: Add support for RED qevent "mark"

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210121102318.GA2637214@shredder.lan>
Date:   Thu, 21 Jan 2021 12:23:18 +0200
From:   Ido Schimmel <idosch@...sch.org>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     netdev@...r.kernel.org, davem@...emloft.net, petrm@...dia.com,
        jiri@...dia.com, amcohen@...dia.com, mlxsw@...dia.com,
        Ido Schimmel <idosch@...dia.com>
Subject: Re: [PATCH net-next 0/5] mlxsw: Add support for RED qevent "mark"

On Wed, Jan 20, 2021 at 04:45:08PM -0800, Jakub Kicinski wrote:
> On Wed, 20 Jan 2021 11:14:37 +0200 Ido Schimmel wrote:
> > On Tue, Jan 19, 2021 at 02:22:55PM -0800, Jakub Kicinski wrote:
> > > On Sun, 17 Jan 2021 10:02:18 +0200 Ido Schimmel wrote:  
> > > > From: Ido Schimmel <idosch@...dia.com>
> > > > 
> > > > The RED qdisc currently supports two qevents: "early_drop" and "mark". The
> > > > filters added to the block bound to the "early_drop" qevent are executed on
> > > > packets for which the RED algorithm decides that they should be
> > > > early-dropped. The "mark" filters are similarly executed on ECT packets
> > > > that are marked as ECN-CE (Congestion Encountered).
> > > > 
> > > > A previous patchset has offloaded "early_drop" filters on Spectrum-2 and
> > > > later, provided that the classifier used is "matchall", that the action
> > > > used is either "trap" or "mirred", and a handful or further limitations.  
> > > 
> > > For early_drop trap or mirred makes obvious sense, no explanation
> > > needed.
> > > 
> > > But for marked as a user I'd like to see a _copy_ of the packet, 
> > > while the original continues on its marry way to the destination.
> > > I'd venture to say that e.g. for a DCTCP deployment mark+trap is
> > > unusable, at least for tracing, because it distorts the operation 
> > > by effectively dropping instead of marking.
> > > 
> > > Am I reading this right?  
> > 
> > You get a copy of the packet as otherwise it will create a lot of
> > problems (like you wrote).
> 
> Hm, so am I missing some background on semantics on TC_ACT_TRAP?
> Or perhaps you use a different action code?

Well, to make it really clear, we can add TC_ACT_TRAP_MIRROR.

TC_ACT_TRAP: Sole copy goes to the CPU
TC_ACT_TRAP_MIRROR: The packet is forwarded by the underlying device and
a copy is sent to the CPU

And only allow (in mlxsw) attaching filters with TC_ACT_TRAP_MIRROR to
the "mark" qevent.

> 
> AFAICT the code in the kernel is:
> 
> struct sk_buff *tcf_qevent_handle(...
> 
> 	case TC_ACT_STOLEN:
> 	case TC_ACT_QUEUED:
> 	case TC_ACT_TRAP:
> 		__qdisc_drop(skb, to_free);
> 		*ret = __NET_XMIT_STOLEN;
> 		return NULL;
> 
> Having TRAP mean DROP makes sense for filters, but in case of qevents
> shouldn't they be a no-op?
> 
> Looking at sch_red looks like TRAP being a no-op would actually give us
> the expected behavior.

I'm not sure it makes sense to try to interpret these actions in
software (I expect they will be used with "skip_sw" filters), but
TC_ACT_TRAP_MIRROR can be a no-op like you suggested.

> 
> > > If that is the case and you really want to keep the mark+trap
> > > functionality - I feel like at least better documentation is needed.
> > > The current two liner should also be rewritten, quoting from patch 1:
> > >   
> > > > * - ``ecn_mark``
> > > >   - ``drop``
> > > >   - Traps ECN-capable packets that were marked with CE (Congestion
> > > >     Encountered) code point by RED algorithm instead of being dropped  
> > > 
> > > That needs to say that the trap is for datagrams trapped by a qevent.
> > > Otherwise "Traps ... instead of being dropped" is too much of a
> > > thought-shortcut, marked packets are not dropped.
> > > 
> > > (I'd also think that trap is better documented next to early_drop,
> > > let's look at it from the reader's perspective)  
> > 
> > How about:
> > 
> > "Traps a copy of ECN-capable packets that were marked with CE
> 
> I think "Traps copies" or "Traps the copy of .. packet"?
> I'm not a native speaker but there seems to be a grammatical mix here.
> 
> > (Congestion Encountered) code point by RED algorithm instead of being
> > dropped. The trap is enabled by attaching a filter with action 'trap' to
> 
> ... instead of those copies being dropped.

Will reword

> 
> > the 'mark' qevent of the RED qdisc."
> >
> > In addition, this output:
> > 
> > $ devlink trap show pci/0000:06:00.0 trap ecn_mark 
> > pci/0000:06:00.0:
> >   name ecn_mark type drop generic true action trap group buffer_drops
> > 
> > Can be converted to:
> > 
> > $ devlink trap show pci/0000:06:00.0 trap ecn_mark 
> > pci/0000:06:00.0:
> >   name ecn_mark type drop generic true action mirror group buffer_drops
> > 
> > "mirror: The packet is forwarded by the underlying device and a copy is sent to
> > the CPU."
> > 
> > In this case the action is static and you cannot change it.
> 
> Oh yes, that's nice, I thought mirror in traps means mirror to another
> port. Are there already traps which implement the mirroring / trapping
> a clone? Quick grep yields nothing of substance.

Yes. That's why we have the 'offload_fwd_mark' and 'offload_l3_fwd_mark'
bits in the skb. For example, we let the hardware flood ARP requests
('arp_request'), but also send a copy to the CPU in case it needs to
update its neighbour table. The trapping happens at L2, so we only set
the 'offload_fwd_mark' bit. It will tell the bridge driver to not flood
the packet again.

The 'offload_l3_fwd_mark' bit is mainly used to support one-armed router
use cases where a packet is forwarded through the same interface through
which it was received ('uc_loopback'). We do the forwarding in hardware,
but also send a copy to the CPU to give the kernel the chance to
generate an ICMP redirect if it was not disabled by the user. See more
info in commit 55827458e058 ("Merge branch
'mlxsw-Add-one-armed-router-support'").

I also want to explain how the qevent stuff works in hardware to make
sure it is all clear. We have the ability to bind different triggers to
a mirroring (SPAN) agent. The agent can point to a physical port /
virtual interface (e.g., gretap for ERSPAN) or to the CPU port. The
first is programmed via the mirred action and the second using the trap
action.

The triggers can be simple such as Rx/Tx packet (matchall + mirred) or
policy engine (flower + mirred). The more advanced triggers are various
buffer events such as early drops ('early_drop' qevent) and ECN marking
('mark' qevent). Currently, it is only possible to bind these triggers
to a mirroring agent which is why we only support (in mlxsw) attaching
matchall filters to these qevents. In the future we might be able to
bind ACLs to these triggers in which case we will allow attaching flower
filters. devlink-trap is really only a read-only interface in this case,
meant to tell you why you go the packet from the hardware datapath. The
enablement / disablement is done by tc which gives us feature parity
with the software datapath.