Date: Fri, 7 Jul 2023 19:58:21 +0200
From: Larysa Zaremba <larysa.zaremba@...el.com>
To: Jesper Dangaard Brouer <jbrouer@...hat.com>
CC: <brouer@...hat.com>, John Fastabend <john.fastabend@...il.com>,
	<bpf@...r.kernel.org>, <ast@...nel.org>, <daniel@...earbox.net>,
	<andrii@...nel.org>, <martin.lau@...ux.dev>, <song@...nel.org>, <yhs@...com>,
	<kpsingh@...nel.org>, <sdf@...gle.com>, <haoluo@...gle.com>,
	<jolsa@...nel.org>, David Ahern <dsahern@...il.com>, Jakub Kicinski
	<kuba@...nel.org>, Willem de Bruijn <willemb@...gle.com>, Anatoly Burakov
	<anatoly.burakov@...el.com>, Alexander Lobakin <alexandr.lobakin@...el.com>,
	Magnus Karlsson <magnus.karlsson@...il.com>, Maryam Tahhan
	<mtahhan@...hat.com>, <xdp-hints@...-project.net>, <netdev@...r.kernel.org>,
	Andrew Lunn <andrew@...n.ch>
Subject: Re: [PATCH bpf-next v2 09/20] xdp: Add VLAN tag hint

On Fri, Jul 07, 2023 at 03:57:13PM +0200, Jesper Dangaard Brouer wrote:
> 
> 
> On 06/07/2023 16.46, Larysa Zaremba wrote:
> > On Tue, Jul 04, 2023 at 04:18:04PM +0200, Jesper Dangaard Brouer wrote:
> > > 
> > > 
> > > On 04/07/2023 13.02, Larysa Zaremba wrote:
> > > > On Tue, Jul 04, 2023 at 12:23:45PM +0200, Jesper Dangaard Brouer wrote:
> > > > > 
> > > > > On 04/07/2023 10.23, Larysa Zaremba wrote:
> > > > > > On Mon, Jul 03, 2023 at 01:15:34PM -0700, John Fastabend wrote:
> > > > > > > Larysa Zaremba wrote:
> > > > > > > > Implement functionality that enables drivers to expose VLAN tag
> > > > > > > > to XDP code.
> > > > > > > > 
> > > > > > > > Signed-off-by: Larysa Zaremba <larysa.zaremba@...el.com>
> > > > > > > > ---
> > > > > > > >     Documentation/networking/xdp-rx-metadata.rst |  8 +++++++-
> > > > > > > >     include/linux/netdevice.h                    |  2 ++
> > > > > > > >     include/net/xdp.h                            |  2 ++
> > > > > > > >     kernel/bpf/offload.c                         |  2 ++
> > > > > > > >     net/core/xdp.c                               | 20 ++++++++++++++++++++
> > > > > > > >     5 files changed, 33 insertions(+), 1 deletion(-)
> > > > > > > > 
> > > > > > > > diff --git a/Documentation/networking/xdp-rx-metadata.rst b/Documentation/networking/xdp-rx-metadata.rst
> > > > > > > > index 25ce72af81c2..ea6dd79a21d3 100644
> > > > > > > > --- a/Documentation/networking/xdp-rx-metadata.rst
> > > > > > > > +++ b/Documentation/networking/xdp-rx-metadata.rst
> > > > > > > > @@ -18,7 +18,13 @@ Currently, the following kfuncs are supported. In the future, as more
> > > > > > > >     metadata is supported, this set will grow:
> > > > > > > >     .. kernel-doc:: net/core/xdp.c
> > > > > > > > -   :identifiers: bpf_xdp_metadata_rx_timestamp bpf_xdp_metadata_rx_hash
> > > > > > > > +   :identifiers: bpf_xdp_metadata_rx_timestamp
> > > > > > > > +
> > > > > > > > +.. kernel-doc:: net/core/xdp.c
> > > > > > > > +   :identifiers: bpf_xdp_metadata_rx_hash
> > > > > > > > +
> > > > > > > > +.. kernel-doc:: net/core/xdp.c
> > > > > > > > +   :identifiers: bpf_xdp_metadata_rx_vlan_tag
> > > > > > > >     An XDP program can use these kfuncs to read the metadata into stack
> > > > > > > >     variables for its own consumption. Or, to pass the metadata on to other
> > > > > [...]
> > > > > > > > diff --git a/net/core/xdp.c b/net/core/xdp.c
> > > > > > > > index 41e5ca8643ec..f6262c90e45f 100644
> > > > > > > > --- a/net/core/xdp.c
> > > > > > > > +++ b/net/core/xdp.c
> > > > > > > > @@ -738,6 +738,26 @@ __bpf_kfunc int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, u32 *hash,
> > > > > > > >     	return -EOPNOTSUPP;
> > > > > > > >     }
> > > > > > > > +/**
> > > > > > > > + * bpf_xdp_metadata_rx_vlan_tag - Get XDP packet outermost VLAN tag with protocol
> > > > > > > > + * @ctx: XDP context pointer.
> > > > > > > > + * @vlan_tag: Destination pointer for VLAN tag
> > > > > > > > + * @vlan_proto: Destination pointer for VLAN protocol identifier in network byte order.
> > > > > > > > + *
> > > > > > > > > + * In case of success, vlan_tag contains VLAN tag, including 12 least significant bits
> > > > > > > > + * containing VLAN ID, vlan_proto contains protocol identifier.
> > > > > > > 
> > > > > > > Above is a bit confusing to me at least.
> > > > > > > 
> > > > > > > The vlan tag would be both the 16bit TPID and 16bit TCI. What fields
> > > > > > > are to be included here? The VlanID or the full 16bit TCI meaning the
> > > > > > > PCP+DEI+VID?
> > > > > > 
> > > > > > It contains PCP+DEI+VID, in patch 16 ("selftests/bpf: Add flags and new hints to
> > > > > > xdp_hw_metadata") this is more clear, because the tag is parsed.
> > > > > > 
> > > > > 
> > > > > Do we really care about the "EtherType" proto (in VLAN speak TPID = Tag
> > > > > Protocol IDentifier)?
> > > > > I mean, it can basically only have two values[1], and we just wanted to
> > > > > know if it is a VLAN (that hardware offloaded/removed for us):
> > > > 
> > > > If we assume everyone follows the standard, this would be correct.
> > > > But apparently, some applications use some ambiguous value as a TPID [0].
> > > > 
> > > > So it is not hard to imagine, some NICs could allow you to configure your
> > > > custom TPID. I am not sure if any in-tree drivers actually do this, but I think
> > > > it's nice to provide some flexibility on XDP level, especially considering
> > > > network stack stores full vlan_proto.
> > > > 
> > > 
> > > I'm buying your argument, and agree it makes sense to provide TPID in
> > > the call signature, given that weird hardware exists that allows people to
> > > configure a custom TPID.
> > > 
> > > Looking through kernel defines (in uapi/linux/if_ether.h) I see evidence
> > > that funky QinQ EtherTypes have been used in the past:
> > > 
> > >   #define ETH_P_QINQ1	0x9100		/* deprecated QinQ VLAN [ NOT AN OFFICIALLY REGISTERED ID ] */
> > >   #define ETH_P_QINQ2	0x9200		/* deprecated QinQ VLAN [ NOT AN OFFICIALLY REGISTERED ID ] */
> > >   #define ETH_P_QINQ3	0x9300		/* deprecated QinQ VLAN [ NOT AN OFFICIALLY REGISTERED ID ] */
> > > 
> > > 
> > > > [0]
> > > > https://techhub.hpe.com/eginfolib/networking/docs/switches/7500/5200-1938a_l2-lan_cg/content/495503472.htm
> > > > 
> > > > > 
> > > > >    static __always_inline int proto_is_vlan(__u16 h_proto)
> > > > >    {
> > > > > 	return !!(h_proto == bpf_htons(ETH_P_8021Q) ||
> > > > > 		  h_proto == bpf_htons(ETH_P_8021AD));
> > > > >    }
> > > > > 
> > > > > [1] https://github.com/xdp-project/bpf-examples/blob/master/include/xdp/parsing_helpers.h#L75-L79
> > > > > 
> > > > > Cc. Andrew Lunn, as I notice DSA have a fake VLAN define ETH_P_DSA_8021Q
> > > > > (in file include/uapi/linux/if_ether.h)
> > > > > Is this actually in use?
> > > > > Maybe some hardware can "VLAN" offload this?
> > > > > 
> > > > > 
> > > > > > What about rephrasing it this way:
> > > > > > 
> > > > > > In case of success, vlan_proto contains VLAN protocol identifier (TPID),
> > > > > > vlan_tag contains the remaining 16 bits of a 802.1Q tag (PCP+DEI+VID).
> > > > > > 
> > > > > 
> > > > > Hmm, I think we can improve this further. This text becomes part of the
> > > > > documentation for end-users (target audience).  Thus, I think it is
> > > > > worth being more verbose and even mention the existing defines that we
> > > > > are expecting end-users to take advantage of.
> > > > > 
> > > > > What about:
> > > > > 
> > > > > In case of success. The VLAN EtherType is stored in vlan_proto (usually
> > > > > either ETH_P_8021Q or ETH_P_8021AD) also known as TPID (Tag Protocol
> > > > > IDentifier). The VLAN tag is stored in vlan_tag, which is a 16-bit field
> > > > > containing sub-fields (PCP+DEI+VID). The VLAN ID (VID) is 12-bits
> > > > > commonly extracted using mask VLAN_VID_MASK (0x0fff).  For the meaning
> > > > > of the sub-fields Priority Code Point (PCP) and Drop Eligible Indicator
> > > > > (DEI) (formerly CFI) please reference other documentation. Remember
> > > > > these 16-bit fields are stored in network-byte. Thus, transformation
> > > > > with byte-order helper functions like bpf_ntohs() are needed.
> > > > > 
> > > > 
> > > > AFAIK, vlan_tag is stored in host byte order, this is how it is in skb.
> > > 
> > > I'm not sure we should follow SKB storage scheme for XDP.
> > > 
> > 
> > I think following SKB convention is a good idea in this particular case. As I
> > have mentioned below, in ice VLAN TCI in descriptor already comes in LE, so no
> > point in converting it into BE, so somebody would use bpf_ntohs() later anyway.
> > We are not the only manufacturer that does this.
> > 
> 
> As long as other NIC hardware does the same this seems okay.
> 
> 
> > > > In ice, we receive VLAN tag in descriptor already in LE.
> > > > Only protocol is BE (network byte order). So I would replace the last 2
> > > > sentences with the following:
> > > > 
> > > > vlan_tag is stored in host byte order, so no byte order conversion is needed.
> > > 
> > > Yikes, that was unexpected.  This needs to be heavily documented in docs.
> > 
> > You mean the motivation, why it is so and not the other way around?
> > 
> 
> No, I don't mean the motivation.
> I simply mean write it in *bold*.
> 
> Look at the description for bpf_xdp_metadata_rx_hash, how it gets
> rendered [1] and how the code comments look [2].
> 
>  [1] https://kernel.org/doc/html/latest/networking/xdp-rx-metadata.html#general-design
>  [2] https://elixir.bootlin.com/linux/v6.4/source/net/core/xdp.c#L724
> 
> To save you some time compiling htmldocs target:
> 
>  make SPHINXDIRS="networking" V=1  htmldocs
> 

Ok, will do :)

> > > 
> > > When parsing packets, it is in network-byte-order, else my code is wrong
> > > here[1]:
> > > 
> > >    [1] https://github.com/xdp-project/bpf-examples/blob/master/include/xdp/parsing_helpers.h#L122
> > > 
> > > I'm accessing the skb->vlan_tci here [2], and I notice I don't do any
> > > byte-order conversions, so fortunately I didn't make a code mistake.
> > > 
> > >    [2] https://github.com/xdp-project/bpf-examples/blob/master/traffic-pacing-edt/edt_pacer_vlan.c#L215
> > > 
> > 
> > In raw packet, VLAN TCI is in network byte order, but skb requires NIC/driver
> > to convert it into host byte order before putting it into skb.
> > 
> 
> I'm interested in if *most* NIC hardware will deliver this in LE
> (Little-Endian) which is host-byte order on x86 ?
>

At least Intel, Pensando and some Broadcom products deliver the VLAN TCI in LE.
Mellanox delivers it in BE.
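
To make the convention discussed in this thread concrete, here is a rough
user-space sketch (not part of the patch, and not BPF code). It assumes
vlan_tag is handed out in host byte order and vlan_proto in network byte
order. All ex_* names are hypothetical and exist only for illustration; the
constant values mirror the kernel's ETH_P_8021Q, ETH_P_8021AD, VLAN_VID_MASK,
VLAN_CFI_MASK and VLAN_PRIO_SHIFT.

```c
#include <stdint.h>
#include <arpa/inet.h>	/* htons() */

/* Values copied from the kernel's if_ether.h / if_vlan.h so that the
 * sketch builds in user space without kernel headers. */
#define EX_ETH_P_8021Q		0x8100
#define EX_ETH_P_8021AD		0x88A8
#define EX_VLAN_VID_MASK	0x0fff
#define EX_VLAN_DEI_MASK	0x1000	/* VLAN_CFI_MASK in the kernel */
#define EX_VLAN_PRIO_SHIFT	13

/* Hypothetical helper type, not part of the patch. */
struct ex_vlan_fields {
	uint8_t  pcp;	/* Priority Code Point, 3 bits */
	uint8_t  dei;	/* Drop Eligible Indicator, 1 bit */
	uint16_t vid;	/* VLAN ID, 12 bits */
};

/* A descriptor that carries the TCI little-endian: assemble the host
 * value byte by byte, which works on any host endianness. */
static inline uint16_t ex_tci_from_le(const uint8_t b[2])
{
	return (uint16_t)(b[0] | (b[1] << 8));
}

/* A descriptor (or the raw packet) that carries the TCI big-endian. */
static inline uint16_t ex_tci_from_be(const uint8_t b[2])
{
	return (uint16_t)((b[0] << 8) | b[1]);
}

/* vlan_tag is in host byte order: plain shifts and masks are enough,
 * no ntohs()-style conversion is applied. */
static inline struct ex_vlan_fields ex_vlan_split(uint16_t vlan_tag)
{
	struct ex_vlan_fields f = {
		.pcp = (uint8_t)(vlan_tag >> EX_VLAN_PRIO_SHIFT),
		.dei = (uint8_t)((vlan_tag & EX_VLAN_DEI_MASK) != 0),
		.vid = (uint16_t)(vlan_tag & EX_VLAN_VID_MASK),
	};
	return f;
}

/* vlan_proto is in network byte order: compare against htons()
 * constants instead of converting the value itself. */
static inline int ex_proto_is_std_vlan(uint16_t vlan_proto)
{
	return vlan_proto == htons(EX_ETH_P_8021Q) ||
	       vlan_proto == htons(EX_ETH_P_8021AD);
}
```

Whichever way the hardware delivers the TCI, the driver would normalize it
once (the ex_tci_from_le()/ex_tci_from_be() step here), so the consumer only
ever sees the host-order value. A custom TPID such as 0x9100 simply fails the
ex_proto_is_std_vlan() check while still being available to the program.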

> 
> > > > vlan_proto is stored in network byte order, the suggested way to use this value:
> > > > 
> > > > vlan_proto == bpf_htons(ETH_P_8021Q)
> > > > 
> > > > > 
> > > > > 
> > > 
> > > --Jesper
> > > 
> > 
> 
