[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <13f973a9937834ae8c10bfcc7d90909e94c543f1.camel@mellanox.com>
Date: Wed, 4 Jul 2018 00:57:47 +0000
From: Saeed Mahameed <saeedm@...lanox.com>
To: "alexei.starovoitov@...il.com" <alexei.starovoitov@...il.com>,
"saeedm@....mellanox.co.il" <saeedm@....mellanox.co.il>
CC: "alexander.h.duyck@...el.com" <alexander.h.duyck@...el.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Tariq Toukan <tariqt@...lanox.com>,
"john.fastabend@...il.com" <john.fastabend@...il.com>,
"brouer@...hat.com" <brouer@...hat.com>,
"borkmann@...earbox.net" <borkmann@...earbox.net>,
"peter.waskiewicz.jr@...el.com" <peter.waskiewicz.jr@...el.com>
Subject: Re: [RFC bpf-next 2/6] net: xdp: RX meta data infrastructure
On Tue, 2018-07-03 at 16:01 -0700, Alexei Starovoitov wrote:
> On Tue, Jun 26, 2018 at 07:46:11PM -0700, Saeed Mahameed wrote:
> > The idea from this patch is to define a well known structure for
> > XDP meta
> > data fields format and offset placement inside the xdp data meta
> > buffer.
> >
> > For that driver will need some static information to know what to
> > provide and where, enters struct xdp_md_info and xdp_md_info_arr:
> >
> > struct xdp_md_info {
> > __u16 present:1;
> > __u16 offset:15; /* offset from data_meta in xdp_md buffer
> > */
> > };
> >
> > /* XDP meta data offsets info array
> > * present bit describes if a meta data is or will be present in
> > xdp_md buff
> > * offset describes where a meta data is or should be placed in
> > xdp_md buff
> > *
> > * Kernel builds this array using xdp_md_info_build helper on
> > demand.
> > * User space builds it statically in the xdp program.
> > */
> > typedef struct xdp_md_info xdp_md_info_arr[XDP_DATA_META_MAX];
> >
> > Offsets in xdp_md_info_arr are always in ascending order and only
> > for
> > requested meta data:
> > example : flags = XDP_FLAGS_META_HASH | XDP_FLAGS_META_VLAN;
> >
> > xdp_md_info_arr mdi = {
> > [XDP_DATA_META_HASH] = {.offset = 0, .present = 1},
> > [XDP_DATA_META_MARK] = {.offset = 0, .present = 0},
> > [XDP_DATA_META_VLAN] = {.offset = sizeof(struct xdp_md_hash),
> > .present = 1},
> > [XDP_DATA_META_CSUM] = {.offset = 0, .present = 0},
> > }
> >
> > i.e: hash fields will always appear first then the vlan for every
> > xdp_md.
> >
> > Once requested to provide xdp meta data, device driver will use a
> > pre-built
> > xdp_md_info_arr which was built via xdp_md_info_build on xdp setup,
> > xdp_md_info_arr will tell the driver what is the offset of each
> > meta data.
> >
> > This patch also provides helper functions for the device drivers to
> > store
> > meta data into xdp_buff, and helper function for XDP programs to
> > fetch
> > specific xdp meta data from xdp_md buffer.
> >
> > Signed-off-by: Saeed Mahameed <saeedm@...lanox.com>
>
> ...
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 59b19b6a40d7..e320e7421565 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -2353,6 +2353,120 @@ struct xdp_md {
> > __u32 rx_queue_index; /* rxq->queue_index */
> > };
> >
> > +enum {
> > + XDP_DATA_META_HASH = 0,
> > + XDP_DATA_META_MARK,
> > + XDP_DATA_META_VLAN,
> > + XDP_DATA_META_CSUM_COMPLETE,
> > +
> > + XDP_DATA_META_MAX,
> > +};
> > +
> > +struct xdp_md_hash {
> > + __u32 hash;
> > + __u8 type;
> > +} __attribute__((aligned(4)));
> > +
> > +struct xdp_md_mark {
> > + __u32 mark;
> > +} __attribute__((aligned(4)));
> > +
> > +struct xdp_md_vlan {
> > + __u16 vlan;
> > +} __attribute__((aligned(4)));
> > +
> > +struct xdp_md_csum {
> > + __u16 csum;
> > +} __attribute__((aligned(4)));
> > +
> > +static const __u16 xdp_md_size[XDP_DATA_META_MAX] = {
> > + sizeof(struct xdp_md_hash), /* XDP_DATA_META_HASH */
> > + sizeof(struct xdp_md_mark), /* XDP_DATA_META_MARK */
> > + sizeof(struct xdp_md_vlan), /* XDP_DATA_META_VLAN */
> > + sizeof(struct xdp_md_csum), /* XDP_DATA_META_CSUM_COMPLETE
> > */
> > +};
> > +
> > +struct xdp_md_info {
> > + __u16 present:1;
> > + __u16 offset:15; /* offset from data_meta in xdp_md buffer
> > */
> > +};
> > +
> > +/* XDP meta data offsets info array
> > + * present bit describes if a meta data is or will be present in
> > xdp_md buff
> > + * offset describes where a meta data is or should be placed in
> > xdp_md buff
> > + *
> > + * Kernel builds this array using xdp_md_info_build helper on
> > demand.
> > + * User space builds it statically in the xdp program.
> > + */
> > +typedef struct xdp_md_info xdp_md_info_arr[XDP_DATA_META_MAX];
> > +
> > +static __always_inline __u8
> > +xdp_data_meta_present(xdp_md_info_arr mdi, int meta_data)
> > +{
> > + return mdi[meta_data].present;
> > +}
> > +
> > +static __always_inline void
> > +xdp_data_meta_set_hash(xdp_md_info_arr mdi, void *data_meta, __u32
> > hash, __u8 htype)
> > +{
> > + struct xdp_md_hash *hash_md;
> > +
> > + hash_md = (struct xdp_md_hash *)((char*)data_meta +
> > mdi[XDP_DATA_META_HASH].offset);
> > + hash_md->hash = hash;
> > + hash_md->type = htype;
> > +}
> > +
> > +static __always_inline struct xdp_md_hash *
> > +xdp_data_meta_get_hash(xdp_md_info_arr mdi, void *data_meta)
> > +{
> > + return (struct xdp_md_hash *)((char*)data_meta +
> > mdi[XDP_DATA_META_HASH].offset);
> > +}
>
> I'm afraid this is not scalable uapi.
> This looks very much mlx specific. Every NIC vendor cannot add its
> own
> struct definitions into uapi/bpf.h. It doesn't scale.
No, this is not mlx specific, all you can find here is standard
hash/csum/vlan/flow mark, fields as described in SKB, we want them to
be well defined so you can construct SKBs with these standard values
from xdp_md with no surprises.
I agree this is not scalable, but this patch is not about arbitrary
vendor specific hints, with unknown sizes and shapes, i understand the
demand but we do need the notion of standard hints as well in order to
allow (driver/fw/hw)=>(xdp program)=>(stack) data flow, since the three
of those entities must agree on some well defined hints.
Anyway we discussed this in the last iovisor meeting and we agreed that
along with the standard hints suggested by this patch we want to allow
arbitrary hw specific hints, the question is how to have both.
> Also doing 15 bit bitfield extraction using bpf instructions is not
> that simple.
> mlx should consider doing plain u8 or u16 fields in firmware instead.
> The metadata that "hw" provides is a joint work of actual asic and
> firmware.
Right, this would be a huge flexibility improvement for future hw.
what about current HW, the driver can always translate the current
format to whatever format we decide on.
> I suspect the format can and will change with different firmware.
> Baking this stuff into uapi/bpf.h is not an option.
Yes but we still need to have a well defined structures for standard
hints, for the hints to flow into the stack as well. current and future
HW/FW/driver must respect the well known format of specific hints, such
as hash/csum/vlan, etc..
> How about we make driver+firmware provide a BTF definition of
> metadata that they
> can provide? There can be multiple definitions of such structs.
> Then in userpsace we can have BTF->plain C converter.
> (bpftool practically ready to do that already).
> Then the programmer can take such generated C definition, add it to
> .h and include
> it in their programs. llvm will compile the whole thing and will
> include BTF
> of maps, progs and this md struct in the target elf file.
> During loading the kernel can check that BTF in elf is matching one-
> to-one
> to what driver+firmware are saying they support.
Just thinking out loud, can't we do this at program load ? just run a
setup function in the xdp program to load nic md BTF definition into
the elf section ?
> No ambiguity and no possibility of mistake, since offsets and field
> names
> are verified.
But what about the dynamic nature of this feature ? Sometimes you only
want HW/Driver to provide a subset of whatever the HW can provide and
save md buffer for other stuff.
Yes a well defined format is favorable here, but we need to make sure
there is no computational overhead in data path just to extract each
field! for example if i want to know what is the offset of the hash
will i need to go parse (for every packet) the whole BTF definition of
metadata just to find the offset of type=hash ?
> Every driver can have their own BTF for md and their own special
> features.
> We can try to standardize the names (like vlan and csum), so xdp
> programs
> can stay relatively portable across NICs.
Yes this is a must.
> Such api will address exposing asic+firmware metadata to the xdp
> program.
> Once we tackle this problem, we'll think how to do the backward
> config
> (to do firmware reconfig for specific BTF definition of md supplied
> by the prog).
> What people think?
>
For legacy HW, we can do it already in the driver, provide whatever the
prog requested, its only a matter of translation to the BTF format in
the driver xdp setup and pushing the values accordingly into the md
offsets on data path.
Question: how can you share the md BTF from the driver/HW with the xdp
program ?
Powered by blists - more mailing lists