lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <871qptuyie.fsf@toke.dk>
Date:   Wed, 23 Nov 2022 22:55:21 +0100
From:   Toke Høiland-Jørgensen <toke@...hat.com>
To:     sdf@...gle.com, Jakub Kicinski <kuba@...nel.org>
Cc:     bpf@...r.kernel.org, ast@...nel.org, daniel@...earbox.net,
        andrii@...nel.org, martin.lau@...ux.dev, song@...nel.org,
        yhs@...com, john.fastabend@...il.com, kpsingh@...nel.org,
        haoluo@...gle.com, jolsa@...nel.org,
        Tariq Toukan <tariqt@...dia.com>,
        David Ahern <dsahern@...il.com>,
        Willem de Bruijn <willemb@...gle.com>,
        Jesper Dangaard Brouer <brouer@...hat.com>,
        Anatoly Burakov <anatoly.burakov@...el.com>,
        Alexander Lobakin <alexandr.lobakin@...el.com>,
        Magnus Karlsson <magnus.karlsson@...il.com>,
        Maryam Tahhan <mtahhan@...hat.com>, xdp-hints@...-project.net,
        netdev@...r.kernel.org
Subject: Re: [xdp-hints] Re: [PATCH bpf-next v2 6/8] mlx4: Introduce
 mlx4_xdp_buff wrapper for xdp_buff

sdf@...gle.com writes:

> On 11/23, Jakub Kicinski wrote:
>> On Wed, 23 Nov 2022 10:26:41 -0800 Stanislav Fomichev wrote:
>> > > This embedding trick works for drivers that put xdp_buff on the stack,
>> > > but mlx5 supports XSK zerocopy, which uses the xsk_buff_pool for
>> > > allocating them. This makes it a bit awkward to do the same thing  
>> there;
>> > > and since it's probably going to be fairly common to do something like
>> > > this, how about we just add a 'void *drv_priv' pointer to struct
>> > > xdp_buff that the drivers can use? The xdp_buff already takes up a  
>> full
>> > > cache line anyway, so any data stuffed after it will spill over to a  
>> new
>> > > one; so I don't think there's much difference performance-wise.
>> >
>> > I guess the alternative is to extend xsk_buff_pool with some new
>> > argument for xdp_buff tailroom? (so it can kmalloc(sizeof(xdp_buff) +
>> > xdp_buff_tailroom))
>> > But it seems messy because there is no way of knowing what the target
>> > device's tailroom is, so it has to be a user setting :-/
>> > I've started with a priv pointer in xdp_buff initially, it seems fine
>> > to go back. I'll probably convert veth/mlx4 to the same mode as well
>> > to avoid having different approaches in different places..
>
>> Can we not do this please? Add 16B of "private driver space" after
>> the xdp_buff in xdp_buff_xsk (we have 16B to full cacheline), the
>> drivers decide how they use it. Drivers can do BUILD_BUG_ON() for their
>> expected size and cast that to whatever struct they want. This is how
>> various offloads work, the variable size tailroom would be an over
>> design IMO.
>
>> And this way non XSK paths can keep its normal typing.
>
> Good idea, prototyped below, lmk if it that's not what you had in mind.
>
> struct xdp_buff_xsk {
> 	struct xdp_buff            xdp;                  /*     0    56 */
> 	u8                         cb[16];               /*    56    16 */
> 	/* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */

As pahole helpfully says here, xdp_buff is actually only 8 bytes from
being a full cache line. I thought about adding a 'cb' field like this
to xdp_buff itself, but figured that since there's only room for a
single pointer, why not just add that and let the driver point it to
where it wants to store the extra context data?

I am not suggesting to make anything variable-size; the 'void *drv_priv'
is just a normal pointer. There's no changes to any typing; not sure
where you got that from, Jakub?

Also, the priv pointer approach works for both XSK and on-stack
allocations, unlike this approach (see below).

> 	dma_addr_t                 dma;                  /*    72     8 */
> 	dma_addr_t                 frame_dma;            /*    80     8 */
> 	struct xsk_buff_pool *     pool;                 /*    88     8 */
> 	u64                        orig_addr;            /*    96     8 */
> 	struct list_head           free_list_node;       /*   104    16 */
>
> 	/* size: 120, cachelines: 2, members: 7 */
> 	/* last cacheline: 56 bytes */
> };
>
> Toke, I can try to merge this into your patch + keep your SoB (or feel free
> to try this and retest yourself, whatever works).
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h  
> b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
> index bc2d9034af5b..837bf103b871 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
> @@ -44,6 +44,11 @@
>   	(MLX5E_XDP_INLINE_WQE_MAX_DS_CNT * MLX5_SEND_WQE_DS - \
>   	 sizeof(struct mlx5_wqe_inline_seg))
>
> +struct mlx5_xdp_cb {
> +	struct mlx5_cqe64 *cqe;
> +	struct mlx5e_rq *rq;
> +};
> +
>   struct mlx5e_xsk_param;
>   int mlx5e_xdp_max_mtu(struct mlx5e_params *params, struct mlx5e_xsk_param  
> *xsk);
>   bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct page *page,
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c  
> b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
> index c91b54d9ff27..84d23b2da7ce 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
> @@ -5,6 +5,7 @@
>   #include "en/xdp.h"
>   #include <net/xdp_sock_drv.h>
>   #include <linux/filter.h>
> +#include <linux/build_bug.h>
>
>   /* RX data path */
>
> @@ -286,8 +287,14 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_linear(struct  
> mlx5e_rq *rq,
>   					      u32 cqe_bcnt)
>   {
>   	struct xdp_buff *xdp = wi->au->xsk;
> +	struct mlx5_xdp_cb *cb;
>   	struct bpf_prog *prog;
>
> +	BUILD_BUG_ON(sizeof(struct mlx5_xdp_cb) > XSKB_CB_SIZE);
> +	cb = xp_get_cb(xdp);
> +	cb->cqe = NULL /*cqe*/;
> +	cb->rq = rq;

So this works fine for the XSK path, but for the regular XDP path, mlx5
*does* indeed put the xdp_buff on the stack. So to re-use code there
would be an implicit assumption that both memory layout and size matches
between the two paths. I'm not sure that's better than just having a
pointer inside the xdp_buff and pointing it wherever makes sense for
that driver (as my patch did)?

-Toke

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ