Date: Wed, 24 Jan 2024 12:49:21 +0100
From: Magnus Karlsson <magnus.karlsson@...il.com>
To: Maciej Fijalkowski <maciej.fijalkowski@...el.com>
Cc: bpf@...r.kernel.org, ast@...nel.org, daniel@...earbox.net, 
	andrii@...nel.org, netdev@...r.kernel.org, magnus.karlsson@...el.com, 
	bjorn@...nel.org, echaudro@...hat.com, lorenzo@...nel.org, 
	martin.lau@...ux.dev, tirthendu.sarkar@...el.com, john.fastabend@...il.com, 
	horms@...nel.org
Subject: Re: [PATCH v5 bpf 02/11] xsk: make xsk_buff_pool responsible for
 clearing xdp_buff::flags

On Wed, 24 Jan 2024 at 12:42, Maciej Fijalkowski
<maciej.fijalkowski@...el.com> wrote:
>
> On Wed, Jan 24, 2024 at 09:20:26AM +0100, Magnus Karlsson wrote:
> > On Mon, 22 Jan 2024 at 23:16, Maciej Fijalkowski
> > <maciej.fijalkowski@...el.com> wrote:
> > >
> > > > XDP multi-buffer support introduced the XDP_FLAGS_HAS_FRAGS flag, which
> > > > drivers use to notify the data path whether an xdp_buff contains
> > > > fragments. The data path looks up this flag on the first buffer, which
> > > > occupies the linear part of the xdp_buff, so drivers only modify it
> > > > there. This is sufficient for SKB and XDP_DRV modes, as the xdp_buff is
> > > > usually allocated on the stack or resides within the struct
> > > > representing the driver's queue, and fragments are carried via
> > > > skb_frag_t structs. IOW, we are dealing with only one xdp_buff.
> > >
> > > > ZC mode, though, relies on a list of xdp_buff structs that is carried
> > > > via xsk_buff_pool::xskb_list, so the ZC data path has to make sure that
> > > > fragments do *not* have XDP_FLAGS_HAS_FRAGS set. Otherwise,
> > > > xsk_buff_free() could misbehave if it were executed against an xdp_buff
> > > > that carries a frag with the XDP_FLAGS_HAS_FRAGS flag set. Such a
> > > > scenario can take place when the supplied XDP program calls
> > > > bpf_xdp_adjust_tail() with a negative offset, which in turn releases
> > > > the tail fragment from the multi-buffer frame.
> > >
> > > > Calling xsk_buff_free() on a tail fragment with XDP_FLAGS_HAS_FRAGS set
> > > > would release all the nodes from xskb_list that were produced by the
> > > > driver before XDP program execution, which is not what is intended:
> > > > only the tail fragment should be deleted from xskb_list and then put
> > > > onto xsk_buff_pool::free_list. Such a multi-buffer frame will never
> > > > make it up to user space, so from the AF_XDP application's POV there
> > > > would be no traffic running. However, because free_list keeps receiving
> > > > new nodes, the driver will still be able to feed the HW Rx queue with
> > > > recycled buffers. The bottom line is that instead of being redirected
> > > > to user space, traffic would be continuously dropped.
> > >
> > > To fix this, let us clear the mentioned flag on xsk_buff_pool side at
> > > allocation time, which is what should have been done right from the
> > > start of XSK multi-buffer support.
> > >
> > > Fixes: 1bbc04de607b ("ice: xsk: add RX multi-buffer support")
> > > Fixes: 1c9ba9c14658 ("i40e: xsk: add RX multi-buffer support")
> > > Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX")
> > > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@...el.com>
> > > ---
> > >  drivers/net/ethernet/intel/i40e/i40e_xsk.c | 1 -
> > >  drivers/net/ethernet/intel/ice/ice_xsk.c   | 1 -
> > >  net/xdp/xsk_buff_pool.c                    | 3 +++
> > >  3 files changed, 3 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> > > index e99fa854d17f..fede0bb3e047 100644
> > > --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> > > +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> > > @@ -499,7 +499,6 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
> > >                 xdp_res = i40e_run_xdp_zc(rx_ring, first, xdp_prog);
> > >                 i40e_handle_xdp_result_zc(rx_ring, first, rx_desc, &rx_packets,
> > >                                           &rx_bytes, xdp_res, &failure);
> > > -               first->flags = 0;
> > >                 next_to_clean = next_to_process;
> > >                 if (failure)
> > >                         break;
> > > diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
> > > index 5d1ae8e4058a..d9073a618ad6 100644
> > > --- a/drivers/net/ethernet/intel/ice/ice_xsk.c
> > > +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
> > > @@ -895,7 +895,6 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget)
> > >
> > >                 if (!first) {
> > >                         first = xdp;
> > > -                       xdp_buff_clear_frags_flag(first);
> > >                 } else if (ice_add_xsk_frag(rx_ring, first, xdp, size)) {
> > >                         break;
> > >                 }
> > > diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
> > > index 28711cc44ced..dc5659da6728 100644
> > > --- a/net/xdp/xsk_buff_pool.c
> > > +++ b/net/xdp/xsk_buff_pool.c
> > > @@ -555,6 +555,7 @@ struct xdp_buff *xp_alloc(struct xsk_buff_pool *pool)
> > >
> > >         xskb->xdp.data = xskb->xdp.data_hard_start + XDP_PACKET_HEADROOM;
> > >         xskb->xdp.data_meta = xskb->xdp.data;
> > > +       xskb->xdp.flags = 0;
> > >
> > >         if (pool->dma_need_sync) {
> > >                 dma_sync_single_range_for_device(pool->dev, xskb->dma, 0,
> > > @@ -601,6 +602,7 @@ static u32 xp_alloc_new_from_fq(struct xsk_buff_pool *pool, struct xdp_buff **xd
> > >                 }
> > >
> > >                 *xdp = &xskb->xdp;
> > > +               xskb->xdp.flags = 0;
> >
> > Thanks for catching this. I am thinking we should have an if-statement
> > here and only do this when multi-buffer is enabled. The reason that we
> > have two different paths for aligned mode and unaligned mode here is
> > that we do not have to touch the xdp_buff at all at allocation time in
> > aligned mode, which provides a nice speed-up. So let us only do this
> > when necessary. What do you think? Same goes for the line in
> > xp_alloc_reused().
> >
>
> Good point. How about keeping flags = 0 in xp_alloc() and adding it to
> xsk_buff_set_size()? We do touch xdp_buff there, and these two paths cover
> the batched and non-batched APIs. I do agree that doing it in
> xp_alloc_new_from_fq() and in xp_alloc_reused() is not really handy.

That is an even better idea. Go for it.

> > >                 xdp++;
> > >         }
> > >
> > > @@ -621,6 +623,7 @@ static u32 xp_alloc_reused(struct xsk_buff_pool *pool, struct xdp_buff **xdp, u3
> > >                 list_del_init(&xskb->free_list_node);
> > >
> > >                 *xdp = &xskb->xdp;
> > > +               xskb->xdp.flags = 0;
> > >                 xdp++;
> > >         }
> > >         pool->free_list_cnt -= nb_entries;
> > > --
> > > 2.34.1
> > >
> > >
