lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 14 Apr 2021 23:14:04 +0000
From:   "Joseph, Jithu" <jithu.joseph@...el.com>
To:     "kuba@...nel.org" <kuba@...nel.org>,
        "Nguyen, Anthony L" <anthony.l.nguyen@...el.com>
CC:     "Karlsson, Magnus" <magnus.karlsson@...el.com>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "Gomes, Vinicius" <vinicius.gomes@...el.com>,
        "Fijalkowski, Maciej" <maciej.fijalkowski@...el.com>,
        "sassmann@...hat.com" <sassmann@...hat.com>,
        "Lifshits, Vitaly" <vitaly.lifshits@...el.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "Neftin, Sasha" <sasha.neftin@...el.com>,
        "Desouza, Ederson" <ederson.desouza@...el.com>,
        "bjorn.topel@...el.com" <bjorn.topel@...el.com>,
        "dvorax.fuxbrumer@...ux.intel.com" <dvorax.fuxbrumer@...ux.intel.com>
Subject: Re: [PATCH net-next 8/9] igc: Enable RX via AF_XDP zero-copy

Hi Jakub, 
 
Apologies for the delay, I am looking into this as the original
developer Andre is no-longer with Intel. I really appreciate your
review feedback.

(I removed Andre's and Vedang's email from the cc list as they are
bouncing and have added a couple of Intel folks) 

Pardon me if  I have not  understood your questions precisely or if
some of the replies are not concise (I am still understanding XDP flow
patterns.) 

I  see that lot of the design patterns followed by this patch series,
follow the approaches from other Intel drivers like (ice, ixgbe, i140e)

On Fri, 2021-04-09 at 17:36 -0700, Jakub Kicinski wrote:
> On Fri,  9 Apr 2021 09:43:50 -0700 Tony Nguyen wrote:
> > From: Andre Guedes <andre.guedes@...el.com>
> > 
> > Add support for receiving packets via AF_XDP zero-copy mechanism.
> > 
> > Add a new flag to 'enum igc_ring_flags_t' to indicate the ring has
> > AF_XDP zero-copy enabled so proper ring setup is carried out during
> > ring
> > configuration in igc_configure_rx_ring().
> > 
> > RX buffers can now be allocated via the shared pages mechanism
> > (default
> > behavior of the driver) or via xsk pool (when AF_XDP zero-copy is
> > enabled) so a union is added to the 'struct igc_rx_buffer' to cover
> > both
> > cases.
> > 
> > When AF_XDP zero-copy is enabled, rx buffers are allocated from the
> > xsk
> > pool using the new helper igc_alloc_rx_buffers_zc() which is the
> > counterpart of igc_alloc_rx_buffers().
> > 
> > Likewise other Intel drivers that support AF_XDP zero-copy, in igc
> > we
> > have a dedicated path for cleaning up rx irqs when zero-copy is
> > enabled.
> > This avoids adding too many checks within igc_clean_rx_irq(),
> > resulting
> > in a more readable and efficient code since this function is called
> > from
> > the hot-path of the driver.
> > +static struct sk_buff *igc_construct_skb_zc(struct igc_ring *ring,
> > +					    struct xdp_buff *xdp)
> > +{
> > +	unsigned int metasize = xdp->data - xdp->data_meta;
> > +	unsigned int datasize = xdp->data_end - xdp->data;
> > +	struct sk_buff *skb;
> > +
> > +	skb = __napi_alloc_skb(&ring->q_vector->napi,
> > +			       xdp->data_end - xdp->data_hard_start,
> > +			       GFP_ATOMIC | __GFP_NOWARN);
> > +	if (unlikely(!skb))
> > +		return NULL;
> > +
> > +	skb_reserve(skb, xdp->data - xdp->data_hard_start);
> > +	memcpy(__skb_put(skb, datasize), xdp->data, datasize);
> > +	if (metasize)
> > +		skb_metadata_set(skb, metasize);
> 
> But you haven't actually copied the matadata into the skb,
> the metadata is before xdp->data, right?

Today the igc driver doesn’t add any metadata (except for hw time
stamps explained later) . So for most part, xdp->data and xdp-
>data_meta point to the same address . That could be why in this
initial implementation we are not copying  the metadata into skb (as
the driver doesn’t add any).  

If the XDP program adds some metadata before xdp->data (and  xdp-
>data_meta reflects this), that is NOT copied into the SKB as you
mentioned .   Is the expectation that meta_data (if any added by the
bpf program) , should also be copied to the skb  in this XDP_PASS flow
? If so I can revise this patch to do that. 

If h/w time-stamp is added by the NIC, then metasize will be non zero
(as  xdp->data is advanced by the driver ) .  h/w ts  is still copied
into "skb_hwtstamps(skb)->hwtstamp" by  the caller of this function
igc_dispatch_skb_zc()  . Do you still want it to be copied into
__skb_put(skb, ) area too ? 

> 
> > +	return skb;
> > +}
> > +static int igc_xdp_enable_pool(struct igc_adapter *adapter,
> > +			       struct xsk_buff_pool *pool, u16
> > queue_id)
> > +{
> > +	struct net_device *ndev = adapter->netdev;
> > +	struct device *dev = &adapter->pdev->dev;
> > +	struct igc_ring *rx_ring;
> > +	struct napi_struct *napi;
> > +	bool needs_reset;
> > +	u32 frame_size;
> > +	int err;
> > +
> > +	if (queue_id >= adapter->num_rx_queues)
> > +		return -EINVAL;
> > +
> > +	frame_size = xsk_pool_get_rx_frame_size(pool);
> > +	if (frame_size < ETH_FRAME_LEN + VLAN_HLEN * 2) {
> > +		/* When XDP is enabled, the driver doesn't support
> > frames that
> > +		 * span over multiple buffers. To avoid that, we check
> > if xsk
> > +		 * frame size is big enough to fit the max ethernet
> > frame size
> > +		 * + vlan double tagging.
> > +		 */
> > +		return -EOPNOTSUPP;
> > +	}
> > +
> > +	err = xsk_pool_dma_map(pool, dev, IGC_RX_DMA_ATTR);
> > +	if (err) {
> > +		netdev_err(ndev, "Failed to map xsk pool\n");
> > +		return err;
> > +	}
> > +
> > +	needs_reset = netif_running(adapter->netdev) &&
> > igc_xdp_is_enabled(adapter);
> > +
> > +	rx_ring = adapter->rx_ring[queue_id];
> > +	napi = &rx_ring->q_vector->napi;
> > +
> > +	if (needs_reset) {
> > +		igc_disable_rx_ring(rx_ring);
> > +		napi_disable(napi);
> > +	}
> > +
> > +	set_bit(IGC_RING_FLAG_AF_XDP_ZC, &rx_ring->flags);
> > +
> > +	if (needs_reset) {
> > +		napi_enable(napi);
> > +		igc_enable_rx_ring(rx_ring);
> > +
> > +		err = igc_xsk_wakeup(ndev, queue_id, XDP_WAKEUP_RX);
> > +		if (err)
> > +			return err;
> 
> No need for an unwind path here?
> Does something call XDP_SETUP_XSK_POOL(NULL) on failure
> automagically?

I think we should add a xsk_pool_dma_unmap() in this failure path
?  Did I understand you correctly ?

> 
> > +	}
> > +
> > +	return 0;
> > +}

Thanks
Jithu

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ