netdev - Re: [bpf-next V2 PATCH 02/15] xdp: introduce xdp_return

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180309103536.1d7fdc58@redhat.com>
Date:   Fri, 9 Mar 2018 10:35:36 +0100
From:   Jesper Dangaard Brouer <brouer@...hat.com>
To:     Jason Wang <jasowang@...hat.com>
Cc:     netdev@...r.kernel.org,
        BjörnTöpel <bjorn.topel@...el.com>,
        magnus.karlsson@...el.com, eugenia@...lanox.com,
        John Fastabend <john.fastabend@...il.com>,
        Eran Ben Elisha <eranbe@...lanox.com>,
        Saeed Mahameed <saeedm@...lanox.com>, galp@...lanox.com,
        Daniel Borkmann <borkmann@...earbox.net>,
        Alexei Starovoitov <alexei.starovoitov@...il.com>,
        Tariq Toukan <tariqt@...lanox.com>, brouer@...hat.com
Subject: Re: [bpf-next V2 PATCH 02/15] xdp: introduce xdp_return_frame API
 and use in cpumap

On Fri, 9 Mar 2018 15:24:10 +0800
Jason Wang <jasowang@...hat.com> wrote:

> On 2018年03月08日 21:07, Jesper Dangaard Brouer wrote:
> > Introduce an xdp_return_frame API, and convert over cpumap as
> > the first user, given it have queued XDP frame structure to leverage.
> >
> > Signed-off-by: Jesper Dangaard Brouer <brouer@...hat.com>
> > ---
> >   include/net/xdp.h   |   32 +++++++++++++++++++++++++++
> >   kernel/bpf/cpumap.c |   60 +++++++++++++++++++++++++++++++--------------------
> >   net/core/xdp.c      |   18 +++++++++++++++
> >   3 files changed, 86 insertions(+), 24 deletions(-)
> >
> > diff --git a/include/net/xdp.h b/include/net/xdp.h
> > index b2362ddfa694..3cb726a6dc5b 100644
> > --- a/include/net/xdp.h
> > +++ b/include/net/xdp.h
> > @@ -33,16 +33,48 @@
> >    * also mandatory during RX-ring setup.
> >    */
> >   
> > +enum mem_type {
> > +	MEM_TYPE_PAGE_SHARED = 0, /* Split-page refcnt based model */
> > +	MEM_TYPE_PAGE_ORDER0,     /* Orig XDP full page model */
> > +	// Possible new ideas for types:
> > +	// MEM_TYPE_PAGE_POOL,    /* Will be added later  */
> > +	// MEM_TYPE_AF_XDP,
> > +	// MEM_TYPE_DEVICE_OWNED -- invoking an dev->ndo?
> > +	MEM_TYPE_MAX,
> > +};  
> 
> So if we plan to support dev->ndo, it looks to me two types AF_XDP and 
> DEVICE_OWNED are sufficient? Driver can do what it wants (e.g page pool 
> or ordinary page allocator) in ndo or what ever other callbacks.

So, the design is open to go both ways, we can figure out later.

The reason I'm not calling page_pool from a driver level dev->ndo, is
that I'm trying to avoid invoking code in a module, as the driver
module code can be (in the process of being) unloaded from the kernel,
while another driver is calling xdp_return_frame.  The current design
in patch 10, uses the RCU period to guarantee that the allocator
pointer is valid as long as the ID lookup succeeded.

To do what you propose, we also need to guarantee that the net_device
cannot disappear so it is safe to invoke dev->ndo.  To do so, the
driver likely have to add a rcu_barrier before unloading.  I'm also
thinking that for MEM_TYPE_DEVICE_OWNED the allocator pointer ("under
protection") could be the net_device pointer, and we could take a
refcnt on dev (dev_hold/dev_put). Thus, I think it is doable, but lets
figure this out later.

Another important design consideration, is that the xdp core need to
know how to release memory in case the ID lookup failed.  This is
another argument for keeping the memory release code inside the xdp
core, and not leave too much freedom to the drivers.


> > +struct xdp_mem_info {
> > +	u32 type; /* enum mem_type, but known size type */
> > +	u32 id; // Needed later (to lookup struct xdp_rxq_info)
> > +};
> > +
> >   struct xdp_rxq_info {
> >   	struct net_device *dev;
> >   	u32 queue_index;
> >   	u32 reg_state;
> > +	struct xdp_mem_info mem;
> >   } ____cacheline_aligned; /* perf critical, avoid false-sharing */
> >   
> > +
> > +static inline
> > +void xdp_return_frame(void *data, struct xdp_mem_info *mem)
> > +{
> > +	if (mem->type == MEM_TYPE_PAGE_SHARED)
> > +		page_frag_free(data);
> > +
> > +	if (mem->type == MEM_TYPE_PAGE_ORDER0) {
> > +		struct page *page = virt_to_page(data); /* Assumes order0 page*/
> > +
> > +		put_page(page);
> > +	}
> > +}
> > +
> >   int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
> >   		     struct net_device *dev, u32 queue_index);
> >   void xdp_rxq_info_unreg(struct xdp_rxq_info *xdp_rxq);
> >   void xdp_rxq_info_unused(struct xdp_rxq_info *xdp_rxq);
> >   bool xdp_rxq_info_is_reg(struct xdp_rxq_info *xdp_rxq);
> > +int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
> > +			       enum mem_type type, void *allocator);
> >   
> >   #endif /* __LINUX_NET_XDP_H__ */
> > diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
> > index a4bb0b34375a..3e4bbcbe3e86 100644
> > --- a/kernel/bpf/cpumap.c
> > +++ b/kernel/bpf/cpumap.c
[...]
> >   static void get_cpu_map_entry(struct bpf_cpu_map_entry *rcpu)
> >   {
> >   	atomic_inc(&rcpu->refcnt);
> > @@ -188,6 +168,10 @@ struct xdp_pkt {
> >   	u16 len;
> >   	u16 headroom;
> >   	u16 metasize;
> > +	/* Lifetime of xdp_rxq_info is limited to NAPI/enqueue time,
> > +	 * while mem info is valid on remote CPU.
> > +	 */  
> 
> Can we simply move the xdp_mem_info to xdp_buff to avoid conversion?

No, xdp_buff is a stack allocated piece of memory, thus we do need a
conversion into another piece of memory.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer