lists.openwall.net - Open Source and information security mailing list archives
Date: Fri, 9 Mar 2018 21:04:23 +0800
From: Jason Wang <jasowang@...hat.com>
To: Jesper Dangaard Brouer <brouer@...hat.com>
Cc: netdev@...r.kernel.org, Björn Töpel <bjorn.topel@...el.com>,
	magnus.karlsson@...el.com, eugenia@...lanox.com,
	John Fastabend <john.fastabend@...il.com>,
	Eran Ben Elisha <eranbe@...lanox.com>,
	Saeed Mahameed <saeedm@...lanox.com>, galp@...lanox.com,
	Daniel Borkmann <borkmann@...earbox.net>,
	Alexei Starovoitov <alexei.starovoitov@...il.com>,
	Tariq Toukan <tariqt@...lanox.com>
Subject: Re: [bpf-next V2 PATCH 02/15] xdp: introduce xdp_return_frame API
 and use in cpumap

On 2018年03月09日 17:35, Jesper Dangaard Brouer wrote:
> On Fri, 9 Mar 2018 15:24:10 +0800
> Jason Wang <jasowang@...hat.com> wrote:
>
>> On 2018年03月08日 21:07, Jesper Dangaard Brouer wrote:
>>> Introduce an xdp_return_frame API, and convert over cpumap as
>>> the first user, given it have queued XDP frame structure to leverage.
>>>
>>> Signed-off-by: Jesper Dangaard Brouer <brouer@...hat.com>
>>> ---
>>>   include/net/xdp.h   |   32 +++++++++++++++++++++++++++
>>>   kernel/bpf/cpumap.c |   60 +++++++++++++++++++++++++++++++--------------------
>>>   net/core/xdp.c      |   18 +++++++++++++++
>>>   3 files changed, 86 insertions(+), 24 deletions(-)
>>>
>>> diff --git a/include/net/xdp.h b/include/net/xdp.h
>>> index b2362ddfa694..3cb726a6dc5b 100644
>>> --- a/include/net/xdp.h
>>> +++ b/include/net/xdp.h
>>> @@ -33,16 +33,48 @@
>>>    * also mandatory during RX-ring setup.
>>>    */
>>>
>>> +enum mem_type {
>>> +	MEM_TYPE_PAGE_SHARED = 0, /* Split-page refcnt based model */
>>> +	MEM_TYPE_PAGE_ORDER0,     /* Orig XDP full page model */
>>> +	// Possible new ideas for types:
>>> +	// MEM_TYPE_PAGE_POOL,    /* Will be added later */
>>> +	// MEM_TYPE_AF_XDP,
>>> +	// MEM_TYPE_DEVICE_OWNED -- invoking an dev->ndo?
>>> +	MEM_TYPE_MAX,
>>> +};
>>
>> So if we plan to support dev->ndo, it looks to me two types AF_XDP and
>> DEVICE_OWNED are sufficient?
>> Driver can do what it wants (e.g page pool
>> or ordinary page allocator) in ndo or what ever other callbacks.
>
> So, the design is open to go both ways, we can figure out later.
>
> The reason I'm not calling page_pool from a driver level dev->ndo, is
> that I'm trying to avoid invoking code in a module, as the driver
> module code can be (in the process of being) unloaded from the kernel,
> while another driver is calling xdp_return_frame. The current design
> in patch 10, uses the RCU period to guarantee that the allocator
> pointer is valid as long as the ID lookup succeeded.
>
> To do what you propose, we also need to guarantee that the net_device
> cannot disappear so it is safe to invoke dev->ndo. To do so, the
> driver likely have to add a rcu_barrier before unloading. I'm also
> thinking that for MEM_TYPE_DEVICE_OWNED the allocator pointer ("under
> protection") could be the net_device pointer, and we could take a
> refcnt on dev (dev_hold/dev_put).

Then it looks like one device (e.g. tun) could pin another device for an
indefinite time.

> Thus, I think it is doable, but lets
> figure this out later.
>
> Another important design consideration, is that the xdp core need to
> know how to release memory in case the ID lookup failed.

Can we simply use put_page() in this case?

> This is
> another argument for keeping the memory release code inside the xdp
> core, and not leave too much freedom to the drivers.
>
>>> +struct xdp_mem_info {
>>> +	u32 type; /* enum mem_type, but known size type */
>>> +	u32 id;   // Needed later (to lookup struct xdp_rxq_info)
>>> +};
>>> +
>>>  struct xdp_rxq_info {
>>>  	struct net_device *dev;
>>>  	u32 queue_index;
>>>  	u32 reg_state;
>>> +	struct xdp_mem_info mem;
>>>  } ____cacheline_aligned; /* perf critical, avoid false-sharing */
>>>
>>> +
>>> +static inline
>>> +void xdp_return_frame(void *data, struct xdp_mem_info *mem)
>>> +{
>>> +	if (mem->type == MEM_TYPE_PAGE_SHARED)
>>> +		page_frag_free(data);
>>> +
>>> +	if (mem->type == MEM_TYPE_PAGE_ORDER0) {
>>> +		struct page *page = virt_to_page(data); /* Assumes order0 page*/
>>> +
>>> +		put_page(page);
>>> +	}
>>> +}
>>> +
>>>  int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
>>>  		     struct net_device *dev, u32 queue_index);
>>>  void xdp_rxq_info_unreg(struct xdp_rxq_info *xdp_rxq);
>>>  void xdp_rxq_info_unused(struct xdp_rxq_info *xdp_rxq);
>>>  bool xdp_rxq_info_is_reg(struct xdp_rxq_info *xdp_rxq);
>>> +int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
>>> +			       enum mem_type type, void *allocator);
>>>
>>>  #endif /* __LINUX_NET_XDP_H__ */
>>> diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
>>> index a4bb0b34375a..3e4bbcbe3e86 100644
>>> --- a/kernel/bpf/cpumap.c
>>> +++ b/kernel/bpf/cpumap.c
> [...]
>>>  static void get_cpu_map_entry(struct bpf_cpu_map_entry *rcpu)
>>>  {
>>>  	atomic_inc(&rcpu->refcnt);
>>> @@ -188,6 +168,10 @@ struct xdp_pkt {
>>>  	u16 len;
>>>  	u16 headroom;
>>>  	u16 metasize;
>>> +	/* Lifetime of xdp_rxq_info is limited to NAPI/enqueue time,
>>> +	 * while mem info is valid on remote CPU.
>>> +	 */
>>
>> Can we simply move the xdp_mem_info to xdp_buff to avoid conversion?
>
> No, xdp_buff is a stack allocated piece of memory, thus we do need a
> conversion into another piece of memory.
>

Right, but is just duplicating the xdp_buff content sufficient in this
case, or is there anything else I missed?

Thanks