Date: Tue, 4 Oct 2022 17:55:42 -0700
From: sdf@...gle.com
To: Daniel Borkmann <daniel@...earbox.net>
Cc: bpf@...r.kernel.org, razor@...ckwall.org, ast@...nel.org,
    andrii@...nel.org, martin.lau@...ux.dev, john.fastabend@...il.com,
    joannelkoong@...il.com, memxor@...il.com, toke@...hat.com,
    joe@...ium.io, netdev@...r.kernel.org
Subject: Re: [PATCH bpf-next 01/10] bpf: Add initial fd-based API to attach tc BPF programs

On 10/05, Daniel Borkmann wrote:
> This work refactors and adds a lightweight extension to the tc BPF
> ingress and egress data path side for allowing BPF programs via an
> fd-based attach / detach API. The main goal behind this work which we
> also presented at LPC [0] this year is to eventually add support for
> BPF links for tc BPF programs in a second step, thus this prep work is
> required for the latter which allows for a model of safe ownership and
> program detachment. Given the vast rise in tc BPF users in cloud
> native / Kubernetes environments, this becomes necessary to avoid hard
> to debug incidents either through stale leftover programs or 3rd party
> applications stepping on each others toes. Further details for BPF
> link rationale in next patch.
>
> For the current tc framework, there is no change in behavior with this
> change and neither does this change touch on tc core kernel APIs. The
> gist of this patch is that the ingress and egress hook gets a
> lightweight, qdisc-less extension for BPF to attach its tc BPF
> programs, in other words, a minimal tc-layer entry point for BPF. As
> part of the feedback from LPC, there was a suggestion to provide a
> name for this infrastructure to more easily differ between the classic
> cls_bpf attachment and the fd-based API. As for most, the XDP vs tc
> layer is already the default mental model for the pkt processing
> pipeline. We refactored this with an xtc internal prefix aka 'express
> traffic control' in order to avoid to deviate too far (and 'express'
> given its more lightweight/faster entry point).
>
> For the ingress and egress xtc points, the device holds a
> cache-friendly array with programs. Same as with classic tc, programs
> are attached with a prio that can be specified or auto-allocated
> through an idr, and the program return code determines whether to
> continue in the pipeline or to terminate processing. With
> TC_ACT_UNSPEC code, the processing continues (as the case today). The
> goal was to have maximum compatibility to existing tc BPF programs, so
> they don't need to be adapted. Compatibility to call into classic
> tcf_classify() is also provided in order to allow successive migration
> or both to cleanly co-exist where needed given its one logical layer.
> The fd-based API is behind a static key, so that when unused the code
> is also not entered. The struct xtc_entry's program array is currently
> static, but could be made dynamic if necessary at a point in future.
> Desire has also been expressed for future work to adapt similar
> framework for XDP to allow multi-attach from in-kernel side, too.
>
> Tested with tc-testing selftest suite which all passes, as well as the
> tc BPF tests from the BPF CI.
> [0] https://lpc.events/event/16/contributions/1353/ > Co-developed-by: Nikolay Aleksandrov <razor@...ckwall.org> > Signed-off-by: Nikolay Aleksandrov <razor@...ckwall.org> > Signed-off-by: Daniel Borkmann <daniel@...earbox.net> > --- > MAINTAINERS | 4 +- > include/linux/bpf.h | 1 + > include/linux/netdevice.h | 14 +- > include/linux/skbuff.h | 4 +- > include/net/sch_generic.h | 2 +- > include/net/xtc.h | 181 ++++++++++++++++++++++ > include/uapi/linux/bpf.h | 35 ++++- > kernel/bpf/Kconfig | 1 + > kernel/bpf/Makefile | 1 + > kernel/bpf/net.c | 274 +++++++++++++++++++++++++++++++++ > kernel/bpf/syscall.c | 24 ++- > net/Kconfig | 5 + > net/core/dev.c | 262 +++++++++++++++++++------------ > net/core/filter.c | 4 +- > net/sched/Kconfig | 4 +- > net/sched/sch_ingress.c | 48 +++++- > tools/include/uapi/linux/bpf.h | 35 ++++- > 17 files changed, 769 insertions(+), 130 deletions(-) > create mode 100644 include/net/xtc.h > create mode 100644 kernel/bpf/net.c > diff --git a/MAINTAINERS b/MAINTAINERS > index e55a4d47324c..bb63d8d000ea 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -3850,13 +3850,15 @@ S: Maintained > F: kernel/trace/bpf_trace.c > F: kernel/bpf/stackmap.c > -BPF [NETWORKING] (tc BPF, sock_addr) > +BPF [NETWORKING] (xtc & tc BPF, sock_addr) > M: Martin KaFai Lau <martin.lau@...ux.dev> > M: Daniel Borkmann <daniel@...earbox.net> > R: John Fastabend <john.fastabend@...il.com> > L: bpf@...r.kernel.org > L: netdev@...r.kernel.org > S: Maintained > +F: include/net/xtc.h > +F: kernel/bpf/net.c > F: net/core/filter.c > F: net/sched/act_bpf.c > F: net/sched/cls_bpf.c > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > index 9e7d46d16032..71e5f43db378 100644 > --- a/include/linux/bpf.h > +++ b/include/linux/bpf.h > @@ -1473,6 +1473,7 @@ struct bpf_prog_array_item { > union { > struct bpf_cgroup_storage *cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE]; > u64 bpf_cookie; > + u32 bpf_priority; > }; > }; > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > index eddf8ee270e7..43bbb2303e57 100644 > --- a/include/linux/netdevice.h > +++ b/include/linux/netdevice.h > @@ -1880,8 +1880,7 @@ enum netdev_ml_priv_type { > * > * @rx_handler: handler for received packets > * @rx_handler_data: XXX: need comments on this one > - * @miniq_ingress: ingress/clsact qdisc specific data for > - * ingress processing > + * @xtc_ingress: BPF/clsact qdisc specific data for ingress processing > * @ingress_queue: XXX: need comments on this one > * @nf_hooks_ingress: netfilter hooks executed for ingress packets > * @broadcast: hw bcast address > @@ -1902,8 +1901,7 @@ enum netdev_ml_priv_type { > * @xps_maps: all CPUs/RXQs maps for XPS device > * > * @xps_maps: XXX: need comments on this one > - * @miniq_egress: clsact qdisc specific data for > - * egress processing > + * @xtc_egress: BPF/clsact qdisc specific data for egress processing > * @nf_hooks_egress: netfilter hooks executed for egress packets > * @qdisc_hash: qdisc hash table > * @watchdog_timeo: Represents the timeout that is used by > @@ -2191,8 +2189,8 @@ struct net_device { > rx_handler_func_t __rcu *rx_handler; > void __rcu *rx_handler_data; > -#ifdef CONFIG_NET_CLS_ACT > - struct mini_Qdisc __rcu *miniq_ingress; > +#ifdef CONFIG_NET_XGRESS > + struct xtc_entry __rcu *xtc_ingress; > #endif > struct netdev_queue __rcu *ingress_queue; > #ifdef CONFIG_NETFILTER_INGRESS > @@ -2220,8 +2218,8 @@ struct net_device { > #ifdef CONFIG_XPS > struct xps_dev_maps __rcu *xps_maps[XPS_MAPS_MAX]; > #endif > -#ifdef CONFIG_NET_CLS_ACT > - struct 
mini_Qdisc __rcu *miniq_egress; > +#ifdef CONFIG_NET_XGRESS > + struct xtc_entry __rcu *xtc_egress; > #endif > #ifdef CONFIG_NETFILTER_EGRESS > struct nf_hook_entries __rcu *nf_hooks_egress; > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h > index 9fcf534f2d92..a9ff7a1996e9 100644 > --- a/include/linux/skbuff.h > +++ b/include/linux/skbuff.h > @@ -955,7 +955,7 @@ struct sk_buff { > __u8 csum_level:2; > __u8 dst_pending_confirm:1; > __u8 mono_delivery_time:1; /* See SKB_MONO_DELIVERY_TIME_MASK */ > -#ifdef CONFIG_NET_CLS_ACT > +#ifdef CONFIG_NET_XGRESS > __u8 tc_skip_classify:1; > __u8 tc_at_ingress:1; /* See TC_AT_INGRESS_MASK */ > #endif > @@ -983,7 +983,7 @@ struct sk_buff { > __u8 slow_gro:1; > __u8 csum_not_inet:1; > -#ifdef CONFIG_NET_SCHED > +#if defined(CONFIG_NET_SCHED) || defined(CONFIG_NET_XGRESS) > __u16 tc_index; /* traffic control index */ > #endif > diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h > index d5517719af4e..bc5c1da2d30f 100644 > --- a/include/net/sch_generic.h > +++ b/include/net/sch_generic.h > @@ -693,7 +693,7 @@ int skb_do_redirect(struct sk_buff *); > static inline bool skb_at_tc_ingress(const struct sk_buff *skb) > { > -#ifdef CONFIG_NET_CLS_ACT > +#ifdef CONFIG_NET_XGRESS > return skb->tc_at_ingress; > #else > return false; > diff --git a/include/net/xtc.h b/include/net/xtc.h > new file mode 100644 > index 000000000000..627dc18aa433 > --- /dev/null > +++ b/include/net/xtc.h > @@ -0,0 +1,181 @@ > +/* SPDX-License-Identifier: GPL-2.0 */ > +/* Copyright (c) 2022 Isovalent */ > +#ifndef __NET_XTC_H > +#define __NET_XTC_H > + > +#include <linux/idr.h> > +#include <linux/bpf.h> > + > +#include <net/sch_generic.h> > + > +#define XTC_MAX_ENTRIES 30 > +/* Adds 1 NULL entry. */ > +#define XTC_MAX (XTC_MAX_ENTRIES + 1) > + > +struct xtc_entry { > + struct bpf_prog_array_item items[XTC_MAX] ____cacheline_aligned; > + struct xtc_entry_pair *parent; > +}; > + > +struct mini_Qdisc; > + > +struct xtc_entry_pair { > + struct rcu_head rcu; > + struct idr idr; > + struct mini_Qdisc *miniq; > + struct xtc_entry a; > + struct xtc_entry b; > +}; > + > +static inline void xtc_set_ingress(struct sk_buff *skb, bool ingress) > +{ > +#ifdef CONFIG_NET_XGRESS > + skb->tc_at_ingress = ingress; > +#endif > +} > + > +#ifdef CONFIG_NET_XGRESS > +void xtc_inc(void); > +void xtc_dec(void); > + > +static inline void > +dev_xtc_entry_update(struct net_device *dev, struct xtc_entry *entry, > + bool ingress) > +{ > + ASSERT_RTNL(); > + if (ingress) > + rcu_assign_pointer(dev->xtc_ingress, entry); > + else > + rcu_assign_pointer(dev->xtc_egress, entry); > + synchronize_rcu(); > +} > + > +static inline struct xtc_entry *dev_xtc_entry_peer(const struct > xtc_entry *entry) > +{ > + if (entry == &entry->parent->a) > + return &entry->parent->b; > + else > + return &entry->parent->a; > +} > + > +static inline struct xtc_entry *dev_xtc_entry_create(void) > +{ > + struct xtc_entry_pair *pair = kzalloc(sizeof(*pair), GFP_KERNEL); > + > + if (pair) { > + pair->a.parent = pair; > + pair->b.parent = pair; > + idr_init(&pair->idr); > + return &pair->a; > + } > + return NULL; > +} > + > +static inline struct xtc_entry *dev_xtc_entry_fetch(struct net_device > *dev, > + bool ingress, bool *created) > +{ > + struct xtc_entry *entry = ingress ? 
> + rcu_dereference_rtnl(dev->xtc_ingress) : > + rcu_dereference_rtnl(dev->xtc_egress); > + > + *created = false; > + if (!entry) { > + entry = dev_xtc_entry_create(); > + if (!entry) > + return NULL; > + *created = true; > + } > + return entry; > +} > + > +static inline void dev_xtc_entry_clear(struct xtc_entry *entry) > +{ > + memset(entry->items, 0, sizeof(entry->items)); > +} > + > +static inline int dev_xtc_entry_prio_new(struct xtc_entry *entry, u32 > prio, > + struct bpf_prog *prog) > +{ > + int ret; > + > + if (prio == 0) > + prio = 1; > + ret = idr_alloc_u32(&entry->parent->idr, prog, &prio, U32_MAX, > + GFP_KERNEL); > + return ret < 0 ? ret : prio; > +} > + > +static inline void dev_xtc_entry_prio_set(struct xtc_entry *entry, u32 > prio, > + struct bpf_prog *prog) > +{ > + idr_replace(&entry->parent->idr, prog, prio); > +} > + > +static inline void dev_xtc_entry_prio_del(struct xtc_entry *entry, u32 > prio) > +{ > + idr_remove(&entry->parent->idr, prio); > +} > + > +static inline void dev_xtc_entry_free(struct xtc_entry *entry) > +{ > + idr_destroy(&entry->parent->idr); > + kfree_rcu(entry->parent, rcu); > +} > + > +static inline u32 dev_xtc_entry_total(struct xtc_entry *entry) > +{ > + const struct bpf_prog_array_item *item; > + const struct bpf_prog *prog; > + u32 num = 0; > + > + item = &entry->items[0]; > + while ((prog = READ_ONCE(item->prog))) { > + num++; > + item++; > + } > + return num; > +} > + > +static inline enum tc_action_base xtc_action_code(struct sk_buff *skb, > int code) > +{ > + switch (code) { > + case TC_PASS: > + skb->tc_index = qdisc_skb_cb(skb)->tc_classid; > + fallthrough; > + case TC_DROP: > + case TC_REDIRECT: > + return code; > + case TC_NEXT: > + default: > + return TC_NEXT; > + } > +} > + > +int xtc_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog); > +int xtc_prog_detach(const union bpf_attr *attr); > +int xtc_prog_query(const union bpf_attr *attr, > + union bpf_attr __user *uattr); > +void dev_xtc_uninstall(struct net_device *dev); > +#else > +static inline int xtc_prog_attach(const union bpf_attr *attr, > + struct bpf_prog *prog) > +{ > + return -EINVAL; > +} > + > +static inline int xtc_prog_detach(const union bpf_attr *attr) > +{ > + return -EINVAL; > +} > + > +static inline int xtc_prog_query(const union bpf_attr *attr, > + union bpf_attr __user *uattr) > +{ > + return -EINVAL; > +} > + > +static inline void dev_xtc_uninstall(struct net_device *dev) > +{ > +} > +#endif /* CONFIG_NET_XGRESS */ > +#endif /* __NET_XTC_H */ > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > index 51b9aa640ad2..de1f5546bcfe 100644 > --- a/include/uapi/linux/bpf.h > +++ b/include/uapi/linux/bpf.h > @@ -1025,6 +1025,8 @@ enum bpf_attach_type { > BPF_PERF_EVENT, > BPF_TRACE_KPROBE_MULTI, > BPF_LSM_CGROUP, > + BPF_NET_INGRESS, > + BPF_NET_EGRESS, > __MAX_BPF_ATTACH_TYPE > }; > @@ -1399,14 +1401,20 @@ union bpf_attr { > }; > struct { /* anonymous struct used by BPF_PROG_ATTACH/DETACH commands */ > - __u32 target_fd; /* container object to attach to */ > + union { > + __u32 target_fd; /* container object to attach to */ > + __u32 target_ifindex; /* target ifindex */ > + }; > __u32 attach_bpf_fd; /* eBPF program to attach */ > __u32 attach_type; > __u32 attach_flags; > - __u32 replace_bpf_fd; /* previously attached eBPF [..] 
> +	union {
> +		__u32	attach_priority;
> +		__u32	replace_bpf_fd;	/* previously attached eBPF
> *					   program to replace if
> *					   BPF_F_REPLACE is used
> */
> +	};

The series looks exciting, haven't had a chance to look deeply, will
try to find some time this week.

We've chatted briefly about priority during the talk, let's maybe
discuss it here more? I, as a user, still really have no clue about
what priority to use. We have this problem at tc, and we'll seemingly
have the same problem here? I guess it's even more relevant in k8s
because internally at G we can control the users.

Is it worth at least trying to provide some default bands / guidance?

For example, having SEC('tc/ingress') receive attach_priority=124 by
default? Maybe we can even have something like 'tc/ingress_first' get
attach_priority=1 and 'tc/ingress_last' with attach_priority=254?
(the names are arbitrary, we can do something better)

ingress_first/ingress_last can be used by some monitoring jobs. The
rest can use default 124. If somebody really needs a custom priority,
then they can manually use something around 124/2 if they need to
trigger before the 'default' priority or 124+124/2 if they want to
trigger after?

Thoughts? Is it worth it? Do we care?

> };
> struct { /* anonymous struct used by BPF_PROG_TEST_RUN command */
> @@ -1452,7 +1460,10 @@ union bpf_attr {
> } info;
> struct { /* anonymous struct used by BPF_PROG_QUERY command */
> - __u32 target_fd; /* container object to query */
> + union {
> + __u32 target_fd; /* container object to query */
> + __u32 target_ifindex; /* target ifindex */
> + };
> __u32 attach_type;
> __u32 query_flags;
> __u32 attach_flags;
> @@ -6038,6 +6049,19 @@ struct bpf_sock_tuple {
> };
> };
> +/* (Simplified) user return codes for tc prog type.
> + * A valid tc program must return one of these defined values. All other
> + * return codes are reserved for future use. Must remain compatible with
> + * their TC_ACT_* counter-parts. For compatibility in behavior, unknown
> + * return codes are mapped to TC_NEXT.
> + */ > +enum tc_action_base { > + TC_NEXT = -1, > + TC_PASS = 0, > + TC_DROP = 2, > + TC_REDIRECT = 7, > +}; > + > struct bpf_xdp_sock { > __u32 queue_id; > }; > @@ -6804,6 +6828,11 @@ struct bpf_flow_keys { > __be32 flow_label; > }; > +struct bpf_query_info { > + __u32 prog_id; > + __u32 prio; > +}; > + > struct bpf_func_info { > __u32 insn_off; > __u32 type_id; > diff --git a/kernel/bpf/Kconfig b/kernel/bpf/Kconfig > index 2dfe1079f772..6a906ff93006 100644 > --- a/kernel/bpf/Kconfig > +++ b/kernel/bpf/Kconfig > @@ -31,6 +31,7 @@ config BPF_SYSCALL > select TASKS_TRACE_RCU > select BINARY_PRINTF > select NET_SOCK_MSG if NET > + select NET_XGRESS if NET > select PAGE_POOL if NET > default n > help > diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile > index 341c94f208f4..76c3f9d4e2f3 100644 > --- a/kernel/bpf/Makefile > +++ b/kernel/bpf/Makefile > @@ -20,6 +20,7 @@ obj-$(CONFIG_BPF_SYSCALL) += devmap.o > obj-$(CONFIG_BPF_SYSCALL) += cpumap.o > obj-$(CONFIG_BPF_SYSCALL) += offload.o > obj-$(CONFIG_BPF_SYSCALL) += net_namespace.o > +obj-$(CONFIG_BPF_SYSCALL) += net.o > endif > ifeq ($(CONFIG_PERF_EVENTS),y) > obj-$(CONFIG_BPF_SYSCALL) += stackmap.o > diff --git a/kernel/bpf/net.c b/kernel/bpf/net.c > new file mode 100644 > index 000000000000..ab9a9dee615b > --- /dev/null > +++ b/kernel/bpf/net.c > @@ -0,0 +1,274 @@ > +// SPDX-License-Identifier: GPL-2.0-only > +/* Copyright (c) 2022 Isovalent */ > + > +#include <linux/bpf.h> > +#include <linux/filter.h> > +#include <linux/netdevice.h> > + > +#include <net/xtc.h> > + > +static int __xtc_prog_attach(struct net_device *dev, bool ingress, u32 > limit, > + struct bpf_prog *nprog, u32 prio, u32 flags) > +{ > + struct bpf_prog_array_item *item, *tmp; > + struct xtc_entry *entry, *peer; > + struct bpf_prog *oprog; > + bool created; > + int i, j; > + > + ASSERT_RTNL(); > + > + entry = dev_xtc_entry_fetch(dev, ingress, &created); > + if (!entry) > + return -ENOMEM; > + for (i = 0; i < limit; i++) { > + item = &entry->items[i]; > + oprog = item->prog; > + if (!oprog) > + break; > + if (item->bpf_priority == prio) { > + if (flags & BPF_F_REPLACE) { > + /* Pairs with READ_ONCE() in xtc_run_progs(). 
*/ > + WRITE_ONCE(item->prog, nprog); > + bpf_prog_put(oprog); > + dev_xtc_entry_prio_set(entry, prio, nprog); > + return prio; > + } > + return -EBUSY; > + } > + } > + if (dev_xtc_entry_total(entry) >= limit) > + return -ENOSPC; > + prio = dev_xtc_entry_prio_new(entry, prio, nprog); > + if (prio < 0) { > + if (created) > + dev_xtc_entry_free(entry); > + return -ENOMEM; > + } > + peer = dev_xtc_entry_peer(entry); > + dev_xtc_entry_clear(peer); > + for (i = 0, j = 0; i < limit; i++, j++) { > + item = &entry->items[i]; > + tmp = &peer->items[j]; > + oprog = item->prog; > + if (!oprog) { > + if (i == j) { > + tmp->prog = nprog; > + tmp->bpf_priority = prio; > + } > + break; > + } else if (item->bpf_priority < prio) { > + tmp->prog = oprog; > + tmp->bpf_priority = item->bpf_priority; > + } else if (item->bpf_priority > prio) { > + if (i == j) { > + tmp->prog = nprog; > + tmp->bpf_priority = prio; > + tmp = &peer->items[++j]; > + } > + tmp->prog = oprog; > + tmp->bpf_priority = item->bpf_priority; > + } > + } > + dev_xtc_entry_update(dev, peer, ingress); > + if (ingress) > + net_inc_ingress_queue(); > + else > + net_inc_egress_queue(); > + xtc_inc(); > + return prio; > +} > + > +int xtc_prog_attach(const union bpf_attr *attr, struct bpf_prog *nprog) > +{ > + struct net *net = current->nsproxy->net_ns; > + bool ingress = attr->attach_type == BPF_NET_INGRESS; > + struct net_device *dev; > + int ret; > + > + if (attr->attach_flags & ~BPF_F_REPLACE) > + return -EINVAL; > + rtnl_lock(); > + dev = __dev_get_by_index(net, attr->target_ifindex); > + if (!dev) { > + rtnl_unlock(); > + return -EINVAL; > + } > + ret = __xtc_prog_attach(dev, ingress, XTC_MAX_ENTRIES, nprog, > + attr->attach_priority, attr->attach_flags); > + rtnl_unlock(); > + return ret; > +} > + > +static int __xtc_prog_detach(struct net_device *dev, bool ingress, u32 > limit, > + u32 prio) > +{ > + struct bpf_prog_array_item *item, *tmp; > + struct bpf_prog *oprog, *fprog = NULL; > + struct xtc_entry *entry, *peer; > + int i, j; > + > + ASSERT_RTNL(); > + > + entry = ingress ? 
> + rcu_dereference_rtnl(dev->xtc_ingress) : > + rcu_dereference_rtnl(dev->xtc_egress); > + if (!entry) > + return -ENOENT; > + peer = dev_xtc_entry_peer(entry); > + dev_xtc_entry_clear(peer); > + for (i = 0, j = 0; i < limit; i++) { > + item = &entry->items[i]; > + tmp = &peer->items[j]; > + oprog = item->prog; > + if (!oprog) > + break; > + if (item->bpf_priority != prio) { > + tmp->prog = oprog; > + tmp->bpf_priority = item->bpf_priority; > + j++; > + } else { > + fprog = oprog; > + } > + } > + if (fprog) { > + dev_xtc_entry_prio_del(peer, prio); > + if (dev_xtc_entry_total(peer) == 0 && !entry->parent->miniq) > + peer = NULL; > + dev_xtc_entry_update(dev, peer, ingress); > + bpf_prog_put(fprog); > + if (!peer) > + dev_xtc_entry_free(entry); > + if (ingress) > + net_dec_ingress_queue(); > + else > + net_dec_egress_queue(); > + xtc_dec(); > + return 0; > + } > + return -ENOENT; > +} > + > +int xtc_prog_detach(const union bpf_attr *attr) > +{ > + struct net *net = current->nsproxy->net_ns; > + bool ingress = attr->attach_type == BPF_NET_INGRESS; > + struct net_device *dev; > + int ret; > + > + if (attr->attach_flags || !attr->attach_priority) > + return -EINVAL; > + rtnl_lock(); > + dev = __dev_get_by_index(net, attr->target_ifindex); > + if (!dev) { > + rtnl_unlock(); > + return -EINVAL; > + } > + ret = __xtc_prog_detach(dev, ingress, XTC_MAX_ENTRIES, > + attr->attach_priority); > + rtnl_unlock(); > + return ret; > +} > + > +static void __xtc_prog_detach_all(struct net_device *dev, bool ingress, > u32 limit) > +{ > + struct bpf_prog_array_item *item; > + struct xtc_entry *entry; > + struct bpf_prog *prog; > + int i; > + > + ASSERT_RTNL(); > + > + entry = ingress ? > + rcu_dereference_rtnl(dev->xtc_ingress) : > + rcu_dereference_rtnl(dev->xtc_egress); > + if (!entry) > + return; > + dev_xtc_entry_update(dev, NULL, ingress); > + for (i = 0; i < limit; i++) { > + item = &entry->items[i]; > + prog = item->prog; > + if (!prog) > + break; > + dev_xtc_entry_prio_del(entry, item->bpf_priority); > + bpf_prog_put(prog); > + if (ingress) > + net_dec_ingress_queue(); > + else > + net_dec_egress_queue(); > + xtc_dec(); > + } > + dev_xtc_entry_free(entry); > +} > + > +void dev_xtc_uninstall(struct net_device *dev) > +{ > + __xtc_prog_detach_all(dev, true, XTC_MAX_ENTRIES + 1); > + __xtc_prog_detach_all(dev, false, XTC_MAX_ENTRIES + 1); > +} > + > +static int > +__xtc_prog_query(const union bpf_attr *attr, union bpf_attr __user > *uattr, > + struct net_device *dev, bool ingress, u32 limit) > +{ > + struct bpf_query_info info, __user *uinfo; > + struct bpf_prog_array_item *item; > + struct xtc_entry *entry; > + struct bpf_prog *prog; > + u32 i, flags = 0, cnt; > + int ret = 0; > + > + ASSERT_RTNL(); > + > + entry = ingress ? 
> + rcu_dereference_rtnl(dev->xtc_ingress) : > + rcu_dereference_rtnl(dev->xtc_egress); > + if (!entry) > + return -ENOENT; > + cnt = dev_xtc_entry_total(entry); > + if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags))) > + return -EFAULT; > + if (copy_to_user(&uattr->query.prog_cnt, &cnt, sizeof(cnt))) > + return -EFAULT; > + uinfo = u64_to_user_ptr(attr->query.prog_ids); > + if (attr->query.prog_cnt == 0 || !uinfo || !cnt) > + /* return early if user requested only program count + flags */ > + return 0; > + if (attr->query.prog_cnt < cnt) { > + cnt = attr->query.prog_cnt; > + ret = -ENOSPC; > + } > + for (i = 0; i < limit; i++) { > + item = &entry->items[i]; > + prog = item->prog; > + if (!prog) > + break; > + info.prog_id = prog->aux->id; > + info.prio = item->bpf_priority; > + if (copy_to_user(uinfo + i, &info, sizeof(info))) > + return -EFAULT; > + if (i + 1 == cnt) > + break; > + } > + return ret; > +} > + > +int xtc_prog_query(const union bpf_attr *attr, union bpf_attr __user > *uattr) > +{ > + struct net *net = current->nsproxy->net_ns; > + bool ingress = attr->query.attach_type == BPF_NET_INGRESS; > + struct net_device *dev; > + int ret; > + > + if (attr->query.query_flags || attr->query.attach_flags) > + return -EINVAL; > + rtnl_lock(); > + dev = __dev_get_by_index(net, attr->query.target_ifindex); > + if (!dev) { > + rtnl_unlock(); > + return -EINVAL; > + } > + ret = __xtc_prog_query(attr, uattr, dev, ingress, XTC_MAX_ENTRIES); > + rtnl_unlock(); > + return ret; > +} > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > index 7b373a5e861f..a0a670b964bb 100644 > --- a/kernel/bpf/syscall.c > +++ b/kernel/bpf/syscall.c > @@ -36,6 +36,8 @@ > #include <linux/memcontrol.h> > #include <linux/trace_events.h> > +#include <net/xtc.h> > + > #define IS_FD_ARRAY(map) ((map)->map_type == > BPF_MAP_TYPE_PERF_EVENT_ARRAY || \ > (map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \ > (map)->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS) > @@ -3448,6 +3450,9 @@ attach_type_to_prog_type(enum bpf_attach_type > attach_type) > return BPF_PROG_TYPE_XDP; > case BPF_LSM_CGROUP: > return BPF_PROG_TYPE_LSM; > + case BPF_NET_INGRESS: > + case BPF_NET_EGRESS: > + return BPF_PROG_TYPE_SCHED_CLS; > default: > return BPF_PROG_TYPE_UNSPEC; > } [..] > @@ -3466,18 +3471,15 @@ static int bpf_prog_attach(const union bpf_attr > *attr) > if (CHECK_ATTR(BPF_PROG_ATTACH)) > return -EINVAL; > - > if (attr->attach_flags & ~BPF_F_ATTACH_MASK) > return -EINVAL; > ptype = attach_type_to_prog_type(attr->attach_type); > if (ptype == BPF_PROG_TYPE_UNSPEC) > return -EINVAL; > - > prog = bpf_prog_get_type(attr->attach_bpf_fd, ptype); > if (IS_ERR(prog)) > return PTR_ERR(prog); > - > if (bpf_prog_attach_check_attach_type(prog, attr->attach_type)) { > bpf_prog_put(prog); > return -EINVAL; This whole chunk can probably be dropped? 
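
Side note, mostly to double-check my understanding of the uapi: below is
a minimal, untested user-space sketch of how I'd expect the new attach
path to be driven. Field names follow the uapi hunks in this patch, so it
assumes headers from this series; nothing here is existing API.

/* Hypothetical sketch: attach an already-loaded BPF_PROG_TYPE_SCHED_CLS
 * program to an ifindex via the new fd-based API. BPF_NET_INGRESS,
 * target_ifindex and attach_priority only exist with this series applied.
 */
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

static int xtc_attach(int prog_fd, int ifindex, unsigned int prio)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.target_ifindex  = ifindex;         /* instead of target_fd */
	attr.attach_bpf_fd   = prog_fd;
	attr.attach_type     = BPF_NET_INGRESS; /* or BPF_NET_EGRESS */
	attr.attach_priority = prio;            /* 0 == auto-allocate */

	return syscall(__NR_bpf, BPF_PROG_ATTACH, &attr, sizeof(attr));
}

If I read __xtc_prog_attach() right, passing attach_priority=0 makes the
idr hand out the lowest free priority, which ties into the question above.
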
> @@ -3508,16 +3510,18 @@ static int bpf_prog_attach(const union bpf_attr > *attr) > ret = cgroup_bpf_prog_attach(attr, ptype, prog); > break; > + case BPF_PROG_TYPE_SCHED_CLS: > + ret = xtc_prog_attach(attr, prog); > + break; > default: > ret = -EINVAL; > } > - > - if (ret) > + if (ret < 0) > bpf_prog_put(prog); > return ret; > } > -#define BPF_PROG_DETACH_LAST_FIELD attach_type > +#define BPF_PROG_DETACH_LAST_FIELD replace_bpf_fd > static int bpf_prog_detach(const union bpf_attr *attr) > { > @@ -3527,6 +3531,9 @@ static int bpf_prog_detach(const union bpf_attr > *attr) > return -EINVAL; > ptype = attach_type_to_prog_type(attr->attach_type); > + if (ptype != BPF_PROG_TYPE_SCHED_CLS && > + (attr->attach_flags || attr->replace_bpf_fd)) > + return -EINVAL; > switch (ptype) { > case BPF_PROG_TYPE_SK_MSG: > @@ -3545,6 +3552,8 @@ static int bpf_prog_detach(const union bpf_attr > *attr) > case BPF_PROG_TYPE_SOCK_OPS: > case BPF_PROG_TYPE_LSM: > return cgroup_bpf_prog_detach(attr, ptype); > + case BPF_PROG_TYPE_SCHED_CLS: > + return xtc_prog_detach(attr); > default: > return -EINVAL; > } > @@ -3598,6 +3607,9 @@ static int bpf_prog_query(const union bpf_attr > *attr, > case BPF_SK_MSG_VERDICT: > case BPF_SK_SKB_VERDICT: > return sock_map_bpf_prog_query(attr, uattr); > + case BPF_NET_INGRESS: > + case BPF_NET_EGRESS: > + return xtc_prog_query(attr, uattr); > default: > return -EINVAL; > } > diff --git a/net/Kconfig b/net/Kconfig > index 48c33c222199..b7a9cd174464 100644 > --- a/net/Kconfig > +++ b/net/Kconfig > @@ -52,6 +52,11 @@ config NET_INGRESS > config NET_EGRESS > bool > +config NET_XGRESS > + select NET_INGRESS > + select NET_EGRESS > + bool > + > config NET_REDIRECT > bool > diff --git a/net/core/dev.c b/net/core/dev.c > index fa53830d0683..552b805c27dd 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -107,6 +107,7 @@ > #include <net/pkt_cls.h> > #include <net/checksum.h> > #include <net/xfrm.h> > +#include <net/xtc.h> > #include <linux/highmem.h> > #include <linux/init.h> > #include <linux/module.h> > @@ -154,7 +155,6 @@ > #include "dev.h" > #include "net-sysfs.h" > - > static DEFINE_SPINLOCK(ptype_lock); > struct list_head ptype_base[PTYPE_HASH_SIZE] __read_mostly; > struct list_head ptype_all __read_mostly; /* Taps */ > @@ -3935,69 +3935,199 @@ int dev_loopback_xmit(struct net *net, struct > sock *sk, struct sk_buff *skb) > EXPORT_SYMBOL(dev_loopback_xmit); > #ifdef CONFIG_NET_EGRESS > -static struct sk_buff * > -sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev) > +static struct netdev_queue * > +netdev_tx_queue_mapping(struct net_device *dev, struct sk_buff *skb) > +{ > + int qm = skb_get_queue_mapping(skb); > + > + return netdev_get_tx_queue(dev, netdev_cap_txqueue(dev, qm)); > +} > + > +static bool netdev_xmit_txqueue_skipped(void) > +{ > + return __this_cpu_read(softnet_data.xmit.skip_txqueue); > +} > + > +void netdev_xmit_skip_txqueue(bool skip) > +{ > + __this_cpu_write(softnet_data.xmit.skip_txqueue, skip); > +} > +EXPORT_SYMBOL_GPL(netdev_xmit_skip_txqueue); > +#endif /* CONFIG_NET_EGRESS */ > + > +#ifdef CONFIG_NET_XGRESS > +static int tc_run(struct xtc_entry *entry, struct sk_buff *skb) > { > + int ret = TC_ACT_UNSPEC; > #ifdef CONFIG_NET_CLS_ACT > - struct mini_Qdisc *miniq = rcu_dereference_bh(dev->miniq_egress); > - struct tcf_result cl_res; > + struct mini_Qdisc *miniq = rcu_dereference_bh(entry->parent->miniq); > + struct tcf_result res; > if (!miniq) > - return skb; > + return ret; > - /* qdisc_skb_cb(skb)->pkt_len was already set by the 
caller. */ > tc_skb_cb(skb)->mru = 0; > tc_skb_cb(skb)->post_ct = false; > - mini_qdisc_bstats_cpu_update(miniq, skb); > - switch (tcf_classify(skb, miniq->block, miniq->filter_list, &cl_res, > false)) { > + mini_qdisc_bstats_cpu_update(miniq, skb); > + ret = tcf_classify(skb, miniq->block, miniq->filter_list, &res, false); > + /* Only tcf related quirks below. */ > + switch (ret) { > + case TC_ACT_SHOT: > + mini_qdisc_qstats_cpu_drop(miniq); > + break; > case TC_ACT_OK: > case TC_ACT_RECLASSIFY: > - skb->tc_index = TC_H_MIN(cl_res.classid); > + skb->tc_index = TC_H_MIN(res.classid); > break; > + } > +#endif /* CONFIG_NET_CLS_ACT */ > + return ret; > +} > + > +static DEFINE_STATIC_KEY_FALSE(xtc_needed_key); > + > +void xtc_inc(void) > +{ > + static_branch_inc(&xtc_needed_key); > +} > +EXPORT_SYMBOL_GPL(xtc_inc); > + > +void xtc_dec(void) > +{ > + static_branch_dec(&xtc_needed_key); > +} > +EXPORT_SYMBOL_GPL(xtc_dec); > + > +static __always_inline enum tc_action_base > +xtc_run(const struct xtc_entry *entry, struct sk_buff *skb, > + const bool needs_mac) > +{ > + const struct bpf_prog_array_item *item; > + const struct bpf_prog *prog; > + int ret = TC_NEXT; > + > + if (needs_mac) > + __skb_push(skb, skb->mac_len); > + item = &entry->items[0]; > + while ((prog = READ_ONCE(item->prog))) { > + bpf_compute_data_pointers(skb); > + ret = bpf_prog_run(prog, skb); > + if (ret != TC_NEXT) > + break; > + item++; > + } > + if (needs_mac) > + __skb_pull(skb, skb->mac_len); > + return xtc_action_code(skb, ret); > +} > + > +static __always_inline struct sk_buff * > +sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, > int *ret, > + struct net_device *orig_dev, bool *another) > +{ > + struct xtc_entry *entry = rcu_dereference_bh(skb->dev->xtc_ingress); > + int sch_ret; > + > + if (!entry) > + return skb; > + if (*pt_prev) { > + *ret = deliver_skb(skb, *pt_prev, orig_dev); > + *pt_prev = NULL; > + } > + > + qdisc_skb_cb(skb)->pkt_len = skb->len; > + xtc_set_ingress(skb, true); > + > + if (static_branch_unlikely(&xtc_needed_key)) { > + sch_ret = xtc_run(entry, skb, true); > + if (sch_ret != TC_ACT_UNSPEC) > + goto ingress_verdict; > + } > + sch_ret = tc_run(entry, skb); > +ingress_verdict: > + switch (sch_ret) { > + case TC_ACT_REDIRECT: > + /* skb_mac_header check was done by BPF, so we can safely > + * push the L2 header back before redirecting to another > + * netdev. > + */ > + __skb_push(skb, skb->mac_len); > + if (skb_do_redirect(skb) == -EAGAIN) { > + __skb_pull(skb, skb->mac_len); > + *another = true; > + break; > + } > + return NULL; > case TC_ACT_SHOT: > - mini_qdisc_qstats_cpu_drop(miniq); > - *ret = NET_XMIT_DROP; > - kfree_skb_reason(skb, SKB_DROP_REASON_TC_EGRESS); > + kfree_skb_reason(skb, SKB_DROP_REASON_TC_INGRESS); > return NULL; > + /* used by tc_run */ > case TC_ACT_STOLEN: > case TC_ACT_QUEUED: > case TC_ACT_TRAP: > - *ret = NET_XMIT_SUCCESS; > consume_skb(skb); > + fallthrough; > + case TC_ACT_CONSUMED: > return NULL; > + } > + > + return skb; > +} > + > +static __always_inline struct sk_buff * > +sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev) > +{ > + struct xtc_entry *entry = rcu_dereference_bh(dev->xtc_egress); > + int sch_ret; > + > + if (!entry) > + return skb; > + > + /* qdisc_skb_cb(skb)->pkt_len & xtc_set_ingress() was > + * already set by the caller. 
> + */ > + if (static_branch_unlikely(&xtc_needed_key)) { > + sch_ret = xtc_run(entry, skb, false); > + if (sch_ret != TC_ACT_UNSPEC) > + goto egress_verdict; > + } > + sch_ret = tc_run(entry, skb); > +egress_verdict: > + switch (sch_ret) { > case TC_ACT_REDIRECT: > + *ret = NET_XMIT_SUCCESS; > /* No need to push/pop skb's mac_header here on egress! */ > skb_do_redirect(skb); > + return NULL; > + case TC_ACT_SHOT: > + *ret = NET_XMIT_DROP; > + kfree_skb_reason(skb, SKB_DROP_REASON_TC_EGRESS); > + return NULL; > + /* used by tc_run */ > + case TC_ACT_STOLEN: > + case TC_ACT_QUEUED: > + case TC_ACT_TRAP: > *ret = NET_XMIT_SUCCESS; > return NULL; > - default: > - break; > } > -#endif /* CONFIG_NET_CLS_ACT */ > return skb; > } > - > -static struct netdev_queue * > -netdev_tx_queue_mapping(struct net_device *dev, struct sk_buff *skb) > -{ > - int qm = skb_get_queue_mapping(skb); > - > - return netdev_get_tx_queue(dev, netdev_cap_txqueue(dev, qm)); > -} > - > -static bool netdev_xmit_txqueue_skipped(void) > +#else > +static __always_inline struct sk_buff * > +sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, > int *ret, > + struct net_device *orig_dev, bool *another) > { > - return __this_cpu_read(softnet_data.xmit.skip_txqueue); > + return skb; > } > -void netdev_xmit_skip_txqueue(bool skip) > +static __always_inline struct sk_buff * > +sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev) > { > - __this_cpu_write(softnet_data.xmit.skip_txqueue, skip); > + return skb; > } > -EXPORT_SYMBOL_GPL(netdev_xmit_skip_txqueue); > -#endif /* CONFIG_NET_EGRESS */ > +#endif /* CONFIG_NET_XGRESS */ > #ifdef CONFIG_XPS > static int __get_xps_queue_idx(struct net_device *dev, struct sk_buff > *skb, > @@ -4181,9 +4311,7 @@ int __dev_queue_xmit(struct sk_buff *skb, struct > net_device *sb_dev) > skb_update_prio(skb); > qdisc_pkt_len_init(skb); > -#ifdef CONFIG_NET_CLS_ACT > - skb->tc_at_ingress = 0; > -#endif > + xtc_set_ingress(skb, false); > #ifdef CONFIG_NET_EGRESS > if (static_branch_unlikely(&egress_needed_key)) { > if (nf_hook_egress_active()) { > @@ -5101,68 +5229,6 @@ int (*br_fdb_test_addr_hook)(struct net_device > *dev, > EXPORT_SYMBOL_GPL(br_fdb_test_addr_hook); > #endif > -static inline struct sk_buff * > -sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, > int *ret, > - struct net_device *orig_dev, bool *another) > -{ > -#ifdef CONFIG_NET_CLS_ACT > - struct mini_Qdisc *miniq = rcu_dereference_bh(skb->dev->miniq_ingress); > - struct tcf_result cl_res; > - > - /* If there's at least one ingress present somewhere (so > - * we get here via enabled static key), remaining devices > - * that are not configured with an ingress qdisc will bail > - * out here. 
> - */ > - if (!miniq) > - return skb; > - > - if (*pt_prev) { > - *ret = deliver_skb(skb, *pt_prev, orig_dev); > - *pt_prev = NULL; > - } > - > - qdisc_skb_cb(skb)->pkt_len = skb->len; > - tc_skb_cb(skb)->mru = 0; > - tc_skb_cb(skb)->post_ct = false; > - skb->tc_at_ingress = 1; > - mini_qdisc_bstats_cpu_update(miniq, skb); > - > - switch (tcf_classify(skb, miniq->block, miniq->filter_list, &cl_res, > false)) { > - case TC_ACT_OK: > - case TC_ACT_RECLASSIFY: > - skb->tc_index = TC_H_MIN(cl_res.classid); > - break; > - case TC_ACT_SHOT: > - mini_qdisc_qstats_cpu_drop(miniq); > - kfree_skb_reason(skb, SKB_DROP_REASON_TC_INGRESS); > - return NULL; > - case TC_ACT_STOLEN: > - case TC_ACT_QUEUED: > - case TC_ACT_TRAP: > - consume_skb(skb); > - return NULL; > - case TC_ACT_REDIRECT: > - /* skb_mac_header check was done by cls/act_bpf, so > - * we can safely push the L2 header back before > - * redirecting to another netdev > - */ > - __skb_push(skb, skb->mac_len); > - if (skb_do_redirect(skb) == -EAGAIN) { > - __skb_pull(skb, skb->mac_len); > - *another = true; > - break; > - } > - return NULL; > - case TC_ACT_CONSUMED: > - return NULL; > - default: > - break; > - } > -#endif /* CONFIG_NET_CLS_ACT */ > - return skb; > -} > - > /** > * netdev_is_rx_handler_busy - check if receive handler is registered > * @dev: device to check > @@ -10832,7 +10898,7 @@ void unregister_netdevice_many(struct list_head > *head) > /* Shutdown queueing discipline. */ > dev_shutdown(dev); > - > + dev_xtc_uninstall(dev); > dev_xdp_uninstall(dev); > netdev_offload_xstats_disable_all(dev); > diff --git a/net/core/filter.c b/net/core/filter.c > index bb0136e7a8e4..ac4bb016c5ee 100644 > --- a/net/core/filter.c > +++ b/net/core/filter.c > @@ -9132,7 +9132,7 @@ static struct bpf_insn > *bpf_convert_tstamp_read(const struct bpf_prog *prog, > __u8 value_reg = si->dst_reg; > __u8 skb_reg = si->src_reg; > -#ifdef CONFIG_NET_CLS_ACT > +#ifdef CONFIG_NET_XGRESS > /* If the tstamp_type is read, > * the bpf prog is aware the tstamp could have delivery time. > * Thus, read skb->tstamp as is if tstamp_type_access is true. > @@ -9166,7 +9166,7 @@ static struct bpf_insn > *bpf_convert_tstamp_write(const struct bpf_prog *prog, > __u8 value_reg = si->src_reg; > __u8 skb_reg = si->dst_reg; > -#ifdef CONFIG_NET_CLS_ACT > +#ifdef CONFIG_NET_XGRESS > /* If the tstamp_type is read, > * the bpf prog is aware the tstamp could have delivery time. > * Thus, write skb->tstamp as is if tstamp_type_access is true. > diff --git a/net/sched/Kconfig b/net/sched/Kconfig > index 1e8ab4749c6c..c1b8f2e7d966 100644 > --- a/net/sched/Kconfig > +++ b/net/sched/Kconfig > @@ -382,8 +382,7 @@ config NET_SCH_FQ_PIE > config NET_SCH_INGRESS > tristate "Ingress/classifier-action Qdisc" > depends on NET_CLS_ACT > - select NET_INGRESS > - select NET_EGRESS > + select NET_XGRESS > help > Say Y here if you want to use classifiers for incoming and/or outgoing > packets. This qdisc doesn't do anything else besides running > classifiers, > @@ -753,6 +752,7 @@ config NET_EMATCH_IPT > config NET_CLS_ACT > bool "Actions" > select NET_CLS > + select NET_XGRESS > help > Say Y here if you want to use traffic control actions. 
Actions > get attached to classifiers and are invoked after a successful > diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c > index 84838128b9c5..3bd37ee898ce 100644 > --- a/net/sched/sch_ingress.c > +++ b/net/sched/sch_ingress.c > @@ -13,6 +13,7 @@ > #include <net/netlink.h> > #include <net/pkt_sched.h> > #include <net/pkt_cls.h> > +#include <net/xtc.h> > struct ingress_sched_data { > struct tcf_block *block; > @@ -78,11 +79,19 @@ static int ingress_init(struct Qdisc *sch, struct > nlattr *opt, > { > struct ingress_sched_data *q = qdisc_priv(sch); > struct net_device *dev = qdisc_dev(sch); > + struct xtc_entry *entry; > + bool created; > int err; > net_inc_ingress_queue(); > - mini_qdisc_pair_init(&q->miniqp, sch, &dev->miniq_ingress); > + entry = dev_xtc_entry_fetch(dev, true, &created); > + if (!entry) > + return -ENOMEM; > + > + mini_qdisc_pair_init(&q->miniqp, sch, &entry->parent->miniq); > + if (created) > + dev_xtc_entry_update(dev, entry, true); > q->block_info.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS; > q->block_info.chain_head_change = clsact_chain_head_change; > @@ -93,15 +102,20 @@ static int ingress_init(struct Qdisc *sch, struct > nlattr *opt, > return err; > mini_qdisc_pair_block_init(&q->miniqp, q->block); > - > return 0; > } > static void ingress_destroy(struct Qdisc *sch) > { > struct ingress_sched_data *q = qdisc_priv(sch); > + struct net_device *dev = qdisc_dev(sch); > + struct xtc_entry *entry = rtnl_dereference(dev->xtc_ingress); > tcf_block_put_ext(q->block, sch, &q->block_info); > + if (entry && dev_xtc_entry_total(entry) == 0) { > + dev_xtc_entry_update(dev, NULL, true); > + dev_xtc_entry_free(entry); > + } > net_dec_ingress_queue(); > } > @@ -217,12 +231,20 @@ static int clsact_init(struct Qdisc *sch, struct > nlattr *opt, > { > struct clsact_sched_data *q = qdisc_priv(sch); > struct net_device *dev = qdisc_dev(sch); > + struct xtc_entry *entry; > + bool created; > int err; > net_inc_ingress_queue(); > net_inc_egress_queue(); > - mini_qdisc_pair_init(&q->miniqp_ingress, sch, &dev->miniq_ingress); > + entry = dev_xtc_entry_fetch(dev, true, &created); > + if (!entry) > + return -ENOMEM; > + > + mini_qdisc_pair_init(&q->miniqp_ingress, sch, &entry->parent->miniq); > + if (created) > + dev_xtc_entry_update(dev, entry, true); > q->ingress_block_info.binder_type = > FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS; > q->ingress_block_info.chain_head_change = clsact_chain_head_change; > @@ -235,7 +257,13 @@ static int clsact_init(struct Qdisc *sch, struct > nlattr *opt, > mini_qdisc_pair_block_init(&q->miniqp_ingress, q->ingress_block); > - mini_qdisc_pair_init(&q->miniqp_egress, sch, &dev->miniq_egress); > + entry = dev_xtc_entry_fetch(dev, false, &created); > + if (!entry) > + return -ENOMEM; > + > + mini_qdisc_pair_init(&q->miniqp_egress, sch, &entry->parent->miniq); > + if (created) > + dev_xtc_entry_update(dev, entry, false); > q->egress_block_info.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_EGRESS; > q->egress_block_info.chain_head_change = clsact_chain_head_change; > @@ -247,9 +275,21 @@ static int clsact_init(struct Qdisc *sch, struct > nlattr *opt, > static void clsact_destroy(struct Qdisc *sch) > { > struct clsact_sched_data *q = qdisc_priv(sch); > + struct net_device *dev = qdisc_dev(sch); > + struct xtc_entry *ingress_entry = rtnl_dereference(dev->xtc_ingress); > + struct xtc_entry *egress_entry = rtnl_dereference(dev->xtc_egress); > tcf_block_put_ext(q->egress_block, sch, &q->egress_block_info); > + if (egress_entry && 
dev_xtc_entry_total(egress_entry) == 0) { > + dev_xtc_entry_update(dev, NULL, false); > + dev_xtc_entry_free(egress_entry); > + } > + > tcf_block_put_ext(q->ingress_block, sch, &q->ingress_block_info); > + if (ingress_entry && dev_xtc_entry_total(ingress_entry) == 0) { > + dev_xtc_entry_update(dev, NULL, true); > + dev_xtc_entry_free(ingress_entry); > + } > net_dec_ingress_queue(); > net_dec_egress_queue(); > diff --git a/tools/include/uapi/linux/bpf.h > b/tools/include/uapi/linux/bpf.h > index 51b9aa640ad2..de1f5546bcfe 100644 > --- a/tools/include/uapi/linux/bpf.h > +++ b/tools/include/uapi/linux/bpf.h > @@ -1025,6 +1025,8 @@ enum bpf_attach_type { > BPF_PERF_EVENT, > BPF_TRACE_KPROBE_MULTI, > BPF_LSM_CGROUP, > + BPF_NET_INGRESS, > + BPF_NET_EGRESS, > __MAX_BPF_ATTACH_TYPE > }; > @@ -1399,14 +1401,20 @@ union bpf_attr { > }; > struct { /* anonymous struct used by BPF_PROG_ATTACH/DETACH commands */ > - __u32 target_fd; /* container object to attach to */ > + union { > + __u32 target_fd; /* container object to attach to */ > + __u32 target_ifindex; /* target ifindex */ > + }; > __u32 attach_bpf_fd; /* eBPF program to attach */ > __u32 attach_type; > __u32 attach_flags; > - __u32 replace_bpf_fd; /* previously attached eBPF > + union { > + __u32 attach_priority; > + __u32 replace_bpf_fd; /* previously attached eBPF > * program to replace if > * BPF_F_REPLACE is used > */ > + }; > }; > struct { /* anonymous struct used by BPF_PROG_TEST_RUN command */ > @@ -1452,7 +1460,10 @@ union bpf_attr { > } info; > struct { /* anonymous struct used by BPF_PROG_QUERY command */ > - __u32 target_fd; /* container object to query */ > + union { > + __u32 target_fd; /* container object to query */ > + __u32 target_ifindex; /* target ifindex */ > + }; > __u32 attach_type; > __u32 query_flags; > __u32 attach_flags; > @@ -6038,6 +6049,19 @@ struct bpf_sock_tuple { > }; > }; > +/* (Simplified) user return codes for tc prog type. > + * A valid tc program must return one of these defined values. All other > + * return codes are reserved for future use. Must remain compatible with > + * their TC_ACT_* counter-parts. For compatibility in behavior, unknown > + * return codes are mapped to TC_NEXT. > + */ > +enum tc_action_base { > + TC_NEXT = -1, > + TC_PASS = 0, > + TC_DROP = 2, > + TC_REDIRECT = 7, > +}; > + > struct bpf_xdp_sock { > __u32 queue_id; > }; > @@ -6804,6 +6828,11 @@ struct bpf_flow_keys { > __be32 flow_label; > }; > +struct bpf_query_info { > + __u32 prog_id; > + __u32 prio; > +}; > + > struct bpf_func_info { > __u32 insn_off; > __u32 type_id; > -- > 2.34.1
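
To make the default-band idea above a bit more concrete, here is a rough
loader-side sketch. The section names and the numbers are placeholders
made up for illustration; this is not a proposal for an actual libbpf
API, just the shape of it:

/* Illustrative only: map hypothetical SEC() suffixes to the default
 * attach priorities sketched in the bands above.
 */
#include <string.h>

enum {
	XTC_PRIO_FIRST   = 1,   /* e.g. SEC("tc/ingress_first"), monitoring */
	XTC_PRIO_DEFAULT = 124, /* e.g. plain SEC("tc/ingress") */
	XTC_PRIO_LAST    = 254, /* e.g. SEC("tc/ingress_last") */
};

static unsigned int xtc_default_prio(const char *sec_name)
{
	if (strstr(sec_name, "_first"))
		return XTC_PRIO_FIRST;
	if (strstr(sec_name, "_last"))
		return XTC_PRIO_LAST;
	/* everything else shares the middle band */
	return XTC_PRIO_DEFAULT;
}

Anything that really needs a custom ordering could still pass an explicit
attach_priority and land between the bands.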