Message-ID: <CAMDZJNWaREaM7=CZY=HCvxY0T1uHDsDH3QBwdbstSrNPXrbcdA@mail.gmail.com>
Date:   Sat, 19 Mar 2022 21:40:23 +0800
From:   Tonghao Zhang <xiangxia.m.yue@...il.com>
To:     Daniel Borkmann <daniel@...earbox.net>
Cc:     Paolo Abeni <pabeni@...hat.com>,
        Linux Kernel Network Developers <netdev@...r.kernel.org>,
        Jamal Hadi Salim <jhs@...atatu.com>,
        Cong Wang <xiyou.wangcong@...il.com>,
        Jiri Pirko <jiri@...nulli.us>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Jonathan Lemon <jonathan.lemon@...il.com>,
        Eric Dumazet <edumazet@...gle.com>,
        Alexander Lobakin <alobakin@...me>,
        Talal Ahmad <talalahmad@...gle.com>,
        Kevin Hao <haokexin@...il.com>,
        Alexei Starovoitov <ast@...nel.org>, bpf@...r.kernel.org
Subject: Re: [net-next v10 1/2] net: sched: use queue_mapping to pick tx queue

On Fri, Mar 18, 2022 at 9:36 PM Daniel Borkmann <daniel@...earbox.net> wrote:
>
> On 3/17/22 9:20 AM, Paolo Abeni wrote:
> > On Tue, 2022-03-15 at 20:48 +0800, Tonghao Zhang wrote:
> >> On Tue, Mar 15, 2022 at 5:59 AM Daniel Borkmann <daniel@...earbox.net> wrote:
> >>> On 3/14/22 3:15 PM, xiangxia.m.yue@...il.com wrote:
> >>> [...]
> >>>>    include/linux/netdevice.h |  3 +++
> >>>>    include/linux/rtnetlink.h |  1 +
> >>>>    net/core/dev.c            | 31 +++++++++++++++++++++++++++++--
> >>>>    net/sched/act_skbedit.c   |  6 +++++-
> >>>>    4 files changed, 38 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> >>>> index 0d994710b335..f33fb2d6712a 100644
> >>>> --- a/include/linux/netdevice.h
> >>>> +++ b/include/linux/netdevice.h
> >>>> @@ -3065,6 +3065,9 @@ struct softnet_data {
> >>>>        struct {
> >>>>                u16 recursion;
> >>>>                u8  more;
> >>>> +#ifdef CONFIG_NET_EGRESS
> >>>> +             u8  skip_txqueue;
> >>>> +#endif
> >>>>        } xmit;
> >>>>    #ifdef CONFIG_RPS
> >>>>        /* input_queue_head should be written by cpu owning this struct,
> >>>> diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
> >>>> index 7f970b16da3a..ae2c6a3cec5d 100644
> >>>> --- a/include/linux/rtnetlink.h
> >>>> +++ b/include/linux/rtnetlink.h
> >>>> @@ -100,6 +100,7 @@ void net_dec_ingress_queue(void);
> >>>>    #ifdef CONFIG_NET_EGRESS
> >>>>    void net_inc_egress_queue(void);
> >>>>    void net_dec_egress_queue(void);
> >>>> +void netdev_xmit_skip_txqueue(bool skip);
> >>>>    #endif
> >>>>
> >>>>    void rtnetlink_init(void);
> >>>> diff --git a/net/core/dev.c b/net/core/dev.c
> >>>> index 75bab5b0dbae..8e83b7099977 100644
> >>>> --- a/net/core/dev.c
> >>>> +++ b/net/core/dev.c
> >>>> @@ -3908,6 +3908,25 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
> >>>>
> >>>>        return skb;
> >>>>    }
> >>>> +
> >>>> +static struct netdev_queue *
> >>>> +netdev_tx_queue_mapping(struct net_device *dev, struct sk_buff *skb)
> >>>> +{
> >>>> +     int qm = skb_get_queue_mapping(skb);
> >>>> +
> >>>> +     return netdev_get_tx_queue(dev, netdev_cap_txqueue(dev, qm));
> >>>> +}
> >>>> +
> >>>> +static bool netdev_xmit_txqueue_skipped(void)
> >>>> +{
> >>>> +     return __this_cpu_read(softnet_data.xmit.skip_txqueue);
> >>>> +}
> >>>> +
> >>>> +void netdev_xmit_skip_txqueue(bool skip)
> >>>> +{
> >>>> +     __this_cpu_write(softnet_data.xmit.skip_txqueue, skip);
> >>>> +}
> >>>> +EXPORT_SYMBOL_GPL(netdev_xmit_skip_txqueue);
> >>>>    #endif /* CONFIG_NET_EGRESS */
> >>>>
> >>>>    #ifdef CONFIG_XPS
> >>>> @@ -4078,7 +4097,7 @@ struct netdev_queue *netdev_core_pick_tx(struct net_device *dev,
> >>>>    static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
> >>>>    {
> >>>>        struct net_device *dev = skb->dev;
> >>>> -     struct netdev_queue *txq;
> >>>> +     struct netdev_queue *txq = NULL;
> >>>>        struct Qdisc *q;
> >>>>        int rc = -ENOMEM;
> >>>>        bool again = false;
> >>>> @@ -4106,11 +4125,17 @@ static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
> >>>>                        if (!skb)
> >>>>                                goto out;
> >>>>                }
> >>>> +
> >>>> +             netdev_xmit_skip_txqueue(false);
> >>>> +
> >>>>                nf_skip_egress(skb, true);
> >>>>                skb = sch_handle_egress(skb, &rc, dev);
> >>>>                if (!skb)
> >>>>                        goto out;
> >>>>                nf_skip_egress(skb, false);
> >>>> +
> >>>> +             if (netdev_xmit_txqueue_skipped())
> >>>> +                     txq = netdev_tx_queue_mapping(dev, skb);
> >>>>        }
> >>>>    #endif
> >>>>        /* If device/qdisc don't need skb->dst, release it right now while
> >>>> @@ -4121,7 +4146,9 @@ static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
> >>>>        else
> >>>>                skb_dst_force(skb);
> >>>>
> >>>> -     txq = netdev_core_pick_tx(dev, skb, sb_dev);
> >>>> +     if (likely(!txq))
> >>>
> >>> nit: Drop likely(). If the feature is used from sch_handle_egress(), then this would always be the case.
> >> Hi Daniel
> >> I think in most case, we don't use skbedit queue_mapping in the
> >> sch_handle_egress() , so I add likely in fast path.
>
> Yeah, but then let branch predictor do its work ? We can still change and drop the
> likely() once we add support for BPF though..
Hi,
if you are OK with introducing the BPF helper shown below, I will drop
likely() in the next patch.
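
For context on what is being debated, here is a minimal userspace sketch
of the branch in question. In the kernel, likely(x) expands to
__builtin_expect(!!(x), 1), which only hints the compiler about code
layout; the result is the same with or without it. The struct and
function names below (struct queue, choose_txq) are hypothetical, and
the sketch assumes GCC or Clang for __builtin_expect.

```c
#include <assert.h>
#include <stddef.h>

/* Same expansion as the kernel's likely(); a layout hint only. */
#define likely(x) __builtin_expect(!!(x), 1)

/* Hypothetical stand-in for struct netdev_queue. */
struct queue {
	int id;
};

/*
 * Models the patched branch in __dev_queue_xmit(): if no queue was
 * preselected by the egress hook, fall back to the default selection.
 */
const struct queue *choose_txq(const struct queue *preselected,
			       const struct queue *fallback)
{
	assert(fallback != NULL);

	if (likely(!preselected))
		return fallback;	/* common case: default selection */

	return preselected;		/* queue chosen by the egress hook */
}
```

Dropping likely() leaves choose_txq() returning the same queue in both
cases; only the compiler's idea of the hot path changes.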
>
> >>>> +             txq = netdev_core_pick_tx(dev, skb, sb_dev);
> >>>> +
> >>>>        q = rcu_dereference_bh(txq->qdisc);
> >>>
> >>> How will the `netdev_xmit_skip_txqueue(true)` be usable from BPF side (see bpf_convert_ctx_access() ->
> >>> queue_mapping)?
> >> Good questions, In other patch, I introduce the
> >> bpf_netdev_skip_txqueue, so we can use netdev_xmit_skip_txqueue in bpf
> >> side
>
> Yeah, that bpf_netdev_skip_txqueue() won't fly. It's basically a helper doing quirk for
> an implementation detail (aka calling netdev_xmit_skip_txqueue()). Was hoping you have
> something better we could use along with the context rewrite of __sk_buff's queue_mapping,
Hi Daniel,
I reviewed the BPF code; we already have a lot of helpers that change skb fields:
skb_change_proto
skb_change_type
skb_change_tail
skb_pull_data
skb_change_head
skb_ecn_set_ce
skb_cgroup_classid
skb_vlan_push
skb_set_tunnel_key

Did you mean that introducing bpf_skb_set_queue_mapping would be better
than bpf_netdev_skip_txqueue?
For example:
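BPF_CALL_2(bpf_skb_set_queue_mapping, struct sk_buff *, skb, u32, txq)
{
        /* Record the tx queue chosen by the BPF program ... */
        skb->queue_mapping = txq;
        /* ... and make __dev_queue_xmit() skip netdev_core_pick_tx(). */
        netdev_xmit_skip_txqueue(true);
        return 0;
}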
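
To make the intended flow concrete, here is a minimal userspace model of
how the quoted patch would pick the tx queue once such a helper has run.
It is a sketch, not kernel code: every model_* name is hypothetical, the
global flag stands in for the per-CPU softnet_data.xmit.skip_txqueue, and
default_txq stands in for whatever netdev_core_pick_tx() would return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct net_device and struct sk_buff. */
struct model_dev {
	uint16_t real_num_tx_queues;
};

struct model_skb {
	uint16_t queue_mapping;
};

/* Stand-in for the per-CPU softnet_data.xmit.skip_txqueue flag. */
static bool skip_txqueue;

static void model_xmit_skip_txqueue(bool skip)
{
	skip_txqueue = skip;
}

/* Like netdev_cap_txqueue(): an out-of-range index falls back to 0. */
static uint16_t model_cap_txqueue(const struct model_dev *dev, uint16_t qm)
{
	return qm >= dev->real_num_tx_queues ? 0 : qm;
}

/*
 * Models the patched part of __dev_queue_xmit(). When egress_sets_queue
 * is true, the egress hook behaves like the proposed helper: it writes
 * queue_mapping and sets the skip flag.
 */
uint16_t model_pick_txq(const struct model_dev *dev, struct model_skb *skb,
			bool egress_sets_queue, uint16_t wanted,
			uint16_t default_txq)
{
	int txq = -1;	/* -1 plays the role of txq == NULL */

	assert(dev->real_num_tx_queues > 0);

	/* Reset the flag before the egress hook runs. */
	model_xmit_skip_txqueue(false);

	if (egress_sets_queue) {
		skb->queue_mapping = wanted;
		model_xmit_skip_txqueue(true);
	}

	if (skip_txqueue)
		txq = model_cap_txqueue(dev, skb->queue_mapping);

	if (txq < 0)
		txq = default_txq;	/* default selection path */

	return (uint16_t)txq;
}
```

With a 4-queue device, requesting queue 2 selects queue 2, requesting
queue 9 is capped to queue 0, and a packet whose egress hook did not set
the flag takes the default path even if queue_mapping holds a stale value.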

> but worst case we need to rework a bit for BPF. :/
> Thanks,
> Daniel



-- 
Best regards, Tonghao
