lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 08 Jul 2020 05:24:19 +0800
From:   "YU, Xiangning" <xiangning.yu@...baba-inc.com>
To:     Cong Wang <xiyou.wangcong@...il.com>
Cc:     Linux Kernel Network Developers <netdev@...r.kernel.org>
Subject: Re: [PATCH net-next 2/2] net: sched: Lockless Token Bucket (LTB)
 Qdisc



On 7/7/20 1:06 PM, Cong Wang wrote:
> On Mon, Jul 6, 2020 at 1:34 PM YU, Xiangning
> <xiangning.yu@...baba-inc.com> wrote:
>>
>>
>>
>> On 7/6/20 1:10 PM, Cong Wang wrote:
>>> On Mon, Jul 6, 2020 at 11:11 AM YU, Xiangning
>>> <xiangning.yu@...baba-inc.com> wrote:
>>>> +static int ltb_enqueue(struct sk_buff *skb, struct Qdisc *sch, spinlock_t *root_lock,
>>>> +                      struct sk_buff **to_free)
>>>> +{
>>>> +       struct ltb_sched *ltb = qdisc_priv(sch);
>>>> +       struct ltb_pcpu_sched *pcpu_q;
>>>> +       struct ltb_class *cl;
>>>> +       struct ltb_pcpu_data *pcpu = this_cpu_ptr(ltb->pcpu_data);
>>>> +       int cpu;
>>>> +
>>>> +       cpu = smp_processor_id();
>>>> +       pcpu_q = qdisc_priv(pcpu->qdisc);
>>>> +       ltb_skb_cb(skb)->cpu = cpu;
>>>> +
>>>> +       cl = ltb_classify(sch, ltb, skb);
>>>> +       if (unlikely(!cl)) {
>>>> +               kfree_skb(skb);
>>>> +               return NET_XMIT_DROP;
>>>> +       }
>>>> +
>>>> +       pcpu->active = true;
>>>> +       if (unlikely(kfifo_put(&cl->aggr_queues[cpu], skb) == 0)) {
>>>> +               kfree_skb(skb);
>>>> +               atomic64_inc(&cl->stat_drops);
>>>> +               return NET_XMIT_DROP;
>>>> +       }
>>>
>>>
>>> How do you prevent out-of-order packets?
>>>
>>
>> Hi Cong,
>>
>> That's a good question. In theory there will be out of order packets during aggregation. While keep in mind this is per-class aggregation, and it runs at a high frequency, that the chance to have any left over skbs from the same TCP flow on many CPUs is low.
>>
>> Also, based on real deployment experience, we haven't observed an increased out of order events even under heavy work load.
>>
> 
> Yeah, but unless you always classify packets into proper flows, there
> is always a chance to generate out-of-order packets here, which
> means the default configuration is flawed.
>

The key is to avoid classifying packets from a same flow into different classes. So we use socket priority to classify packets. It's always going to be correctly classified.

Not sure what do you mean by default configuration. But we create a shadow class when the qdisc is created. Before any other classes are created, all packets from any flow will be classified to this same shadow class, there won't be any incorrect classified packets either. 

> 
>>>
>>>> +static int ltb_init(struct Qdisc *sch, struct nlattr *opt,
>>> ...
>>>> +       ltb->default_cls = ltb->shadow_cls; /* Default hasn't been created */
>>>> +       tasklet_init(&ltb->fanout_tasklet, ltb_fanout_tasklet,
>>>> +                    (unsigned long)ltb);
>>>> +
>>>> +       /* Bandwidth balancer, this logic can be implemented in user-land. */
>>>> +       init_waitqueue_head(&ltb->bwbalancer_wq);
>>>> +       ltb->bwbalancer_task =
>>>> +           kthread_create(ltb_bw_balancer_kthread, ltb, "ltb-balancer");
>>>> +       wake_up_process(ltb->bwbalancer_task);
>>>
>>> Creating a kthread for each qdisc doesn't look good. Why do you
>>> need a per-qdisc kthread or even a kernel thread at all?
>>>
>>
>> We moved the bandwidth sharing out of the critical data path, that's why we use a kernel thread to balance the current maximum bandwidth used by each class periodically.
>>
>> This part could be implemented at as timer. What's your suggestion?
> 
> I doubt you can use a timer, as you call rtnl_trylock() there.
> Why not use a delayed work?
> 

Thank you for the suggestion, I will replace it with a delayed work and send out the new patch.

- Xiangning

> Thanks.
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ