Message-ID: <825c8af6-66b5-eaf4-2c46-76d018489ebd@gmail.com>
Date:   Thu, 9 Jul 2020 10:15:45 -0700
From:   Eric Dumazet <eric.dumazet@...il.com>
To:     "YU, Xiangning" <xiangning.yu@...baba-inc.com>,
        Eric Dumazet <eric.dumazet@...il.com>,
        Linux Kernel Network Developers <netdev@...r.kernel.org>
Subject: Re: [PATCH net-next v2 2/2] net: sched: Lockless Token Bucket (LTB) qdisc



On 7/9/20 10:04 AM, YU, Xiangning wrote:
> 
> 
> On 7/8/20 6:24 PM, Eric Dumazet wrote:
>>
>>
>> On 7/8/20 5:58 PM, YU, Xiangning wrote:
>>>
>>>
>>> On 7/8/20 5:08 PM, Eric Dumazet wrote:
>>>>
>>>>
>>>> On 7/8/20 4:59 PM, YU, Xiangning wrote:
>>>>
>>>>>
>>>>> Yes, we are touching a cache line here to make sure the aggregation tasklet is scheduled immediately. In most cases it is a call to test_and_set_bit().
>>>>
>>>>
>>>> test_and_set_bit() is dirtying the cache line even if the bit is already set.
>>>>
>>>
>>> Yes. I do hope we can avoid this.
>>>
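A common way to avoid the unconditional dirtying is to gate the atomic read-modify-write behind a read-only test_bit(), which leaves the line in shared state when the bit is already set. A minimal sketch; the flag, structure and tasklet names are hypothetical, not from the patch:

    /* test_bit() is a plain load and leaves the cache line shared;
     * only attempt the atomic RMW when the bit looks clear. */
    if (!test_bit(LTB_AGG_SCHEDULED, &ltb->flags) &&
        !test_and_set_bit(LTB_AGG_SCHEDULED, &ltb->flags))
            tasklet_schedule(&ltb->agg_tasklet);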
>>>>>
>>>>> We might be able to do some inline processing without a tasklet here, but we still need to make sure the aggregation won't run simultaneously on multiple CPUs.
>>>>
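One way to run the aggregation inline while still guaranteeing a single runner is a cmpxchg guard. A sketch under the assumption of a per-qdisc atomic flag (all names hypothetical); note that this simple form can miss work queued just before the flag clears, so a real version would re-check after releasing it:

    static void ltb_try_aggregate(struct ltb_sched *ltb)
    {
            /* Only one CPU wins the 0 -> 1 transition; everyone
             * else returns and lets the current runner do the work. */
            if (atomic_cmpxchg(&ltb->agg_running, 0, 1) != 0)
                    return;
            ltb_do_aggregation(ltb);        /* hypothetical worker */
            atomic_set(&ltb->agg_running, 0);
    }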
>>>> I am actually surprised you can reach 8 Mpps with so many cache lines bouncing around.
>>>>
>>>> If you replace the ltb qdisc with standard mq+pfifo_fast, what kind of throughput do you get?
>>>>
>>>
>>> Just tried it using pktgen; we are far from the baseline. I can get 13 Mpps with 10 threads in my test setup.
>>
>> This is quite low performance.
>>
>> I suspect your 10 threads are sharing a smaller number of TX queues?
>>
> 
> Thank you for the hint. Looks like pktgen only used the first 10 queues.
> 
> I fine-tuned ltb to reach 10 Mpps with 10 threads last night. I can push the limit further, but we probably won't be able to get close to the baseline. Rate limiting really brings a lot of headaches; at least we are not burning CPUs to get this result.

Well, at Google we no longer have this issue.

We adopted the EDT model, so rate limiting can be done in eBPF by simply adjusting skb->tstamp.

The qdisc is MQ + FQ.

Stanislav Fomichev will present this use case at the netdev conference:

https://netdevconf.info/0x14/session.html?talk-replacing-HTB-with-EDT-and-BPF
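
For readers who have not seen the EDT approach: the eBPF program computes each packet's earliest departure time and writes it into skb->tstamp, and FQ then holds the packet until that time, so no qdisc-level token bucket is needed. A minimal single-flow sketch; the rate, section name and global state are illustrative only, and a real program would keep per-flow state in a BPF map with atomic updates:

    #include <linux/bpf.h>
    #include <linux/pkt_cls.h>
    #include <bpf/bpf_helpers.h>

    #define RATE_BYTES_PER_SEC (125 * 1000 * 1000) /* 1 Gbit/s, illustrative */
    #define NSEC_PER_SEC 1000000000ULL

    __u64 next_departure;   /* single-flow state, not SMP-safe as written */

    SEC("tc")
    int edt_rate_limit(struct __sk_buff *skb)
    {
            __u64 now = bpf_ktime_get_ns();
            __u64 txtime = next_departure > now ? next_departure : now;

            /* FQ releases the skb once skb->tstamp is reached. */
            skb->tstamp = txtime;
            next_departure = txtime +
                    (__u64)skb->len * NSEC_PER_SEC / RATE_BYTES_PER_SEC;
            return TC_ACT_OK;
    }

    char _license[] SEC("license") = "GPL";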
