Message-ID: <825c8af6-66b5-eaf4-2c46-76d018489ebd@gmail.com>
Date:   Thu, 9 Jul 2020 10:15:45 -0700
From:   Eric Dumazet <eric.dumazet@...il.com>
To:     "YU, Xiangning" <xiangning.yu@...baba-inc.com>,
        Eric Dumazet <eric.dumazet@...il.com>,
        Linux Kernel Network Developers <netdev@...r.kernel.org>
Subject: Re: [PATCH net-next v2 2/2] net: sched: Lockless Token Bucket (LTB) qdisc



On 7/9/20 10:04 AM, YU, Xiangning wrote:
> 
> 
> On 7/8/20 6:24 PM, Eric Dumazet wrote:
>>
>>
>> On 7/8/20 5:58 PM, YU, Xiangning wrote:
>>>
>>>
>>> On 7/8/20 5:08 PM, Eric Dumazet wrote:
>>>>
>>>>
>>>> On 7/8/20 4:59 PM, YU, Xiangning wrote:
>>>>
>>>>>
>>>>> Yes, we are touching a cache line here to make sure the aggregation tasklet is scheduled immediately. In most cases it is a call to test_and_set_bit().
>>>>
>>>>
>>>> test_and_set_bit() is dirtying the cache line even if the bit is already set.
>>>>
>>>
>>> Yes. I do hope we can avoid this.
>>>
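A common way to avoid the unconditional dirtying is to gate the atomic read-modify-write behind a read-only test_bit(), which leaves the line in shared state when the bit is already set. A minimal sketch; the flag, structure and tasklet names are hypothetical, not from the patch:

    /* test_bit() is a plain load and leaves the cache line shared;
     * only attempt the atomic RMW when the bit looks clear. */
    if (!test_bit(LTB_AGG_SCHEDULED, &ltb->flags) &&
        !test_and_set_bit(LTB_AGG_SCHEDULED, &ltb->flags))
            tasklet_schedule(&ltb->agg_tasklet);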
>>>>>
>>>>> We might be able to do some inline processing without a tasklet here, but we still need to make sure the aggregation won't run simultaneously on multiple CPUs.
>>>>
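One way to run the aggregation inline while still guaranteeing a single runner is a cmpxchg guard. A sketch under the assumption of a per-qdisc atomic flag (all names hypothetical); note that this simple form can miss work queued just before the flag clears, so a real version would re-check after releasing it:

    static void ltb_try_aggregate(struct ltb_sched *ltb)
    {
            /* Only one CPU wins the 0 -> 1 transition; everyone
             * else returns and lets the current runner do the work. */
            if (atomic_cmpxchg(&ltb->agg_running, 0, 1) != 0)
                    return;
            ltb_do_aggregation(ltb);        /* hypothetical worker */
            atomic_set(&ltb->agg_running, 0);
    }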
>>>> I am actually surprised you can reach 8 Mpps with so many cache lines bouncing around.
>>>>
>>>> If you replace the ltb qdisc with standard mq+pfifo_fast, what kind of throughput do you get?
>>>>
>>>
>>> Just tried it using pktgen; we are far from the baseline. I can get 13 Mpps with 10 threads in my test setup.
>>
>> This is quite low performance.
>>
>> I suspect your 10 threads are sharing a smaller number of TX queues?
>>
> 
> Thank you for the hint. Looks like pktgen only used the first 10 queues.
> 
> I fine-tuned ltb to reach 10 Mpps with 10 threads last night. I can push the limit further, but we probably won't be able to get close to the baseline. Rate limiting really brings a lot of headaches; at least we are not burning CPUs to get this result.

Well, at Google we no longer have this issue.

We adopted the EDT model, so rate limiting can be done in eBPF by simply adjusting skb->tstamp.

The qdisc is MQ + FQ.

Stanislav Fomichev will present this use case at the netdev conference:

https://netdevconf.info/0x14/session.html?talk-replacing-HTB-with-EDT-and-BPF
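
For readers who have not seen the EDT approach: the eBPF program computes each packet's earliest departure time and writes it into skb->tstamp, and FQ then holds the packet until that time, so no qdisc-level token bucket is needed. A minimal single-flow sketch; the rate, section name and global state are illustrative only, and a real program would keep per-flow state in a BPF map with atomic updates:

    #include <linux/bpf.h>
    #include <linux/pkt_cls.h>
    #include <bpf/bpf_helpers.h>

    #define RATE_BYTES_PER_SEC (125 * 1000 * 1000) /* 1 Gbit/s, illustrative */
    #define NSEC_PER_SEC 1000000000ULL

    __u64 next_departure;   /* single-flow state, not SMP-safe as written */

    SEC("tc")
    int edt_rate_limit(struct __sk_buff *skb)
    {
            __u64 now = bpf_ktime_get_ns();
            __u64 txtime = next_departure > now ? next_departure : now;

            /* FQ releases the skb once skb->tstamp is reached. */
            skb->tstamp = txtime;
            next_departure = txtime +
                    (__u64)skb->len * NSEC_PER_SEC / RATE_BYTES_PER_SEC;
            return TC_ACT_OK;
    }

    char _license[] SEC("license") = "GPL";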
