netdev - Re: [PATCH 0/2] net/sched: Add hardware specific counters to TC actions


Message-ID: <229BA7FA-916B-47EA-8FD4-3F0B8BDDD145@redhat.com>
Date:   Mon, 20 Aug 2018 16:03:40 +0200
From:   "Eelco Chaudron" <echaudro@...hat.com>
To:     "Jakub Kicinski" <jakub.kicinski@...ronome.com>
Cc:     "David Miller" <davem@...emloft.net>, netdev@...r.kernel.org,
        jhs@...atatu.com, xiyou.wangcong@...il.com, jiri@...nulli.us,
        simon.horman@...ronome.com,
        "Marcelo Ricardo Leitner" <mleitner@...hat.com>
Subject: Re: [PATCH 0/2] net/sched: Add hardware specific counters to TC
 actions



On 17 Aug 2018, at 13:27, Jakub Kicinski wrote:

> On Thu, 16 Aug 2018 14:02:44 +0200, Eelco Chaudron wrote:
>> On 11 Aug 2018, at 21:06, David Miller wrote:
>>
>>> From: Jakub Kicinski <jakub.kicinski@...ronome.com>
>>> Date: Thu, 9 Aug 2018 20:26:08 -0700
>>>
>>>> It is not immediately clear why this is needed.  The memory and
>>>> updating two sets of counters won't come for free, so perhaps a
>>>> stronger justification than troubleshooting is due? :S
>>>>
>>>> Netdev has counters for fallback vs forwarded traffic, so you'd know
>>>> that traffic hits the SW datapath, plus the rules which are in_hw will
>>>> most likely not match as of today for flower (assuming correctness).
>>
>> I strongly believe that these counters are a requirement for a mixed
>> software/hardware (flow) based forwarding environment. The global
>> counters will not help much here as you might have chosen to have
>> certain traffic forwarded by software.
>>
>> These counters are probably the only option you have to figure out why
>> forwarding is not as fast as expected, and you want to blame the TC
>> offload NIC.
>
> The suggested debugging flow would be:
>  (1) check that the global counters for fallback are incrementing;
>  (2) find a flow with high stats but no in_hw flag set.
>
> The in_hw indication should be sufficient in most cases (unless there
> are shared blocks between netdevs of different ASICs...).

I guess the aim is to find misbehaving hardware, i.e. flows that have the
in_hw flag set but whose traffic still reaches the kernel.
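
To make the two cases concrete, here is a minimal Python sketch. The
record layout and field names (in_hw, total_packets, hw_packets) are
hypothetical and are not the output format of any existing tool; the
sketch assumes a per-flow total count and a hardware-only count are both
available, which is roughly what the proposed counters would provide.

```python
from dataclasses import dataclass


@dataclass
class FlowStats:
    # Hypothetical per-flow record; field names are illustrative only.
    name: str
    in_hw: bool          # flow reported as offloaded to hardware
    total_packets: int   # packets counted for the flow overall
    hw_packets: int      # packets counted by hardware only


def software_packets(flow: FlowStats) -> int:
    """Packets that were handled by the kernel datapath for this flow."""
    return flow.total_packets - flow.hw_packets


def not_offloaded_busy(flows, threshold):
    """Busy flows without in_hw: the case the in_hw flag alone can find."""
    return [f.name for f in flows
            if not f.in_hw and f.total_packets >= threshold]


def misbehaving_offload(flows):
    """Flows marked in_hw whose traffic still reaches the kernel."""
    return [f.name for f in flows
            if f.in_hw and software_packets(f) > 0]
```

The second helper only works with a hardware-specific count; with a
single combined counter the two cases cannot be told apart.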

>>>> I'm slightly concerned about potential performance impact, would you
>>>> be able to share some numbers for a non-trivial number of flows (100k
>>>> active?)?
>>>
>>> Agreed, features used for diagnostics cannot have a harmful penalty
>>> for fast path performance.
>>
>> Fast path performance is not affected as these counters are not
>> incremented there. They are only incremented by the NIC driver when it
>> gathers its statistics from hardware.
>
> Not by much, you are adding state to performance-critical structures,
> though, for what is effectively debugging purposes.
>
> I was mostly talking about the HW offload stat updates (sorry for not
> being clear).
>
> We can have some hundreds of thousands active offloaded flows, each of
> them can have multiple actions, and stats have to be updated multiple
> times per second and dumped probably around once a second, too.  On a
> busy system the stats will get evicted from cache between each round.
>
> But I'm speculating, so let's see if I can get some numbers on it (if
> you could get some too, that would be great!).

I’ll try to measure some of this later this week/early next week.

>> However, flow creation is affected, as this is where the extra memory
>> gets allocated. I had done some 40K flow tests before and did not see
>> any noticeable change in flow insertion performance. As requested by
>> Jakub I did it again for 100K (and threw a Netronome blade in the mix
>> ;). I used Marcelo’s test tool,
>> https://github.com/marceloleitner/perf-flower.git.
>>
>> Here are the numbers (time in seconds) for 10 runs in sorted order:
>>
>> +-------------+----------------+
>> | Base_kernel | Change_applied |
>> +-------------+----------------+
>> |    5.684019 |       5.656388 |
>> |    5.699658 |       5.674974 |
>> |    5.725220 |       5.722107 |
>> |    5.739285 |       5.839855 |
>> |    5.748088 |       5.865238 |
>> |    5.766231 |       5.873913 |
>> |    5.842264 |       5.909259 |
>> |    5.902202 |       5.912685 |
>> |    5.905391 |       5.947138 |
>> |    6.032997 |       5.997779 |
>> +-------------+----------------+
>>
>> I guess the deviation is in the userspace part, which is where in real
>> life flows get added anyway.
>>
>> Let me know if anything else is unclear.
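
For reference, the ten runs in the table above can be summarized with a
short Python sketch. The values are copied from the table as posted; the
variable names are illustrative, and this is only a rough summary of ten
samples, not a statistical significance test.

```python
from statistics import mean, stdev

# Flow insertion times in seconds for 100K flows, copied from the table.
base_kernel = [
    5.684019, 5.699658, 5.725220, 5.739285, 5.748088,
    5.766231, 5.842264, 5.902202, 5.905391, 6.032997,
]
change_applied = [
    5.656388, 5.674974, 5.722107, 5.839855, 5.865238,
    5.873913, 5.909259, 5.912685, 5.947138, 5.997779,
]

base_mean = mean(base_kernel)
change_mean = mean(change_applied)

# Relative difference of the means, in percent of the base kernel mean.
relative_change_pct = (change_mean - base_mean) / base_mean * 100.0

print(f"base:   mean={base_mean:.6f}s stdev={stdev(base_kernel):.6f}s")
print(f"change: mean={change_mean:.6f}s stdev={stdev(change_applied):.6f}s")
print(f"relative change of mean: {relative_change_pct:+.2f}%")
```

Comparing the difference of the means against the run-to-run spread is
one simple way to judge whether the change is distinguishable from noise.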