netdev - Re: [PATCH 0/2] net/sched: Add hardware specific counters to TC actions

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <C997079A-1449-4D38-84AC-FF6E57F66ED3@redhat.com>
Date:   Thu, 20 Sep 2018 09:14:08 +0200
From:   "Eelco Chaudron" <echaudro@...hat.com>
To:     "Jakub Kicinski" <jakub.kicinski@...ronome.com>
Cc:     "David Miller" <davem@...emloft.net>, netdev@...r.kernel.org,
        jhs@...atatu.com, xiyou.wangcong@...il.com, jiri@...nulli.us,
        simon.horman@...ronome.com,
        "Marcelo Ricardo Leitner" <mleitner@...hat.com>,
        louis.peens@...ronome.com, Paolo <pabeni@...hat.com>,
        "Davide Caratti" <dcaratti@...hat.com>
Subject: Re: [PATCH 0/2] net/sched: Add hardware specific counters to TC
 actions



On 29 Aug 2018, at 20:12, Jakub Kicinski wrote:

> On Wed, 29 Aug 2018 11:43:47 +0200, Eelco Chaudron wrote:
>> On 23 Aug 2018, at 20:14, Jakub Kicinski wrote:
>>
>>> On Mon, 20 Aug 2018 16:03:40 +0200, Eelco Chaudron wrote:
>>>> On 17 Aug 2018, at 13:27, Jakub Kicinski wrote:
>>>>> On Thu, 16 Aug 2018 14:02:44 +0200, Eelco Chaudron wrote:
>>>>>> On 11 Aug 2018, at 21:06, David Miller wrote:
>>>>>>
>>>>>>> From: Jakub Kicinski <jakub.kicinski@...ronome.com>
>>>>>>> Date: Thu, 9 Aug 2018 20:26:08 -0700
>>>>>>>
>>>>>>>> It is not immediately clear why this is needed.  The memory and
>>>>>>>> updating two sets of counters won't come for free, so perhaps a
>>>>>>>> stronger justification than troubleshooting is due? :S
>>>>>>>>
>>>>>>>> Netdev has counters for fallback vs forwarded traffic, so you'd
>>>>>>>> know
>>>>>>>> that traffic hits the SW datapath, plus the rules which are 
>>>>>>>> in_hw
>>>>>>>> will
>>>>>>>> most likely not match as of today for flower (assuming
>>>>>>>> correctness).
>>>>>>
>>>>>> I strongly believe that these counters are a requirement for a
>>>>>> mixed
>>>>>> software/hardware (flow) based forwarding environment. The global
>>>>>> counters will not help much here as you might have chosen to have
>>>>>> certain traffic forwarded by software.
>>>>>>
>>>>>> These counters are probably the only option you have to figure 
>>>>>> out
>>>>>> why
>>>>>> forwarding is not as fast as expected, and you want to blame the 
>>>>>> TC
>>>>>> offload NIC.
>>>>>
>>>>> The suggested debugging flow would be:
>>>>>  (1) check the global counter for fallback are incrementing;
>>>>>  (2) find a flow with high stats but no in_hw flag set.
>>>>>
>>>>> The in_hw indication should be sufficient in most cases (unless
>>>>> there
>>>>> are shared blocks between netdevs of different ASICs...).
>>>>
>>>> I guess the aim is to find miss behaving hardware, i.e. having the
>>>> in_hw
>>>> flag set, but flows still coming to the kernel.
>>>
>>> For misbehaving hardware in_hw will not work indeed.  Whether we 
>>> need
>>> these extra always-on stats for such use case could be debated :)
>>>
>>>>>>>> I'm slightly concerned about potential performance impact, 
>>>>>>>> would
>>>>>>>> you
>>>>>>>> be able to share some numbers for non-trivial number of flows
>>>>>>>> (100k
>>>>>>>> active?)?
>>>>>>>
>>>>>>> Agreed, features used for diagnostics cannot have a harmful
>>>>>>> penalty
>>>>>>> for fast path performance.
>>>>>>
>>>>>> Fast path performance is not affected as these counters are not
>>>>>> incremented there. They are only incremented by the nic driver 
>>>>>> when
>>>>>> they
>>>>>> gather their statistics from hardware.
>>>>>
>>>>> Not by much, you are adding state to performance-critical
>>>>> structures,
>>>>> though, for what is effectively debugging purposes.
>>>>>
>>>>> I was mostly talking about the HW offload stat updates (sorry for
>>>>> not
>>>>> being clear).
>>>>>
>>>>> We can have some hundreds of thousands active offloaded flows, 
>>>>> each
>>>>> of
>>>>> them can have multiple actions, and stats have to be updated
>>>>> multiple
>>>>> times per second and dumped probably around once a second, too.  
>>>>> On
>>>>> a
>>>>> busy system the stats will get evicted from cache between each
>>>>> round.
>>>>>
>>>>> But I'm speculating let's see if I can get some numbers on it (if
>>>>> you
>>>>> could get some too, that would be great!).
>>>>
>>>> I’ll try to measure some of this later this week/early next week.
>>>
>>> I asked Louis to run some tests while I'm travelling, and he reports
>>> that my worry about reporting the extra stats was unfounded.  Update
>>> function does not show up in traces at all.  It seems under stress
>>> (generated with stress-ng) the thread dumping the stats in userspace
>>> (in OvS it would be the revalidator) actually consumes less CPU in
>>> __gnet_stats_copy_basic (0.4% less for ~2.0% total).
>>>
>>> Would this match with your results?  I'm not sure why dumping would 
>>> be
>>> faster with your change..
>>
>> Tested with OVS and https://github.com/chaudron/ovs_perf using 300K 
>> TC
>> rules installed in HW.
>>
>> For __gnet_stats_copy_basic() being faster I have (had) a theory. Now
>> this function is called twice, and I assumed the first call would 
>> cache
>> memory and the second call would be faster.
>>
>> Sampling a lot of perf data, I get an average of 1115ns with the base
>> kernel and 954ns with the fix applied, so about ~14%.
>>
>> Thought I would perf tcf_action_copy_stats() as it is the place 
>> updating
>> the additional counter. But even in this case, I see a better
>> performance with the patch applied.
>>
>> In average 13581ns with the fix, vs base kernel at 1391ns, so about
>> 2.3%.
>>
>> I guess the changes to the tc_action structure got better cache
>> alignment.
>
> Interesting you could reproduce the speed up too!  +1 for the guess.
> Seems like my caution about slowing down SW paths to support HW 
> offload
> landed on a very unfortunate patch :)

Is there anything else blocking from getting this into net-next?

I still think this patch is beneficial for the full user experience, and 
I’ve got requests from QA and others for this.