netdev - Re: [PATCH 0/2] net/sched: Add hardware specific counters to TC actions

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180823201446.3802e84b@cakuba.netronome.com>
Date:   Thu, 23 Aug 2018 20:14:46 +0200
From:   Jakub Kicinski <jakub.kicinski@...ronome.com>
To:     "Eelco Chaudron" <echaudro@...hat.com>
Cc:     "David Miller" <davem@...emloft.net>, netdev@...r.kernel.org,
        jhs@...atatu.com, xiyou.wangcong@...il.com, jiri@...nulli.us,
        simon.horman@...ronome.com,
        "Marcelo Ricardo Leitner" <mleitner@...hat.com>,
        louis.peens@...ronome.com
Subject: Re: [PATCH 0/2] net/sched: Add hardware specific counters to TC
 actions

On Mon, 20 Aug 2018 16:03:40 +0200, Eelco Chaudron wrote:
> On 17 Aug 2018, at 13:27, Jakub Kicinski wrote:
> > On Thu, 16 Aug 2018 14:02:44 +0200, Eelco Chaudron wrote:  
> >> On 11 Aug 2018, at 21:06, David Miller wrote:
> >>  
> >>> From: Jakub Kicinski <jakub.kicinski@...ronome.com>
> >>> Date: Thu, 9 Aug 2018 20:26:08 -0700
> >>>  
> >>>> It is not immediately clear why this is needed.  The memory and
> >>>> updating two sets of counters won't come for free, so perhaps a
> >>>> stronger justification than troubleshooting is due? :S
> >>>>
> >>>> Netdev has counters for fallback vs forwarded traffic, so you'd 
> >>>> know
> >>>> that traffic hits the SW datapath, plus the rules which are in_hw
> >>>> will
> >>>> most likely not match as of today for flower (assuming 
> >>>> correctness).  
> >>
> >> I strongly believe that these counters are a requirement for a mixed
> >> software/hardware (flow) based forwarding environment. The global
> >> counters will not help much here as you might have chosen to have
> >> certain traffic forwarded by software.
> >>
> >> These counters are probably the only option you have to figure out 
> >> why
> >> forwarding is not as fast as expected, and you want to blame the TC
> >> offload NIC.  
> >
> > The suggested debugging flow would be:
> >  (1) check the global counter for fallback are incrementing;
> >  (2) find a flow with high stats but no in_hw flag set.
> >
> > The in_hw indication should be sufficient in most cases (unless there
> > are shared blocks between netdevs of different ASICs...).  
> 
> I guess the aim is to find miss behaving hardware, i.e. having the in_hw 
> flag set, but flows still coming to the kernel.

For misbehaving hardware in_hw will not work indeed.  Whether we need
these extra always-on stats for such use case could be debated :)
 
> >>>> I'm slightly concerned about potential performance impact, would 
> >>>> you
> >>>> be able to share some numbers for non-trivial number of flows (100k
> >>>> active?)?  
> >>>
> >>> Agreed, features used for diagnostics cannot have a harmful penalty
> >>> for fast path performance.  
> >>
> >> Fast path performance is not affected as these counters are not
> >> incremented there. They are only incremented by the nic driver when 
> >> they
> >> gather their statistics from hardware.  
> >
> > Not by much, you are adding state to performance-critical structures,
> > though, for what is effectively debugging purposes.
> >
> > I was mostly talking about the HW offload stat updates (sorry for not
> > being clear).
> >
> > We can have some hundreds of thousands active offloaded flows, each of
> > them can have multiple actions, and stats have to be updated multiple
> > times per second and dumped probably around once a second, too.  On a
> > busy system the stats will get evicted from cache between each round.
> >
> > But I'm speculating let's see if I can get some numbers on it (if you
> > could get some too, that would be great!).  
> 
> I’ll try to measure some of this later this week/early next week.

I asked Louis to run some tests while I'm travelling, and he reports
that my worry about reporting the extra stats was unfounded.  Update
function does not show up in traces at all.  It seems under stress
(generated with stress-ng) the thread dumping the stats in userspace
(in OvS it would be the revalidator) actually consumes less CPU in
__gnet_stats_copy_basic (0.4% less for ~2.0% total).

Would this match with your results?  I'm not sure why dumping would be
faster with your change..

> >> However, the flow creation is effected, as this is where the extra
> >> memory gets allocated. I had done some 40K flow tests before and did 
> >> not
> >> see any noticeable change in flow insertion performance. As requested 
> >> by
> >> Jakub I did it again for 100K (and threw a Netronome blade in the mix
> >> ;). I used Marcelo’s test tool,
> >> https://github.com/marceloleitner/perf-flower.git.
> >>
> >> Here are the numbers (time in seconds) for 10 runs in sorted order:
> >>
> >> +-------------+----------------+
> >> | Base_kernel | Change_applied |
> >> +-------------+----------------+
> >> |    5.684019 |       5.656388 |
> >> |    5.699658 |       5.674974 |
> >> |    5.725220 |       5.722107 |
> >> |    5.739285 |       5.839855 |
> >> |    5.748088 |       5.865238 |
> >> |    5.766231 |       5.873913 |
> >> |    5.842264 |       5.909259 |
> >> |    5.902202 |       5.912685 |
> >> |    5.905391 |       5.947138 |
> >> |    6.032997 |       5.997779 |
> >> +-------------+----------------+
> >>
> >> I guess the deviation is in the userspace part, which is where in 
> >> real
> >> life flows get added anyway.
> >>
> >> Let me know if more is unclear.