Date:   Tue, 22 Oct 2019 14:52:37 +0000
From:   Vlad Buslov <vladbu@...lanox.com>
To:     Marcelo Ricardo Leitner <mleitner@...hat.com>
CC:     Vlad Buslov <vladbu@...lanox.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "jhs@...atatu.com" <jhs@...atatu.com>,
        "xiyou.wangcong@...il.com" <xiyou.wangcong@...il.com>,
        "jiri@...nulli.us" <jiri@...nulli.us>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "dcaratti@...hat.com" <dcaratti@...hat.com>,
        "pabeni@...hat.com" <pabeni@...hat.com>
Subject: Re: [PATCH net-next 00/13] Control action percpu counters allocation
 by netlink flag


On Tue 22 Oct 2019 at 17:35, Marcelo Ricardo Leitner <mleitner@...hat.com> wrote:
> On Tue, Oct 22, 2019 at 05:17:51PM +0300, Vlad Buslov wrote:
>> Currently, significant fraction of CPU time during TC filter allocation
>> is spent in percpu allocator. Moreover, percpu allocator is protected
>> with single global mutex which negates any potential to improve its
>> performance by means of recent developments in TC filter update API that
>> removed rtnl lock for some Qdiscs and classifiers. In order to
>> significantly improve filter update rate and reduce memory usage we
>> would like to allow users to skip percpu counters allocation for
>> specific action if they don't expect high traffic rate hitting the
>> action, which is a reasonable expectation for hardware-offloaded setup.
>> In that case any potential gains to software fast-path performance
>> gained by usage of percpu-allocated counters compared to regular integer
>> counters protected by spinlock are not important, but amount of
>> additional CPU and memory consumed by them is significant.
>
> Yes!
>
> I wonder how this can play together with conntrack offloading.  With
> it the sw datapath will be more used, as a conntrack entry can only be
> offloaded after the handshake.  That said, the host can have to
> process quite some handshakes in sw datapath.  Seems OvS can then just
> not set this flag in act_ct (and others for this rule), and such cases
> will be able to leverage the percpu stats.  Right?

The flag is set per action instance, so a client can choose not to use
it on a case-by-case basis. The conntrack use case requires further
investigation, since I'm not entirely convinced that handling the first
few packets in sw (before the connection reaches the established state
and is offloaded) warrants having a percpu counter.
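
To illustrate the per-instance choice, here is a minimal userspace
sketch, not code from the patch set. It assumes a fixed number of slots
(NSLOTS) standing in for CPUs, and every name in it (act_stats,
ACT_FLAG_NO_PERCPU_STATS, the act_stats_* helpers) is hypothetical.
When the flag is set, the per-slot array is never allocated and updates
go to a single counter protected by a simple spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

#define NSLOTS 4                      /* assumption: stand-in for the CPU count */
#define ACT_FLAG_NO_PERCPU_STATS 0x1u /* hypothetical flag name */

struct act_stats {
	uint64_t *percpu;  /* NSLOTS counters, or NULL when the flag is set */
	uint64_t shared;   /* single counter used when percpu == NULL */
	atomic_flag lock;  /* spinlock protecting 'shared' */
};

/* Allocate per-slot counters only if the caller did not opt out. */
int act_stats_init(struct act_stats *s, unsigned int flags)
{
	s->percpu = NULL;
	s->shared = 0;
	atomic_flag_clear(&s->lock);
	if (!(flags & ACT_FLAG_NO_PERCPU_STATS)) {
		s->percpu = calloc(NSLOTS, sizeof(*s->percpu));
		if (!s->percpu)
			return -1;
	}
	return 0;
}

static void stats_lock(struct act_stats *s)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
		;
}

static void stats_unlock(struct act_stats *s)
{
	atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

/* Fast path: lock-free per-slot update, or locked update of one counter. */
void act_stats_add(struct act_stats *s, unsigned int slot, uint64_t pkts)
{
	if (s->percpu) {
		assert(slot < NSLOTS);
		/* Assumes each slot is only ever written by its own owner. */
		s->percpu[slot] += pkts;
	} else {
		stats_lock(s);
		s->shared += pkts;
		stats_unlock(s);
	}
}

/* Slow path: sum all slots, or read the single counter under the lock. */
uint64_t act_stats_read(struct act_stats *s)
{
	uint64_t sum = 0;

	if (s->percpu) {
		for (unsigned int i = 0; i < NSLOTS; i++)
			sum += s->percpu[i];
	} else {
		stats_lock(s);
		sum = s->shared;
		stats_unlock(s);
	}
	return sum;
}

void act_stats_destroy(struct act_stats *s)
{
	free(s->percpu); /* free(NULL) is a no-op */
	s->percpu = NULL;
}
```

Both variants report the same totals; the difference is that the
flagged instance skips the extra allocation and pays for a lock on each
update instead, which only matters if the software path is hot.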

>
>> allocator, but not for action idr lock, which is per-action. Note that
>> percpu allocator is still used by dst_cache in tunnel_key actions and
>> consumes 4.68% CPU time. Dst_cache seems like good opportunity for
>> further insertion rate optimization but is not addressed by this change.
>
> I vented this idea re dst_cache last week with Paolo. He sent me a
> draft patch but I didn't test it yet.

Looking forward to it!

>
> Thanks,
> Marcelo
