Message-ID: <dfa52acc-f95c-5e7e-b84b-b54b2903ac9f@mojatatu.com>
Date:   Sun, 4 Dec 2022 20:13:07 -0300
From:   Pedro Tammela <pctammela@...atatu.com>
To:     Jamal Hadi Salim <jhs@...atatu.com>,
        Paolo Abeni <pabeni@...hat.com>
Cc:     Pedro Tammela <pctammela@...il.com>, netdev@...r.kernel.org,
        davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
        xiyou.wangcong@...il.com, jiri@...nulli.us, kuniyu@...zon.com
Subject: Re: [PATCH net-next v2 0/3] net/sched: retpoline wrappers for tc

On 01/12/2022 09:34, Jamal Hadi Salim wrote:
> On Thu, Dec 1, 2022 at 6:05 AM Paolo Abeni <pabeni@...hat.com> wrote:
>>
>> On Mon, 2022-11-28 at 12:44 -0300, Pedro Tammela wrote:
> 
> [..]
> 
>>> We observed a 3-6% speed up on the retpoline CPUs, when going through 1
>>> tc filter,
>>
>> Do you have all the existing filters enabled at build time in your test
>> kernel? The reported figures are quite higher than expected considering
>> there are 7th new unlikely branch in between.
>>
> 
> That can be validated with a test that compiles a kernel with a filter under
> test listed first then another kernel with the same filter last.
> 
> Also given these tests were using 64B pkts to achieve the highest pps, perhaps
> using MTU sized pkts with pktgen would give more realistic results?
> 
> In addition to the tests for 1 and 100 filters...
> 
>> Also it would be nice to have some figure for the last filter in the if
>> chain. I fear we could have some regressions there even for 'retpoline'
>> CPUs - given the long if chain - and u32 is AFAIK (not much actually)
>> still quite used.
>>
> 
> I would say flower and bpf + u32 are probably the highest used,
> but given no available data on this usage beauty is in the eye of
> the beholder. I hope it doesn't become a real estate battle like we
> have in which subsystem gets to see packets first or last ;->
> 
>> Finally, it looks like the filter order in patch 1/3 is quite relevant,
>> and it looks like you used the lexicographic order, I guess it should
>> be better to sort them by 'relevance', if someone could provide a
>> reasonable 'relevance' order. I personally would move ife, ipt and
>> simple towards the bottom.
> 
> I think we can come up with some reasonable order.
> 
> cheers,
> jamal

We got a new system with a 7950x and I had some free time today to test 
out the classifier order with v3, which I will post soon.

64B pps:
baseline - 5914980
first - 6397116 (+8.15%)
last - 6362476 (+7.57%)

1500B pps:
baseline - 6367965
first - 6754578 (+6.07%)
last - 6745576 (+5.93%)

The difference between first and last is minimal, but it exists.
DDR5 seems to give a nice pps boost in this test compared to the 
5950x, which makes sense since the test is quite heavy on the memory 
allocator.
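
For anyone who wants to re-derive the percentages from the raw pps 
figures above, here is a small sketch. The names (results, pct_gain, 
summarize) are made up for illustration and are not part of the 
patchset; the only inputs are the numbers reported in this mail.

```python
# pps figures copied from the results reported above
results = {
    "64B": {"baseline": 5914980, "first": 6397116, "last": 6362476},
    "1500B": {"baseline": 6367965, "first": 6754578, "last": 6745576},
}


def pct_gain(new, base):
    """Relative change of `new` over `base`, in percent."""
    return (new - base) / base * 100.0


def summarize(results):
    """Gain of each classifier position over baseline, plus first vs last."""
    summary = {}
    for size, r in results.items():
        summary[size] = {
            "first": pct_gain(r["first"], r["baseline"]),
            "last": pct_gain(r["last"], r["baseline"]),
            # how much faster 'first' is than 'last' in the if chain
            "first_vs_last": pct_gain(r["first"], r["last"]),
        }
    return summary


if __name__ == "__main__":
    for size, s in summarize(results).items():
        print(
            f"{size}: first {s['first']:+.2f}%, last {s['last']:+.2f}%, "
            f"first vs last {s['first_vs_last']:+.2f}%"
        )
```

The first_vs_last entry is the cost of sitting at the end of the if 
chain rather than at the start, which stays well under 1% for both 
packet sizes.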
