Message-ID: <a010e8dd-428b-c295-b2db-020d8cc698c5@huawei.com>
Date:   Mon, 10 Jan 2022 11:20:39 +0800
From:   He Ying <heying24@...wei.com>
To:     Marc Zyngier <maz@...nel.org>
CC:     <catalin.marinas@....com>, <will@...nel.org>,
        <mark.rutland@....com>, <marcan@...can.st>, <joey.gouly@....com>,
        <pcc@...gle.com>, <linux-arm-kernel@...ts.infradead.org>,
        <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] arm64: Make CONFIG_ARM64_PSEUDO_NMI macro wrap all the
 pseudo-NMI code

Hi Marc,

I'm just back from the weekend and sorry for the delayed reply.


On 2022/1/8 20:51, Marc Zyngier wrote:
> On Fri, 07 Jan 2022 08:55:36 +0000,
> He Ying <heying24@...wei.com> wrote:
>> Our product has recently been updating its kernel from 4.4 to 5.10 and
>> found a performance issue. We run a business test called the ARP test,
>> which measures the latency of ping-pong packet traffic with a certain
>> payload. The results are as follows.
>>
>>   - 4.4 kernel: avg = ~20s
>>   - 5.10 kernel (CONFIG_ARM64_PSEUDO_NMI is not set): avg = ~40s
>>
>> I have just been learning the arm64 pseudo-NMI code and have a question:
>> why is the related code not wrapped by CONFIG_ARM64_PSEUDO_NMI?
>> I wonder whether this brings some performance regression.
>>
>> First, I made this patch and then ran the test again. Here are the results.
>>
>>   - 5.10 kernel with this patch not applied: avg = ~40s
>>   - 5.10 kernel with this patch applied: avg = ~23s
>>
>> Amazing! Note that all kernels are built with CONFIG_ARM64_PSEUDO_NMI not
>> set. It seems the pseudo-NMI feature actually brings some performance
>> overhead even if CONFIG_ARM64_PSEUDO_NMI is not set.
>>
>> Furthermore, I find the feature also adds some overhead to the vmlinux
>> size. I built the 5.10 kernel with and without this patch applied, with
>> CONFIG_ARM64_PSEUDO_NMI not set.
>>
>>   - 5.10 kernel with this patch not applied: vmlinux size is 384060600 bytes.
>>   - 5.10 kernel with this patch applied: vmlinux size is 383842936 bytes.
>>
>> That means the arm64 pseudo-NMI feature may add about 213 KB (217664
>> bytes) to the vmlinux size.
>>
>> In summary, the arm64 pseudo-NMI feature adds some overhead to vmlinux
>> size and performance even if the config option is not set. To avoid this,
>> wrap all the related code with the CONFIG_ARM64_PSEUDO_NMI macro.
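
(Just to illustrate what the patch is about: the idea is to compile out the
PMR/priority-masking paths entirely when the option is off. The snippet below
is only a rough sketch of that pattern, not the actual arm64 irqflags code;
example_local_irq_enable() is a made-up name for illustration.)

/*
 * Rough illustration only -- not the real arch/arm64 implementation.
 * It shows the #ifdef CONFIG_ARM64_PSEUDO_NMI wrapping, so that kernels
 * built without the option carry no pseudo-NMI checks or code at all.
 */
static inline void example_local_irq_enable(void)
{
#ifdef CONFIG_ARM64_PSEUDO_NMI
	if (system_uses_irq_prio_masking()) {
		/* Pseudo-NMI path: unmask IRQs by raising the PMR. */
		write_sysreg_s(GIC_PRIO_IRQON, SYS_ICC_PMR_EL1);
		return;
	}
#endif
	/* Default path: clear the DAIF I bit. */
	asm volatile("msr daifclr, #2" : : : "memory");
}
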
> This obviously attracted my attention, and I took this patch for a
> ride on 5.16-rc8 on a machine that doesn't support GICv3 NMIs to make
> sure that any extra code would only result in pure overhead.
>
> There was no measurable difference with this patch applied or not,
> with CONFIG_ARM64_PSEUDO_NMI selected or not for the workloads I tried
> (I/O heavy virtual machines, hackbench).
Our test is some kind of network test.
>
> Mark already asked a number of questions (test case, implementation,
> test on a modern kernel). Please provide as many details as you
> possibly can, because such a regression really isn't expected, and
> doesn't show up on the systems I have at hand. Some profiling numbers
> could also be interesting, in case this is a result of a particular
> resource being thrashed (TLB, cache...).

I replied to Mark a few moments ago and provided as many details as I could.

You mentioned that the TLB and cache could be thrashed. How can we check this? By using perf tools?
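
For example, I guess we could count TLB and cache misses with perf, either with
something like 'perf stat -e dTLB-load-misses,cache-misses' around the test, or
with perf_event_open() directly. A rough self-contained sketch of the latter
(illustrative only; it counts dTLB read misses around a code region):

/* Sketch: count data-TLB read misses around a region using perf_event_open(2). */
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>
#include <stdint.h>

static int perf_open(uint32_t type, uint64_t config)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = type;
	attr.config = config;
	attr.disabled = 1;
	attr.exclude_hv = 1;

	/* Measure the calling task on any CPU. */
	return syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}

int main(void)
{
	long long count;
	int fd = perf_open(PERF_TYPE_HW_CACHE,
			   PERF_COUNT_HW_CACHE_DTLB |
			   (PERF_COUNT_HW_CACHE_OP_READ << 8) |
			   (PERF_COUNT_HW_CACHE_RESULT_MISS << 16));

	if (fd < 0) {
		perror("perf_event_open");
		return 1;
	}

	ioctl(fd, PERF_EVENT_IOC_RESET, 0);
	ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

	/* ... run the workload to be measured here ... */

	ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
	if (read(fd, &count, sizeof(count)) == sizeof(count))
		printf("dTLB read misses: %lld\n", count);
	close(fd);
	return 0;
}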

>
> Thanks,
>
> 	M.
>
