Message-ID: <d8e62725-5597-7cc5-b862-db0cb5564bba@huawei.com>
Date: Wed, 29 Dec 2021 11:11:49 +0800
From: chenweilong <chenweilong@...wei.com>
To: Will Deacon <will@...nel.org>
CC: <catalin.marinas@....com>, <corbet@....net>,
<linux-kernel@...r.kernel.org>, <linux-doc@...r.kernel.org>
Subject: Re: [PATCH] cache: Workaround HiSilicon Taishan DC CVAU
On 2021/12/14 2:56, Will Deacon wrote:
> On Fri, Nov 26, 2021 at 05:11:39PM +0800, Weilong Chen wrote:
>> Taishan's L1/L2 cache is inclusive, and the data is consistent.
>> A change in L1 does not require a DC operation to flush the cache line from L1 to L2,
>> so it is safe to skip cleaning the data cache by address to the point of unification.
>>
>> Without the IDC feature, the kernel needs to flush the icache as well as the dcache,
>> which causes performance degradation.
>>
>> The flaw refers to V110/V200 variant 1.
>>
>> Signed-off-by: Weilong Chen <chenweilong@...wei.com>
>> ---
>> Documentation/arm64/silicon-errata.rst | 2 ++
>> arch/arm64/Kconfig | 11 +++++++++
>> arch/arm64/include/asm/cputype.h | 2 ++
>> arch/arm64/kernel/cpu_errata.c | 32 ++++++++++++++++++++++++++
>> arch/arm64/tools/cpucaps | 1 +
>> 5 files changed, 48 insertions(+)
> Hmm. We don't usually apply optimisations for specific CPUs on arm64, simply
> because the diversity of CPUs out there means it quickly becomes a
> fragmented mess.
>
> Is this patch purely a performance improvement? If so, please can you
> provide some numbers in an attempt to justify it?
Yes, it's a performance improvement. I have a test program like this:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/time.h>

int main(void)
{
	void *tmp;
	size_t len = 200 * 1024 * 1024;
	struct timeval start, end;
	long interval;

	tmp = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (tmp == MAP_FAILED) {
		perror("mmap failed");
		exit(EXIT_FAILURE);
	}

	/* Touch every page so the whole mapping is populated. */
	memset(tmp, 0, len);

	/* Time only the switch to an executable mapping. */
	gettimeofday(&start, NULL);
	if (mprotect(tmp, len, PROT_READ | PROT_EXEC)) {
		perror("Couldn't mprotect");
		exit(EXIT_FAILURE);
	}
	gettimeofday(&end, NULL);

	/* Elapsed time in microseconds, printed in milliseconds. */
	interval = 1000000 * (end.tv_sec - start.tv_sec) + (end.tv_usec - start.tv_usec);
	printf("interval = %fms\n", interval / 1000.0);
	return 0;
}
Without this fix, the mprotect() call takes:
interval = 25.608000ms
And with this fix:
interval = 0.689000ms
So the fix makes this mprotect() call roughly 37 times faster.
If you think it is suitable, I will send a v2 patch, since the original patch broke the CPU hotplug checks.
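For anyone reproducing the numbers, the same measurement can be taken with clock_gettime(CLOCK_MONOTONIC), which is not affected by wall-clock adjustments the way gettimeofday() can be. This is only a sketch, assuming Linux; the helper names elapsed_ms and time_mprotect_ms are made up for illustration and are not part of the patch.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>

/* Hypothetical helper: elapsed milliseconds between two timespec samples. */
static double elapsed_ms(const struct timespec *start, const struct timespec *end)
{
	return (end->tv_sec - start->tv_sec) * 1000.0 +
	       (end->tv_nsec - start->tv_nsec) / 1e6;
}

/*
 * Hypothetical helper: map len bytes, touch them, and time the switch to
 * PROT_READ|PROT_EXEC. Returns the elapsed time in ms, or -1.0 on failure.
 */
static double time_mprotect_ms(size_t len)
{
	struct timespec start, end;
	double ms = -1.0;
	void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED)
		return -1.0;

	memset(buf, 0, len);	/* fault in every page before timing */

	clock_gettime(CLOCK_MONOTONIC, &start);
	if (mprotect(buf, len, PROT_READ | PROT_EXEC) == 0) {
		clock_gettime(CLOCK_MONOTONIC, &end);
		ms = elapsed_ms(&start, &end);
	}

	munmap(buf, len);
	return ms;
}
```

Calling time_mprotect_ms(200 * 1024 * 1024) corresponds to the 200 MB case measured above; running it several times and comparing the values would show how stable the result is.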
>
> Thanks,
>
> Will
> .