Message-ID: <f2e31c89-dd9e-f0f8-ef5c-e930d01a3b65@csgroup.eu>
Date: Wed, 19 Aug 2020 09:16:59 +0200
From: Christophe Leroy <christophe.leroy@...roup.eu>
To: Christoph Hellwig <hch@....de>
Cc: linux-arch@...r.kernel.org, Kees Cook <keescook@...omium.org>,
x86@...nel.org, linux-kernel@...r.kernel.org,
Al Viro <viro@...iv.linux.org.uk>,
linux-fsdevel@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org
Subject: Re: remove the last set_fs() in common code, and remove it for x86
and powerpc
On 18/08/2020 at 20:23, Christophe Leroy wrote:
>
>
> On 18/08/2020 at 20:05, Christoph Hellwig wrote:
>> On Tue, Aug 18, 2020 at 07:46:22PM +0200, Christophe Leroy wrote:
>>> I gave it a go on my powerpc mpc832x. I tested it on top of my newest
>>> series that reworks the 32-bit signal handlers (see
>>> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=196278),
>>> using the microbenchmark test from that series.
>>>
>>> With KUAP activated, on top of the signal32 rework, performance is
>>> boosted: system time for the microbenchmark goes from 1.73s down to
>>> 1.56s, that is 10% quicker.
>>>
>>> Surprisingly, with the kernel as it is today, without my signal
>>> series, your series degrades performance slightly (from 2.55s to
>>> 2.64s, i.e. 3.5% slower).
>>>
>>>
>>> I also observe, in both cases, a degradation on
>>>
>>> dd if=/dev/zero of=/dev/null count=1M
>>>
>>> Without your series, it runs in 5.29 seconds.
>>> With your series, it runs in 5.82 seconds, that is 10% more time.
>>
>> That's pretty strange, I wonder if some kernel text cache line
>> effects come into play here?
>>
>> The kernel access side is only used in slow path code, so it should
>> not make a difference, and the uaccess code is simplified and should be
>> (marginally) faster.
>>
>> Btw, was this with the __{get,put}_user_allowed cockup that you noticed
>> fixed?
>>
>
> Yes, it is with __get_user_size() replaced by __get_user_size_allowed().
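
For reference, the point of the *_allowed variants is that user access
(KUAP) is opened once around a whole block of accesses instead of around
every single access. A minimal sketch of the idea, written with the
generic user_read_access_begin()/unsafe_get_user() helpers rather than
the actual code from the series, so the function and its error handling
are illustrative only:

	/* Sketch only: open user access (KUAP) once for the whole block,
	 * then use the unsafe/"allowed" accessors, which do not open and
	 * close KUAP themselves. */
	static int read_two_words(u32 __user *p, u32 *a, u32 *b)
	{
		if (!user_read_access_begin(p, 2 * sizeof(u32)))
			return -EFAULT;

		unsafe_get_user(*a, &p[0], Efault);	/* no per-access KUAP open/close */
		unsafe_get_user(*b, &p[1], Efault);

		user_read_access_end();
		return 0;

	Efault:
		user_read_access_end();
		return -EFAULT;
	}

A plain get_user() would open and close KUAP around each individual
access instead.
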
I made a test with only the first patch of your series: that's
definitely the culprit. With only that patch applied, the duration is
6.64 seconds, that is a 25% degradation.
A perf record provides the following without the patch:
    41.91%  dd  [kernel.kallsyms]  [k] __arch_clear_user
     7.02%  dd  [kernel.kallsyms]  [k] vfs_read
     6.86%  dd  [kernel.kallsyms]  [k] new_sync_read
     6.68%  dd  [kernel.kallsyms]  [k] iov_iter_zero
     6.03%  dd  [kernel.kallsyms]  [k] transfer_to_syscall
     3.39%  dd  [kernel.kallsyms]  [k] memset
     3.07%  dd  [kernel.kallsyms]  [k] __fsnotify_parent
     2.68%  dd  [kernel.kallsyms]  [k] ksys_read
     2.09%  dd  [kernel.kallsyms]  [k] read_iter_zero
     2.01%  dd  [kernel.kallsyms]  [k] __fget_light
     1.84%  dd  [kernel.kallsyms]  [k] __fdget_pos
     1.35%  dd  [kernel.kallsyms]  [k] rw_verify_area
     1.32%  dd  libc-2.23.so       [.] __GI___libc_write
     1.21%  dd  [kernel.kallsyms]  [k] vfs_write
    ...
     0.03%  dd  [kernel.kallsyms]  [k] write_null
And the following with the patch:
    15.54%  dd  [kernel.kallsyms]  [k] __arch_clear_user
     9.17%  dd  [kernel.kallsyms]  [k] vfs_read
     6.54%  dd  [kernel.kallsyms]  [k] new_sync_write
     6.31%  dd  [kernel.kallsyms]  [k] transfer_to_syscall
     6.29%  dd  [kernel.kallsyms]  [k] __fsnotify_parent
     6.20%  dd  [kernel.kallsyms]  [k] new_sync_read
     5.47%  dd  [kernel.kallsyms]  [k] memset
     5.13%  dd  [kernel.kallsyms]  [k] vfs_write
     4.44%  dd  [kernel.kallsyms]  [k] iov_iter_zero
     2.95%  dd  [kernel.kallsyms]  [k] write_iter_null
     2.82%  dd  [kernel.kallsyms]  [k] ksys_read
     2.46%  dd  [kernel.kallsyms]  [k] __fget_light
     2.34%  dd  libc-2.23.so       [.] __GI___libc_read
     1.89%  dd  [kernel.kallsyms]  [k] iov_iter_advance
     1.76%  dd  [kernel.kallsyms]  [k] __fdget_pos
     1.65%  dd  [kernel.kallsyms]  [k] rw_verify_area
     1.63%  dd  [kernel.kallsyms]  [k] read_iter_zero
     1.60%  dd  [kernel.kallsyms]  [k] iov_iter_init
     1.22%  dd  [kernel.kallsyms]  [k] ksys_write
     1.14%  dd  libc-2.23.so       [.] __GI___libc_write
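
As a side note on why __arch_clear_user dominates both profiles: a
read() from /dev/zero goes through read_iter_zero() and iov_iter_zero(),
which for a plain iovec essentially zero-fills the user buffer with
clear_user(). A very simplified sketch, not the actual iov_iter code:

	/* Very simplified sketch, not the real iov_iter_zero(): zero-fill a
	 * user buffer.  clear_user() returns the number of bytes that could
	 * NOT be cleared, so the amount actually written is len - left. */
	static size_t zero_fill_user(void __user *buf, size_t len)
	{
		size_t left = clear_user(buf, len);

		return len - left;
	}

So the dd test mostly exercises clear_user() plus the surrounding
syscall and iov_iter path, which matches the profiles above.
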
Christophe
>
> Christophe