Message-ID: <a8bb0319-0928-4687-9e9c-777c5860dbdd@csgroup.eu>
Date: Tue, 1 Sep 2020 19:13:00 +0200
From: Christophe Leroy <christophe.leroy@...roup.eu>
To: Christoph Hellwig <hch@....de>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Al Viro <viro@...iv.linux.org.uk>,
Michael Ellerman <mpe@...erman.id.au>, x86@...nel.org
Cc: linux-fsdevel@...r.kernel.org, linux-arch@...r.kernel.org,
linuxppc-dev@...ts.ozlabs.org, Kees Cook <keescook@...omium.org>,
linux-kernel@...r.kernel.org
Subject: Re: remove the last set_fs() in common code, and remove it for x86
and powerpc v2
Hi Christoph,
On 27/08/2020 at 17:00, Christoph Hellwig wrote:
> Hi all,
>
> this series removes the last set_fs() used to force a kernel address
> space for the uaccess code in the kernel read/write/splice code, and then
> stops implementing the address space overrides entirely for x86 and
> powerpc.
>
> The file system part has been posted a few times, and the read/write side
> has been pretty much unchanged. For splice this series drops the
> conversion of the seq_file and sysctl code to the iter ops, and thus loses
> the splice support for them. The reason for that is that it caused a lot
> of churn for not much use - splice for these small files really isn't much
> of a win, even if existing userspace uses it. All callers I found do the
> proper fallback, but if this turns out to be an issue the conversion can
> be resurrected.
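A minimal userspace sketch of the "proper fallback" a caller can do: try splice(2) first and, if the kernel rejects it with EINVAL (assumed here to be how a file without splice support, or a pair of fds with no pipe on either end, is reported), fall back to a plain read(2)/write(2) loop. copy_with_fallback() is a hypothetical helper, not code from any of the callers mentioned above.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Copy up to len bytes from in_fd to out_fd (hypothetical helper).
 * splice(2) is tried first; on EINVAL we assume splice is simply not
 * supported for these fds and bounce the data through a buffer instead.
 * Returns the number of bytes copied, or -1 on error. */
static ssize_t copy_with_fallback(int in_fd, int out_fd, size_t len)
{
	ssize_t n = splice(in_fd, NULL, out_fd, NULL, len, 0);

	if (n >= 0 || errno != EINVAL)
		return n;

	/* Fallback path: read into a bounce buffer, then write it all out. */
	char buf[4096];
	size_t total = 0;

	while (total < len) {
		size_t want = len - total < sizeof(buf) ? len - total : sizeof(buf);
		ssize_t r = read(in_fd, buf, want);

		if (r < 0)
			return -1;
		if (r == 0)
			break;	/* EOF */
		for (ssize_t off = 0; off < r; ) {
			ssize_t w = write(out_fd, buf + off, r - off);

			if (w < 0)
				return -1;
			off += w;
		}
		total += r;
	}
	return (ssize_t)total;
}
```

Note that a single splice() call may move fewer than len bytes, so a real caller would loop on the splice path as well.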
>
> Besides x86 and powerpc I plan to eventually convert all other
> architectures, although this will be a slow process, starting with the
> easier ones once the infrastructure is merged. The process to convert
> architectures is roughly:
>
> (1) ensure there is no set_fs(KERNEL_DS) left in arch specific code
> (2) implement __get_kernel_nofault and __put_kernel_nofault
> (3) remove the arch specific address limitation functionality
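Step (2) can be illustrated with a userspace model. The shape used here, a macro taking (dst, src, type, err_label) that copies one value of `type` and jumps to err_label on a fault, driven by a loop that copies u64, then u32, u16 and u8 chunks, follows the generic copy_from_kernel_nofault() in mm/maccess.c as understood here. Everything prefixed model_ or sim_ is hypothetical: "kernel memory" is a byte array addressed by offset, and the fault is simulated with a bounds check, whereas a real arch implementation relies on an exception-table fixup.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical simulated kernel memory: only the first 32 bytes are "mapped". */
static unsigned char sim_kmem[64];
static const size_t sim_mapped_len = 32;

/* Model of an arch __get_kernel_nofault(): copy one 'type' from kernel
 * address 'src' to 'dst', or goto err_label if the access would fault.
 * The up-front check stands in for a real exception-table fixup. */
#define model_get_kernel_nofault(dst, src, type, err_label)		\
	do {								\
		if ((src) + sizeof(type) > sim_mapped_len)		\
			goto err_label;					\
		memcpy((dst), &sim_kmem[(src)], sizeof(type));		\
	} while (0)

/* Copy as many 'type'-sized chunks as fit into the remaining length. */
#define model_copy_loop(dst, src, len, type, err_label)		\
	while ((len) >= sizeof(type)) {					\
		model_get_kernel_nofault((dst), (src), type, err_label);\
		(dst) += sizeof(type);					\
		(src) += sizeof(type);					\
		(len) -= sizeof(type);					\
	}

/* Returns 0 on success or -EFAULT; on a fault, 'to' may be partially filled. */
static long model_copy_from_kernel_nofault(void *to, size_t from, size_t size)
{
	unsigned char *dst = to;

	model_copy_loop(dst, from, size, uint64_t, Efault);
	model_copy_loop(dst, from, size, uint32_t, Efault);
	model_copy_loop(dst, from, size, uint16_t, Efault);
	model_copy_loop(dst, from, size, uint8_t, Efault);
	return 0;
Efault:
	return -EFAULT;
}
```

The point of the split is that generic code only needs the per-type accessor; whether the arch needs to open user access windows (or, per the v2 changelog, explicitly must not) stays hidden inside it.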
>
> Changes since v1:
> - drop the patch to remove the non-iter ops for /dev/zero and
> /dev/null as they caused a performance regression
> - don't enable user access in __get_kernel on powerpc
> - xfail the set_fs() based lkdtm tests
>
> Diffstat:
>
I'm still sceptical about the results I get.
With 5.9-rc2:
root@...ippro:~# time dd if=/dev/zero of=/dev/null count=1M
1048576+0 records in
1048576+0 records out
536870912 bytes (512.0MB) copied, 5.585880 seconds, 91.7MB/s
real 0m 5.59s
user 0m 1.40s
sys 0m 4.19s
With your series:
root@...ippro:/tmp# time dd if=/dev/zero of=/dev/null count=1M
1048576+0 records in
1048576+0 records out
536870912 bytes (512.0MB) copied, 7.780540 seconds, 65.8MB/s
real 0m 7.79s
user 0m 2.12s
sys 0m 5.66s
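The two dd runs work out as follows. The figures are copied from the output above; 536870912 bytes over count=1M records means 512 bytes per record, so each record costs one read(2) from /dev/zero plus one write(2) to /dev/null, and dd's "MB" here is 2^20 bytes (512.0MB == 536870912 bytes). The helper names are illustrative only.

```c
/* Figures copied from the dd output above. */
#define DD_BYTES   536870912.0	/* "512.0MB" as printed by dd */
#define DD_RECORDS 1048576.0	/* count=1M, i.e. 512 bytes per record */

/* Throughput in MiB/s (dd's "MB" is 2^20 bytes in this output). */
static double mib_per_s(double secs)
{
	return DD_BYTES / secs / (1024.0 * 1024.0);
}

/* Microseconds per record, i.e. per read(2) + write(2) pair. */
static double usec_per_record(double secs)
{
	return secs / DD_RECORDS * 1e6;
}

/* Throughput loss in percent when run time grows from old_secs to new_secs
 * for the same amount of data (throughput is proportional to 1/secs). */
static double percent_drop(double old_secs, double new_secs)
{
	return 100.0 * (1.0 - old_secs / new_secs);
}
```

With 5.585880 s and 7.780540 s this gives about 91.7 and 65.8 MiB/s, roughly 5.33 us versus 7.42 us per record (about 2.1 us extra per read/write pair), and a throughput loss of about 28%.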
Top of the perf report from a standard perf record:
With 5.9-rc2:
20.31% dd [kernel.kallsyms] [k] __arch_clear_user
8.37% dd [kernel.kallsyms] [k] transfer_to_syscall
7.37% dd [kernel.kallsyms] [k] __fsnotify_parent
6.95% dd [kernel.kallsyms] [k] iov_iter_zero
5.72% dd [kernel.kallsyms] [k] new_sync_read
4.87% dd [kernel.kallsyms] [k] vfs_write
4.47% dd [kernel.kallsyms] [k] vfs_read
3.07% dd [kernel.kallsyms] [k] ksys_write
2.77% dd [kernel.kallsyms] [k] ksys_read
2.65% dd [kernel.kallsyms] [k] __fget_light
2.37% dd [kernel.kallsyms] [k] __fdget_pos
2.35% dd [kernel.kallsyms] [k] memset
1.53% dd [kernel.kallsyms] [k] rw_verify_area
1.52% dd [kernel.kallsyms] [k] read_iter_zero
With your series:
19.60% dd [kernel.kallsyms] [k] __arch_clear_user
10.92% dd [kernel.kallsyms] [k] iov_iter_zero
9.50% dd [kernel.kallsyms] [k] vfs_write
8.97% dd [kernel.kallsyms] [k] __fsnotify_parent
5.46% dd [kernel.kallsyms] [k] transfer_to_syscall
5.42% dd [kernel.kallsyms] [k] vfs_read
3.58% dd [kernel.kallsyms] [k] ksys_read
2.84% dd [kernel.kallsyms] [k] read_iter_zero
2.24% dd [kernel.kallsyms] [k] ksys_write
1.80% dd [kernel.kallsyms] [k] __fget_light
1.34% dd [kernel.kallsyms] [k] __fdget_pos
0.91% dd [kernel.kallsyms] [k] memset
0.91% dd [kernel.kallsyms] [k] rw_verify_area
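Percentages alone hide the fact that the second run is longer. A rough way to compare the two reports is to scale each share by the elapsed time of the matching timed run. This assumes the perf recordings covered runs of about the same length as the timed ones and that dd was CPU-bound (real 5.59 s vs user + sys = 5.59 s in the first run suggests so); the percentages are copied from the reports, restricted to the 13 symbols present in both (new_sync_read does not appear in the second list). The struct and helper names are illustrative.

```c
#include <stddef.h>

/* Elapsed seconds of the two timed dd runs. ASSUMPTION: the perf
 * recordings covered runs of similar length. */
#define SECS_BEFORE 5.59	/* 5.9-rc2 */
#define SECS_AFTER  7.79	/* with the series */

struct sym_share {
	const char *sym;
	double pct_before;	/* % of samples, 5.9-rc2 */
	double pct_after;	/* % of samples, with the series */
};

static const struct sym_share shares[] = {
	{ "__arch_clear_user",   20.31, 19.60 },
	{ "transfer_to_syscall",  8.37,  5.46 },
	{ "__fsnotify_parent",    7.37,  8.97 },
	{ "iov_iter_zero",        6.95, 10.92 },
	{ "vfs_write",            4.87,  9.50 },
	{ "vfs_read",             4.47,  5.42 },
	{ "ksys_write",           3.07,  2.24 },
	{ "ksys_read",            2.77,  3.58 },
	{ "__fget_light",         2.65,  1.80 },
	{ "__fdget_pos",          2.37,  1.34 },
	{ "memset",               2.35,  0.91 },
	{ "rw_verify_area",       1.53,  0.91 },
	{ "read_iter_zero",       1.52,  2.84 },
};

/* Convert a share of samples into approximate seconds of CPU time. */
static double pct_to_secs(double pct, double total_secs)
{
	return pct / 100.0 * total_secs;
}

/* Approximate extra seconds a symbol cost with the series applied
 * (negative means it got cheaper). */
static double extra_secs(const struct sym_share *s)
{
	return pct_to_secs(s->pct_after, SECS_AFTER) -
	       pct_to_secs(s->pct_before, SECS_BEFORE);
}
```

By this rough measure vfs_write and iov_iter_zero each account for about 0.46 s more, and __arch_clear_user for about 0.39 s more, while transfer_to_syscall gets slightly cheaper.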
Christophe