lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a8bb0319-0928-4687-9e9c-777c5860dbdd@csgroup.eu>
Date:   Tue, 1 Sep 2020 19:13:00 +0200
From:   Christophe Leroy <christophe.leroy@...roup.eu>
To:     Christoph Hellwig <hch@....de>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Al Viro <viro@...iv.linux.org.uk>,
        Michael Ellerman <mpe@...erman.id.au>, x86@...nel.org
Cc:     linux-fsdevel@...r.kernel.org, linux-arch@...r.kernel.org,
        linuxppc-dev@...ts.ozlabs.org, Kees Cook <keescook@...omium.org>,
        linux-kernel@...r.kernel.org
Subject: Re: remove the last set_fs() in common code, and remove it for x86
 and powerpc v2

Hi Christoph,

Le 27/08/2020 à 17:00, Christoph Hellwig a écrit :
> Hi all,
> 
> this series removes the last set_fs() used to force a kernel address
> space for the uaccess code in the kernel read/write/splice code, and then
> stops implementing the address space overrides entirely for x86 and
> powerpc.
> 
> The file system part has been posted a few times, and the read/write side
> has been pretty much unchanced.  For splice this series drops the
> conversion of the seq_file and sysctl code to the iter ops, and thus loses
> the splice support for them.  The reasons for that is that it caused a lot
> of churn for not much use - splice for these small files really isn't much
> of a win, even if existing userspace uses it.  All callers I found do the
> proper fallback, but if this turns out to be an issue the conversion can
> be resurrected.
> 
> Besides x86 and powerpc I plan to eventually convert all other
> architectures, although this will be a slow process, starting with the
> easier ones once the infrastructure is merged.  The process to convert
> architectures is roughtly:
> 
>   (1) ensure there is no set_fs(KERNEL_DS) left in arch specific code
>   (2) implement __get_kernel_nofault and __put_kernel_nofault
>   (3) remove the arch specific address limitation functionality
> 
> Changes since v1:
>   - drop the patch to remove the non-iter ops for /dev/zero and
>     /dev/null as they caused a performance regression
>   - don't enable user access in __get_kernel on powerpc
>   - xfail the set_fs() based lkdtm tests
> 
> Diffstat:
> 


I'm still sceptic with the results I get.

With 5.9-rc2:

root@...ippro:~# time dd if=/dev/zero of=/dev/null count=1M
1048576+0 records in
1048576+0 records out
536870912 bytes (512.0MB) copied, 5.585880 seconds, 91.7MB/s
real    0m 5.59s
user    0m 1.40s
sys     0m 4.19s


With your series:

root@...ippro:/tmp# time dd if=/dev/zero of=/dev/null count=1M
1048576+0 records in
1048576+0 records out
536870912 bytes (512.0MB) copied, 7.780540 seconds, 65.8MB/s
real    0m 7.79s
user    0m 2.12s
sys     0m 5.66s




Top of perf report of a standard perf record:

With 5.9-rc2:

     20.31%  dd       [kernel.kallsyms]  [k] __arch_clear_user
      8.37%  dd       [kernel.kallsyms]  [k] transfer_to_syscall
      7.37%  dd       [kernel.kallsyms]  [k] __fsnotify_parent
      6.95%  dd       [kernel.kallsyms]  [k] iov_iter_zero
      5.72%  dd       [kernel.kallsyms]  [k] new_sync_read
      4.87%  dd       [kernel.kallsyms]  [k] vfs_write
      4.47%  dd       [kernel.kallsyms]  [k] vfs_read
      3.07%  dd       [kernel.kallsyms]  [k] ksys_write
      2.77%  dd       [kernel.kallsyms]  [k] ksys_read
      2.65%  dd       [kernel.kallsyms]  [k] __fget_light
      2.37%  dd       [kernel.kallsyms]  [k] __fdget_pos
      2.35%  dd       [kernel.kallsyms]  [k] memset
      1.53%  dd       [kernel.kallsyms]  [k] rw_verify_area
      1.52%  dd       [kernel.kallsyms]  [k] read_iter_zero

With your series:
     19.60%  dd       [kernel.kallsyms]  [k] __arch_clear_user
     10.92%  dd       [kernel.kallsyms]  [k] iov_iter_zero
      9.50%  dd       [kernel.kallsyms]  [k] vfs_write
      8.97%  dd       [kernel.kallsyms]  [k] __fsnotify_parent
      5.46%  dd       [kernel.kallsyms]  [k] transfer_to_syscall
      5.42%  dd       [kernel.kallsyms]  [k] vfs_read
      3.58%  dd       [kernel.kallsyms]  [k] ksys_read
      2.84%  dd       [kernel.kallsyms]  [k] read_iter_zero
      2.24%  dd       [kernel.kallsyms]  [k] ksys_write
      1.80%  dd       [kernel.kallsyms]  [k] __fget_light
      1.34%  dd       [kernel.kallsyms]  [k] __fdget_pos
      0.91%  dd       [kernel.kallsyms]  [k] memset
      0.91%  dd       [kernel.kallsyms]  [k] rw_verify_area

Christophe

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ