lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Fri, 18 Aug 2023 21:39:20 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'David Howells' <dhowells@...hat.com>
CC:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Al Viro <viro@...iv.linux.org.uk>,
        Jens Axboe <axboe@...nel.dk>, Christoph Hellwig <hch@...t.de>,
        Christian Brauner <christian@...uner.io>,
        Matthew Wilcox <willy@...radead.org>,
        Jeff Layton <jlayton@...nel.org>,
        "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
        "linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH v3 2/2] iov_iter: Don't deal with iter->copy_mc in
 memcpy_from_iter_mc()

From: David Howells
> Sent: Friday, August 18, 2023 5:49 PM
> 
> David Laight <David.Laight@...LAB.COM> wrote:
> 
> > > iov_iter_init                            inc 0x27 -> 0x31 +0xa
> >
> > Are you hitting the gcc bug that loads the constant from memory?
> 
> I'm not sure what that looks like.  For your perusal, here's a disassembly of
> the use-switch-on-enum variant:
> 
>    0xffffffff8177726c <+0>:     cmp    $0x1,%esi
>    0xffffffff8177726f <+3>:     jbe    0xffffffff81777273 <iov_iter_init+7>
>    0xffffffff81777271 <+5>:     ud2
>    0xffffffff81777273 <+7>:     test   %esi,%esi
>    0xffffffff81777275 <+9>:     movw   $0x1,(%rdi)
>    0xffffffff8177727a <+14>:    setne  0x3(%rdi)
>    0xffffffff8177727e <+18>:    xor    %eax,%eax
>    0xffffffff81777280 <+20>:    movb   $0x0,0x2(%rdi)
>    0xffffffff81777284 <+24>:    movb   $0x1,0x4(%rdi)
>    0xffffffff81777288 <+28>:    mov    %rax,0x8(%rdi)
>    0xffffffff8177728c <+32>:    mov    %rdx,0x10(%rdi)
>    0xffffffff81777290 <+36>:    mov    %r8,0x18(%rdi)
>    0xffffffff81777294 <+40>:    mov    %rcx,0x20(%rdi)
>    0xffffffff81777298 <+44>:    jmp    0xffffffff81d728a0 <__x86_return_thunk>
> 
> versus the use-bitmap variant:
> 
>    0xffffffff81777311 <+0>:     cmp    $0x1,%esi
>    0xffffffff81777314 <+3>:     jbe    0xffffffff81777318 <iov_iter_init+7>
>    0xffffffff81777316 <+5>:     ud2
>    0xffffffff81777318 <+7>:     test   %esi,%esi
>    0xffffffff8177731a <+9>:     movb   $0x2,(%rdi)
>    0xffffffff8177731d <+12>:    setne  0x1(%rdi)
>    0xffffffff81777321 <+16>:    xor    %eax,%eax
>    0xffffffff81777323 <+18>:    mov    %rdx,0x10(%rdi)
>    0xffffffff81777327 <+22>:    mov    %rax,0x8(%rdi)
>    0xffffffff8177732b <+26>:    mov    %r8,0x18(%rdi)
>    0xffffffff8177732f <+30>:    mov    %rcx,0x20(%rdi)
>    0xffffffff81777333 <+34>:    jmp    0xffffffff81d72960 <__x86_return_thunk>
> 
> It seems to be that the former is loading byte constants individually, whereas
> Linus combined all those fields into a single byte and eliminated one of them.

I think you need to re-order the structure.
The top set writes to bytes 0..4 with:
>    0xffffffff81777275 <+9>:     movw   $0x1,(%rdi)
>    0xffffffff8177727a <+14>:    setne  0x3(%rdi)
>    0xffffffff81777280 <+20>:    movb   $0x0,0x2(%rdi)
>    0xffffffff81777284 <+24>:    movb   $0x1,0x4(%rdi)
Note that the 'setne' writes into the middle of the constants.

The lower writes bytes 0..1 with:
>    0xffffffff8177731a <+9>:     movb   $0x2,(%rdi)
>    0xffffffff8177731d <+12>:    setne  0x1(%rdi)

I think that if you move the 'conditional' value to offset 4
you'll get fewer writes.
Probably a 32bit load into %eax and then a write.

I don't think gcc likes generating 16bit immediates.
In some tests I did it loaded a 32bit value into %eax
and then wrote the low bits.
So the code is much the same (on x86) for 2 or 4 bytes
of constants.
I'm sure you can use the 'data-16' prefix with an immediate.

I'm not sure why you have two non-zero values when Linus
only had one though.

OTOH you don't want to be writing 3 bytes of constants.
Also gcc won't generate:
	movl $0xaabbccdd,%eax
	setne %al   // overwriting the dd
	movl %eax,(%rdi)
and I suspect the partial write (to %al) will be a stall.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ