Message-ID: <CAHk-=wj1WfFGxHs4k6pn5y6V8BYd3aqODCjqEmrTWP8XO78giw@mail.gmail.com>
Date: Thu, 17 Aug 2023 17:31:09 +0200
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: David Laight <David.Laight@...lab.com>
Cc: David Howells <dhowells@...hat.com>,
Al Viro <viro@...iv.linux.org.uk>,
Jens Axboe <axboe@...nel.dk>,
Christian Brauner <christian@...uner.io>,
Matthew Wilcox <willy@...radead.org>,
Jeff Layton <jlayton@...nel.org>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v3 2/2] iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc()
On Thu, 17 Aug 2023 at 17:16, David Laight <David.Laight@...lab.com> wrote:
>
> gcc for x86-64 makes a pig's breakfast when the bitfields are 'char'
> and loads the constant from memory using pc-relative access.
I think your godbolt tests must be with some other model than what the
kernel uses.
For example, for me, iov_iter_init generates
        testl   %esi, %esi      # direction test
        movb    $1, (%rdi)      # bitfields
        movq    $0, 8(%rdi)
        movq    %rdx, 16(%rdi)
        movq    %r8, 24(%rdi)
        movq    %rcx, 32(%rdi)
        setne   1(%rdi)         # set the direction byte
with my patch. Which is pretty much optimal.
*Without* the patch, I get
        movzwl  .LC1(%rip), %eax
        testl   %esi, %esi
        movb    $0, (%rdi)
        movb    $1, 4(%rdi)
        movw    %ax, 1(%rdi)
        movq    $0, 8(%rdi)
        movq    %rdx, 16(%rdi)
        movq    %r8, 24(%rdi)
        movq    %rcx, 32(%rdi)
        setne   3(%rdi)
which is that disgusting "move two bytes from memory", and makes
absolutely no sense as a way to "write 2 zero bytes":
  .LC1:
        .byte   0
        .byte   0
I think that's some odd gcc bug, actually.
> Otherwise pretty much all variants (with or without the bitfield)
> get initialised in a single write.
So there may well be some odd gcc code generation issue that is
triggered by the fact that we use an initializer to set those things,
and we then have two bytes (with my patch) followed by a hole, or
three bytes (without it) followed by a hole.
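To make the layout under discussion concrete, here is a minimal sketch of the pattern: a few small flags packed into bitfields in the first byte, a separate direction byte, then a padding hole before the first pointer-sized member, all set through an initializer. The struct `demo_iter`, its field names and widths, and `demo_iter_init()` are hypothetical stand-ins, not the kernel's actual `struct iov_iter` or `iov_iter_init()`.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for illustration only; this is NOT the kernel's
 * struct iov_iter, and the field names and widths are assumptions.
 * Bitfields of type unsigned char are implementation-defined in ISO C,
 * but GCC and Clang accept them.
 */
struct demo_iter {
	unsigned char type:3;        /* small fields share byte 0 ...      */
	unsigned char nofault:1;
	unsigned char user_backed:1; /* ... so one byte store can set them */
	bool data_source;            /* the direction flag, in its own byte */
	/* compiler-inserted padding (the "hole") sits here */
	size_t offset;
	const void *base;
	size_t count;
	unsigned long nr_segs;
};

/*
 * Initialize through a compound-literal assignment; members that are
 * not named in the initializer are zeroed.
 */
static inline void demo_iter_init(struct demo_iter *i, unsigned int direction,
				  const void *base, unsigned long nr_segs,
				  size_t count)
{
	*i = (struct demo_iter) {
		.type = 0,
		.user_backed = 1,
		.data_source = direction,   /* any nonzero value becomes true */
		.offset = 0,
		.base = base,
		.nr_segs = nr_segs,
		.count = count,
	};
}
```

Compiling something like this with `gcc -O2 -S`, using the same flags and code model as the kernel build, is one way to check what stores the compiler actually emits for the initializer; the result may differ between compiler versions and options.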
But that bitfield thing improves things at least for me. I think the
example code you used for godbolt is actually something else than what
the kernel does.
Linus