Date:   Fri, 18 Aug 2023 15:42:05 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'David Howells' <dhowells@...hat.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>
CC:     Al Viro <viro@...iv.linux.org.uk>, Jens Axboe <axboe@...nel.dk>,
        "Christoph Hellwig" <hch@...t.de>,
        Christian Brauner <christian@...uner.io>,
        "Matthew Wilcox" <willy@...radead.org>,
        Jeff Layton <jlayton@...nel.org>,
        "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
        "linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH v3 2/2] iov_iter: Don't deal with iter->copy_mc in
 memcpy_from_iter_mc()

From: David Howells
> Sent: Friday, August 18, 2023 4:20 PM
> 
> Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> 
> > > Although I'm not sure the bit-fields really help.
> > > There are 8 bytes at the start of the structure, might as well
> > > use them :-)
> >
> > Actually, I wrote the patch that way because it seems to improve code
> > generation.
> >
> > The bitfields are generally all set together as just plain one-time
> > constants at initialization time, and gcc sees that it's a full byte
> > write. And the reason 'data_source' is not a bitfield is that it's not
> > a constant at iov_iter init time (it's an argument to all the init
> > functions), so having that one as a separate byte at init time is good
> > for code generation when you don't need to mask bits or anything like
> > that.
> >
> > And once initialized, having things be dense and doing all the
> > compares with a bitwise 'and' instead of doing them as some value
> > compare again tends to generate good code.
> 
> Actually...  I said that switch(enum) seemed to generate suboptimal code...
> However, if the enum is renumbered such that the constants are in the same
> order as in the switch() it generates better code.
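Linus's point about bit-fields can be illustrated with a small sketch. The struct and function names below (`struct demo_iter`, `demo_iter_init_user()`, `demo_iter_is_user()`) and the flag names are hypothetical. They are not the real `struct iov_iter` and not the patch under discussion.

The sketch follows the layout Linus describes:

- The flag bit-fields share storage and are all assigned compile-time constants at init.
- `data_source` gets its own byte because it arrives as a runtime argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical layout, for illustration only. The three flags are
 * bit-fields packed together; data_source is a separate byte because
 * callers pass it in at runtime rather than as a constant.
 */
struct demo_iter {
	unsigned int user_backed : 1;
	unsigned int copy_mc     : 1;
	unsigned int nofault     : 1;
	bool data_source;	/* runtime value, kept unpacked */
	size_t count;
};

/*
 * Every bit-field is set from a constant here, so the compiler is free
 * to combine them into a single store; data_source is a plain byte store.
 */
static inline void demo_iter_init_user(struct demo_iter *i, bool data_source,
				       size_t count)
{
	*i = (struct demo_iter){
		.user_backed = 1,
		.copy_mc     = 0,
		.nofault     = 0,
		.data_source = data_source,
		.count       = count,
	};
}

/* After init, testing a flag is a load plus a bitwise AND. */
static inline bool demo_iter_is_user(const struct demo_iter *i)
{
	return i->user_backed;
}
```

Whether gcc actually merges the flag stores into one byte write is easiest to confirm by compiling with `gcc -O2 -S` and reading the generated assembly.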

Hmmm.. the order of the switch labels really shouldn't matter.

The advantage of the if-chain is that you can optimise for
the most common case.

> So we want this order:
> 
> 	enum iter_type {
> 		ITER_UBUF,
> 		ITER_IOVEC,
> 		ITER_BVEC,
> 		ITER_KVEC,
> 		ITER_XARRAY,
> 		ITER_DISCARD,
> 	};
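For reference, a switch()-based dispatch over the proposed enumerator order might look like the sketch below. The case labels are listed in the same order as the enumerator values, which is the arrangement Howells reports as giving better gcc output.

The `iterate_*()` functions here are hypothetical stubs that only report which path was taken. The kernel's real per-type iteration code has different signatures.

```c
#include <assert.h>
#include <string.h>

/* Enumerator order as proposed in the message. */
enum iter_type {
	ITER_UBUF,
	ITER_IOVEC,
	ITER_BVEC,
	ITER_KVEC,
	ITER_XARRAY,
	ITER_DISCARD,
};

/* Hypothetical stubs standing in for the per-type iteration code. */
static const char *iterate_ubuf(void)   { return "ubuf"; }
static const char *iterate_iovec(void)  { return "iovec"; }
static const char *iterate_bvec(void)   { return "bvec"; }
static const char *iterate_kvec(void)   { return "kvec"; }
static const char *iterate_xarray(void) { return "xarray"; }

/* Case labels appear in the same order as the enumerator values. */
static const char *dispatch_switch(enum iter_type type)
{
	switch (type) {
	case ITER_UBUF:
		return iterate_ubuf();
	case ITER_IOVEC:
		return iterate_iovec();
	case ITER_BVEC:
		return iterate_bvec();
	case ITER_KVEC:
		return iterate_kvec();
	case ITER_XARRAY:
		return iterate_xarray();
	case ITER_DISCARD:
		break;
	}
	/* ITER_DISCARD (or any unexpected value) ends up here. */
	return "discard";
}
```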

Will gcc actually code this version without pessimising it?

	if (likely(type <= ITER_IOVEC)) {
		if (likely(type != ITER_IOVEC))
			iterate_ubuf();
		else
			iterate_iovec();
	} else if (likely(type <= ITER_KVEC)) {
		if (type == ITER_KVEC)
			iterate_kvec();
		else
			iterate_bvec();
	} else if (type == ITER_XARRAY) {
		iterate_xarray();
	} else {
		/* ITER_DISCARD */
	}
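A compilable version of the if-chain above, using the proposed enumerator order, is sketched below.

- `likely()` is defined the same way the kernel defines it, via `__builtin_expect()` (GCC/Clang only).
- The `iterate_*()` functions are hypothetical stubs that return a label.
- The range tests (`<= ITER_IOVEC`, `<= ITER_KVEC`) are only valid because UBUF/IOVEC and BVEC/KVEC occupy adjacent values.

```c
#include <assert.h>
#include <string.h>

/* Same definition as the kernel's likely(); needs GCC or Clang. */
#define likely(x)	__builtin_expect(!!(x), 1)

/* Enumerator order as proposed in the message. */
enum iter_type {
	ITER_UBUF, ITER_IOVEC, ITER_BVEC, ITER_KVEC, ITER_XARRAY, ITER_DISCARD,
};

/* Hypothetical stubs standing in for the per-type iteration code. */
static const char *iterate_ubuf(void)   { return "ubuf"; }
static const char *iterate_iovec(void)  { return "iovec"; }
static const char *iterate_bvec(void)   { return "bvec"; }
static const char *iterate_kvec(void)   { return "kvec"; }
static const char *iterate_xarray(void) { return "xarray"; }

/*
 * If-chain ordered so the assumed-common user-backed cases are tested
 * first. UBUF/IOVEC are the two smallest values, BVEC/KVEC the next two,
 * so each pair can be selected with a single range compare.
 */
static const char *dispatch_ifchain(enum iter_type type)
{
	if (likely(type <= ITER_IOVEC)) {
		if (likely(type != ITER_IOVEC))
			return iterate_ubuf();
		return iterate_iovec();
	} else if (likely(type <= ITER_KVEC)) {
		if (type == ITER_KVEC)
			return iterate_kvec();
		return iterate_bvec();
	} else if (type == ITER_XARRAY) {
		return iterate_xarray();
	}
	return "discard";	/* ITER_DISCARD */
}
```

Whether gcc keeps this as a few range checks or replicates the compares is exactly what the message questions. Compiling with `gcc -O2 -S` and reading the output is the simplest way to find out.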

But I bet you can't stop it replicating the compares.
(especially with the likely()).

That has two mis-predicted (are they ever right!) branches in the
common user-copy versions and three in the common kernel ones.

In some architectures you might get the default 'fall through'
to the UBUF code if the branches aren't predictable.
But I believe current x86 CPUs never do static prediction.
So you always lose :-)

...
> 	static inline bool user_backed_iter(const struct iov_iter *i)
> 	{
> 		return iter_is_ubuf(i) || iter_is_iovec(i);
> 	}
> 
> which gcc just changes into something like a "CMP $1" and a "JA".
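The reduction Howells describes relies on ITER_UBUF being 0 and ITER_IOVEC being 1. Under that numbering, "ubuf or iovec" is equivalent to `type <= 1`, which matches the single compare he reports gcc producing.

The sketch below checks that equivalence exhaustively for every byte value. Some names are illustrative only:

- `struct demo_iov_iter` is a hypothetical stand-in that holds only the type byte.
- The helpers reuse the kernel's helper names but operate on that stand-in.
- `user_backed_iter_range()` is an illustrative name, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>

/* Enumerator order as proposed in the message: UBUF == 0, IOVEC == 1. */
enum iter_type {
	ITER_UBUF, ITER_IOVEC, ITER_BVEC, ITER_KVEC, ITER_XARRAY, ITER_DISCARD,
};

/* Hypothetical stand-in for struct iov_iter: only the type byte matters. */
struct demo_iov_iter {
	unsigned char iter_type;
};

static inline bool iter_is_ubuf(const struct demo_iov_iter *i)
{
	return i->iter_type == ITER_UBUF;
}

static inline bool iter_is_iovec(const struct demo_iov_iter *i)
{
	return i->iter_type == ITER_IOVEC;
}

/* As written in the message: two equality tests. */
static inline bool user_backed_iter(const struct demo_iov_iter *i)
{
	return iter_is_ubuf(i) || iter_is_iovec(i);
}

/*
 * Equivalent single range test, valid only because UBUF and IOVEC are
 * the two smallest values and the type byte is never negative.
 */
static inline bool user_backed_iter_range(const struct demo_iov_iter *i)
{
	return i->iter_type <= ITER_IOVEC;
}
```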

That makes sense...

> Comparing Linus's bit patch (+ is better) to renumbering the switch (- is
> better):
> 
....
> iov_iter_init                            inc 0x27 -> 0x31 +0xa

Are you hitting the gcc bug that loads the constant from memory?

> I think there may be more savings to be made if I go and convert more of the
> functions to using switch().

Size isn't everything, the code needs to be optimised for the hot paths.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
