linux-kernel - Re: [PATCH 03/15] print_integer: new and improved way of printing integers

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20200421163142.GA8735@avx2>
Date:   Tue, 21 Apr 2020 19:31:42 +0300
From:   Alexey Dobriyan <adobriyan@...il.com>
To:     Andy Shevchenko <andriy.shevchenko@...ux.intel.com>
Cc:     akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, pmladek@...e.com,
        rostedt@...dmis.org, sergey.senozhatsky@...il.com,
        linux@...musvillemoes.dk
Subject: Re: [PATCH 03/15] print_integer: new and improved way of printing
 integers

On Tue, Apr 21, 2020 at 12:19:11AM +0300, Andy Shevchenko wrote:
> On Mon, Apr 20, 2020 at 11:57:31PM +0300, Alexey Dobriyan wrote:
> > Time honored way to print integers via vsnprintf() or equivalent has
> > unavoidable slowdown of parsing format string. This can't be fixed in C,
> > without introducing external preprocessor.
> > 
> > seq_put_decimal_ull() partially saves the day, but there are a lot of
> > branches inside and overcopying still.
> > 
> > _print_integer_*() family of functions is meant to make printing
> > integers as fast as possible by deleting format string parsing and doing
> > as little work as possible.
> > 
> > It is based on the following observations:
> > 
> > 1) memcpy is done in forward direction
> > 	it can be done backwards but nobody does that,
> > 
> > 2) digits can be extracted in a very simple loop which costs only
> > 	1 multiplication and shift (division by constant is not division)
> > 
> > All the above asks for the following signature, semantics and pattern of
> > printing out beloved /proc files:
> > 
> > 	/* seq_printf(seq, "%u %llu\n", A, b); */
> > 
> > 	char buf[10 + 1 + 20 + 1];
> > 	char *p = buf + sizeof(buf);
> > 
> > 	*--p = '\n';
> > 	p = _print_integer_u64(p, B);
> > 	*--p = ' ';
> > 	p = _print_integer_u32(p, A);
> > 
> > 	seq_write(seq, p, buf + sizeof(buf) - p);
> > 
> > 1) stack buffer capable of holding the biggest string is allocated.
> > 
> > 2) "p" is pointer to start of the string. Initially it points past
> > 	the end of the buffer WHICH IS NOT NUL-TERMINATED!
> > 
> > 3) _print_integer_*() actually prints an integer from right to left
> > 	and returns new start of the string.
> > 
> > 			     <--------|
> > 				123
> > 				^
> > 				|
> > 				+-- p
> > 
> > 4) 1 character is printed with
> > 
> > 	*--p = 'x';
> > 
> > 	It generates very efficient code as multiple writes can be
> > 	merged.
> > 
> > 5) fixed string is printed with
> > 
> > 	p = memcpy(p - 3, "foo", 3);
> > 
> > 	Complers know what memcpy() does and write-combine it.
> > 	4/8-byte writes become 1 instruction and are very efficient.
> > 
> > 6) Once everything is printed, the result is written to seq_file buffer.
> > 	It does only one overflow check and 1 copy.
> > 
> > This generates very efficient code (and small!).
> > 
> > In regular seq_printf() calls, first argument and format string are
> > constantly reloaded. Format string will most likely with [rip+...] which
> > is quite verbose.
> > 
> > seq_put_decimal_ull() will do branches (and even more branches
> > with "width" argument)
> > 
> 
> > 	TODO
> > 	benchmark with mainline because nouveau is broken for me -(
> > 	vsnprintf() changes make the code slower
> 
> Exactly main point of this exercise. I don't believe that algos in vsprintf.c
> are too dumb to use division per digit (yes, division by constant which is not
> power of two is a heavy operation).

It is not about division.

It is about fucktons of branches in vsprintf().

> > +noinline
> > +char *_print_integer_u32(char *p, u32 x)
> > +{
> > +	do {
> > +		*--p = '0' + (x % 10);
> > +	} while (x /= 10);
> > +	return p;
> > +}
> 
> > +noinline
> > +char *_print_integer_u64(char *p, u64 x)
> > +{
> > +	while (x >= 100 * 1000 * 1000) {
> > +		u32 r;
> > +
> > +		x = div_u64_rem(x, 100 * 1000 * 1000, &r);
> > +		p = memset(p - 8, '0', 8);
> > +		(void)_print_integer_u32(p + 8, r);
> > +	}
> > +	return _print_integer_u32(p, x);
> > +}
> 
> -- 
> With Best Regards,
> Andy Shevchenko
> 
>