lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Fri, 6 Aug 2010 05:58:58 +0200
From:	Denys Vlasenko <vda.linux@...glemail.com>
To:	Michal Nazarewicz <mina86@...a86.com>
Cc:	linux-kernel@...r.kernel.org, m.nazarewicz@...sung.com,
	"Douglas W. Jones" <jones@...uiowa.edu>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH 1/3] lib: vsprintf: optimised put_dec_trunc() and put_dec_full()

On Friday 06 August 2010 00:38, Michal Nazarewicz wrote:
> The put_dec_trunc() and put_dec_full() functions were based on
> a code optimised for processors with 8-bit ALU but even then
> they failed to satisfy the same constraints

"Failed"? Interesting wording. Yes, the code won't map easily
onto 8-bit ALU, for the simple reason Linux kernel
does not support any 8-bit CPUs, and by going to wider register
I was able to process 5 decimal digits at once, not 4.
It was done deliberately. It is not a "failure".

Your code isn't 8-bit ALU optimized either.

Do you think a bit of smear of previous code
would help your to be accepted?

> and in fact 
> required at least 16-bit ALU (because at least one number they
> operate in can take 9 bits).

Yes, as explained above.

> This version of those functions proposed by this patch goes
> further and uses the full capacity of a 32-bit ALU and instead
> of splitting the number into nibbles and operating on them it
> performs the obvious algorithm for base conversion expect it
> uses optimised code for dividing by ten (ie. no division is
> actually performed).

(1) "expect" is a typo
(2) No, _this_ patch does not eliminate division. Next one does.
Move this part of changelong to the next patch, where it belongs.


> + * Decimal conversion is by far the most typical, and is used for
> + * /proc and /sys data. This directly impacts e.g. top performance
> + * with many processes running.
> + *
> + * We optimize it for speed using ideas described at
> + * <http://www.cs.uiowa.edu/~jones/bcd/divide.html>.

Do you have author's permission to do it?
Document it in the comment please.


> + * '(num * 0xcccd) >> 19' is an approximation of 'num / 10' that gives
> + * correct results for num < 81920.  Because of this, we check at the
> + * beginning if we are dealing with a number that may cause trouble
> + * and if so, we make it smaller.

This comment needs to be moved to the code line where the opration
is performed.


> + * (As a minor note, all operands are always 16 bit so this function
> + * should work well on hardware that cannot multiply 32 bit numbers).
> + *
> + * (Previous a code based on

English is a bit broken in the line above.


-- 
vda
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ