lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4823774E.2040207@goop.org>
Date:	Thu, 08 May 2008 22:57:34 +0100
From:	Jeremy Fitzhardinge <jeremy@...p.org>
To:	Segher Boessenkool <segher@...nel.crashing.org>
CC:	LKML <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Ingo Molnar <mingo@...e.hu>,
	Christian Kujau <lists@...dbynature.de>,
	Robert Hancock <hancockr@...w.ca>,
	john stultz <johnstul@...ibm.com>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH] common implementation of iterative div/mod

Segher Boessenkool wrote:
>> The workaround is to put a dummy asm statement in the loop to prevent
>> gcc from performing the transformation.
>
> It's not a "dummy" asm, it actually does something: it tells the
> compiler that it has to iterate the loop exactly as written, and
> not do something else.  I.e., exactly the behaviour we want here.

No, it has no function of its own.  It's bullying gcc into not 
performing an optimisation by giving the impression its doing something.

>> +    ticks = iter_div_u64_rem(blocked, NS_PER_TICK, &blocked);
>
> What a terrible function name.

It's consistent with the other functions defined here.  I agree it isn't 
pretty.  If you have a better suggestion, I'm all ears.

>> static inline void timespec_add_ns(struct timespec *a, u64 ns)
>> {
>> -    ns += a->tv_nsec;
>> -    while(unlikely(ns >= NSEC_PER_SEC)) {
>> -        /* The following asm() prevents the compiler from
>> -         * optimising this loop into a modulo operation.  */
>> -        asm("" : "+r"(ns));
>> -
>> -        ns -= NSEC_PER_SEC;
>> -        a->tv_sec++;
>> -    }
>> +    a->tv_sec += iter_div_u64_rem(a->tv_nsec + ns, NSEC_PER_SEC, &ns);
>>     a->tv_nsec = ns;
>> }
>
> ...and now the "meat" of this function isn't inline anymore.  If we
> cared about not doing a divide here, you'll have to explain why
> taking this trivial loop out of line is a good idea.

On x86-32 it compiles to 26 instructions and 47 bytes of code.  
Admittedly it might be smaller inline - or on a 64-bit machine - but I 
seriously doubt its suffering a huge performance hit from being out of 
line.  These days the inline threshold is very small - probably under 10 
instructions.  A direct call/return is essentially free, since it can be 
trivially prefetched.

>> +unsigned iter_div_u64_rem(u64 dividend, u32 divisor, u64 *remainder)
>> +{
>> +    unsigned ret = 0;
>> +
>> +    while(dividend >= divisor) {
>
> You removed the unlikely() here.  Why?

Because it didn't seem all that unlikely.  Besides, it makes not one bit 
of difference to the code generated by my compiler.

>> +        /* The following asm() prevents the compiler from
>> +           optimising this loop into a modulo operation.  */
>> +        asm("" : "+rm"(dividend));
>
> You changed "+r" to "+rm" here.  Why?  Also, "rm" is an x86-ism,
> and this is generic code (it does work here, but why is it better
> than "r"?)

"rm" isn't x86-specific.  I just wanted to give the compiler the freedom 
to put the value in either register or memory if it wanted to.

>> +EXPORT_SYMBOL(iter_div_u64_rem);
>
> Does this need to be exported?

Everything else in the file is exported.

> Can I suggest an alternative approach?  Define a macro (with a
> good, descriptive name!) for just the asm("" : "+r"(x)), and use
> that.  Much smaller patch, much clearer code, and doesn't result
> in different (and worse) code generation, so it's a much safer
> change.

Uh, could you suggest a name?  Something along the lines of 
prevent_gcc_from_strength_reducing_this_subtraction_loop_into_a_modulo_operation_thanks_oh_and_remember_to_use_it_in_all_the_right_places() 
springs to mind.

Rather than putting this unsightly (though with a smear of lipstick) 
hack into every open-coded iterative div-mod loop, we may as well 
implement it once and just look out for places which should be using it.

I don't think the "worse" code generation is much of an issue, since it 
isn't used anywhere performance critical, and moving the code out of 
line means there should be an overall reduction in code size.

    J
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ