linux-kernel - Re: [PATCH] common implementation of iterative div/mod

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <4823774E.2040207@goop.org>
Date:	Thu, 08 May 2008 22:57:34 +0100
From:	Jeremy Fitzhardinge <jeremy@...p.org>
To:	Segher Boessenkool <segher@...nel.crashing.org>
CC:	LKML <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Ingo Molnar <mingo@...e.hu>,
	Christian Kujau <lists@...dbynature.de>,
	Robert Hancock <hancockr@...w.ca>,
	john stultz <johnstul@...ibm.com>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH] common implementation of iterative div/mod

Segher Boessenkool wrote:
>> The workaround is to put a dummy asm statement in the loop to prevent
>> gcc from performing the transformation.
>
> It's not a "dummy" asm, it actually does something: it tells the
> compiler that it has to iterate the loop exactly as written, and
> not do something else.  I.e., exactly the behaviour we want here.

No, it has no function of its own.  It's bullying gcc into not 
performing an optimisation by giving the impression its doing something.

>> +    ticks = iter_div_u64_rem(blocked, NS_PER_TICK, &blocked);
>
> What a terrible function name.

It's consistent with the other functions defined here.  I agree it isn't 
pretty.  If you have a better suggestion, I'm all ears.

>> static inline void timespec_add_ns(struct timespec *a, u64 ns)
>> {
>> -    ns += a->tv_nsec;
>> -    while(unlikely(ns >= NSEC_PER_SEC)) {
>> -        /* The following asm() prevents the compiler from
>> -         * optimising this loop into a modulo operation.  */
>> -        asm("" : "+r"(ns));
>> -
>> -        ns -= NSEC_PER_SEC;
>> -        a->tv_sec++;
>> -    }
>> +    a->tv_sec += iter_div_u64_rem(a->tv_nsec + ns, NSEC_PER_SEC, &ns);
>>     a->tv_nsec = ns;
>> }
>
> ...and now the "meat" of this function isn't inline anymore.  If we
> cared about not doing a divide here, you'll have to explain why
> taking this trivial loop out of line is a good idea.

On x86-32 it compiles to 26 instructions and 47 bytes of code.  
Admittedly it might be smaller inline - or on a 64-bit machine - but I 
seriously doubt its suffering a huge performance hit from being out of 
line.  These days the inline threshold is very small - probably under 10 
instructions.  A direct call/return is essentially free, since it can be 
trivially prefetched.

>> +unsigned iter_div_u64_rem(u64 dividend, u32 divisor, u64 *remainder)
>> +{
>> +    unsigned ret = 0;
>> +
>> +    while(dividend >= divisor) {
>
> You removed the unlikely() here.  Why?

Because it didn't seem all that unlikely.  Besides, it makes not one bit 
of difference to the code generated by my compiler.

>> +        /* The following asm() prevents the compiler from
>> +           optimising this loop into a modulo operation.  */
>> +        asm("" : "+rm"(dividend));
>
> You changed "+r" to "+rm" here.  Why?  Also, "rm" is an x86-ism,
> and this is generic code (it does work here, but why is it better
> than "r"?)

"rm" isn't x86-specific.  I just wanted to give the compiler the freedom 
to put the value in either register or memory if it wanted to.

>> +EXPORT_SYMBOL(iter_div_u64_rem);
>
> Does this need to be exported?

Everything else in the file is exported.

> Can I suggest an alternative approach?  Define a macro (with a
> good, descriptive name!) for just the asm("" : "+r"(x)), and use
> that.  Much smaller patch, much clearer code, and doesn't result
> in different (and worse) code generation, so it's a much safer
> change.

Uh, could you suggest a name?  Something along the lines of 
prevent_gcc_from_strength_reducing_this_subtraction_loop_into_a_modulo_operation_thanks_oh_and_remember_to_use_it_in_all_the_right_places() 
springs to mind.

Rather than putting this unsightly (though with a smear of lipstick) 
hack into every open-coded iterative div-mod loop, we may as well 
implement it once and just look out for places which should be using it.

I don't think the "worse" code generation is much of an issue, since it 
isn't used anywhere performance critical, and moving the code out of 
line means there should be an overall reduction in code size.

    J
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/