Message-ID: <1415979978.15154.41.camel@localhost>
Date: Fri, 14 Nov 2014 16:46:18 +0100
From: Hannes Frederic Sowa <hannes@...essinduktion.org>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: netdev@...r.kernel.org, ogerlitz@...lanox.com, pshelar@...ira.com,
jesse@...ira.com, jay.vosburgh@...onical.com,
discuss@...nvswitch.org
Subject: Re: [PATCH net-next] fast_hash: clobber registers correctly for
inline function use

On Fri, 2014-11-14 at 07:33 -0800, Eric Dumazet wrote:
> On Fri, 2014-11-14 at 16:13 +0100, Hannes Frederic Sowa wrote:
> > >
> > >
> > > That's a lot of clobbers.
> >
> > Yes, those are basically all callee-clobbered registers for the
> > particular architecture. I didn't look at the generated code for jhash
> > and crc_hash because I want this code to always be safe, independent of
> > the version and optimization levels of gcc.
> >
> > > Alternative would be to use an assembly trampoline to save/restore them
> > > before calling __jhash2
> >
> > This version gives the optimizers the best hints on how to allocate
> > registers. E.g. gcc could avoid using callee-clobbered registers and use
> > callee-saved ones instead. If we build a trampoline, we need to save and
> > reload all registers all the time. This version just lets gcc decide how
> > to do that.
> >
> > > __intel_crc4_2_hash2 can probably be written in assembly, it is quite
> > > simple.
> >
> > Sure, but all the pre- and postconditions must hold for both jhash and
> > intel_crc4_2_hash, and I don't want to rewrite jhash in assembler.
>
> We write optimized code for current cpus.
>
> With current generation of cpus, we have crc32 support.

__intel_crc4_2_hash(2) already makes use of the crc32 instruction. I'll
have a closer look at what gcc generates.

> If the fallback has to save/restore a few registers, we don't care, as
> the fallback has a huge cost anyway.
>
> You don't have to write jhash() in assembler, you misunderstood me.

OK, understood: we only clobber the registers needed by the crc32_hash
implementation, and only when we branch to jhash do we save all the
other ones, in a trampoline directly before jhash.

> We only have to provide a trampoline in assembler, with maybe 10
> instructions.
>
> Then gcc will know that we do not clobber registers for the optimized
> case.

Yes, makes sense.

I would still like to see the currently proposed fix applied; we can do
this on top. The inline call after this patch resembles a direct
function call, so besides the long list of clobbers, it should still be
pretty fast.

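
To illustrate the clobber-list approach discussed above, here is a
minimal userspace sketch. It is not the code from the patch: demo_hash
and demo_hash_call are hypothetical names, the hash body is a
placeholder, and the x86-64 System V calling convention with GCC or
Clang extended asm is assumed. On other targets it simply falls back to
a plain C call. The red-zone skip and stack realignment are only there
because this is userspace code.

```c
#include <stdint.h>

/* Hypothetical out-of-line hash standing in for the real implementation. */
__attribute__((noinline))
uint32_t demo_hash(const uint32_t *data, uint32_t len, uint32_t seed)
{
	uint32_t h = seed;
	uint32_t i;

	for (i = 0; i < len; i++)
		h = (h ^ data[i]) * 16777619u;	/* illustrative mixing only */
	return h;
}

/*
 * Call demo_hash from inline asm. The compiler cannot see the call, so
 * every register the callee is allowed to overwrite has to be listed
 * either as an operand or as a clobber.
 */
static inline uint32_t demo_hash_call(const uint32_t *data, uint32_t len,
				      uint32_t seed)
{
#if defined(__x86_64__) && defined(__GNUC__) && !defined(_WIN32)
	uint32_t (*fn)(const uint32_t *, uint32_t, uint32_t) = demo_hash;
	unsigned long saved_sp;
	uint32_t ret;

	__asm__ volatile(
		"mov %%rsp, %[sp]\n\t"	/* remember the original stack pointer */
		"sub $128, %%rsp\n\t"	/* step over the red zone (userspace) */
		"and $-16, %%rsp\n\t"	/* 16-byte alignment required at a call */
		"call *%[fn]\n\t"
		"mov %[sp], %%rsp"	/* restore the original stack pointer */
		/* argument registers are "+" operands: the callee may change them */
		: "=a" (ret), "+D" (data), "+S" (len), "+d" (seed),
		  /* early-clobber: ends up in a callee-saved register, because
		   * all other candidates are operands or clobbers */
		  [sp] "=&r" (saved_sp)
		: [fn] "r" (fn)
		/* remaining call-clobbered registers of the assumed ABI */
		: "rcx", "r8", "r9", "r10", "r11",
		  "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7",
		  "xmm8", "xmm9", "xmm10", "xmm11", "xmm12", "xmm13", "xmm14",
		  "xmm15", "memory", "cc");
	return ret;
#else
	/* Assumption: on any other target a plain call is good enough here. */
	return demo_hash(data, len, seed);
#endif
}
```

With a buffer such as uint32_t buf[4] = {1, 2, 3, 4}, the call
demo_hash_call(buf, 4, 7) should return the same value as
demo_hash(buf, 4, 7). Because the clobbers only tell gcc which registers
it cannot keep live across the asm, it is free to place surrounding
values in callee-saved registers instead of spilling everything.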
Thanks,
Hannes