Message-ID: <CA+55aFzZwOm3352HXgMqvSpgmTms-fqLfiu1ZYFO5DhOoiqv7g@mail.gmail.com>
Date: Wed, 1 Jun 2016 18:18:55 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: George Spelvin <linux@...encehorizons.net>
Cc: Peter Zijlstra <peterz@...radead.org>,
"J. Bruce Fields" <bfields@...hat.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v3 06/10] fs/namei.c: Improve dcache hash function
On Mon, May 30, 2016 at 11:10 AM, George Spelvin
<linux@...encehorizons.net> wrote:
>
> I understand, but 64x64-bit multiply on 32-bit is pretty annoyingly
> expensive. In time, code size, and register pressure which bloats
> surrounding code.
Side note, the code seems to work fairly well, but I do worry a bit
about the three large multiplies in link_path_walk().
There's two in fold_hash(), and one comes from "find_zero()".
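For anyone reading along, here is a rough userspace sketch of where those three multiplies come from on a 64-bit little-endian build. It is only modeled on the kernel code, not copied from it: the "_sketch" names, the first_zero_byte() wrapper, and the constants (including the GOLDEN_RATIO_64 value) are assumptions for illustration.

```c
#include <stdint.h>

/* Assumed constants for this sketch; they are not quoted in the message. */
#define ONE_BITS        0x0101010101010101ULL
#define HIGH_BITS       0x8080808080808080ULL
#define GOLDEN_RATIO_64 0x61C8864680B583EBULL

/*
 * Multiplies #1 and #2: fold two 64-bit state words down to 32 bits.
 * The second multiply consumes the result of the first, so the two
 * latencies add up rather than overlap.
 */
static inline uint32_t fold_hash_sketch(uint64_t x, uint64_t y)
{
	y ^= x * GOLDEN_RATIO_64;
	y *= GOLDEN_RATIO_64;
	return (uint32_t)(y >> 32);
}

/*
 * Multiply #3: given a mask whose low k bytes are 0xff and whose
 * remaining bytes are zero, return k. The multiply sums the set
 * bytes into the top byte of the product.
 */
static inline unsigned int find_zero_sketch(uint64_t mask)
{
	return (unsigned int)((mask * 0x0001020304050608ULL) >> 56);
}

/*
 * Hypothetical wrapper: index (0..7, counting from the least
 * significant byte) of the first zero byte in word, or 8 if none.
 */
static inline unsigned int first_zero_byte(uint64_t word)
{
	/* Marks 0x80 in the lowest zero byte; higher marks may be spurious. */
	uint64_t bits = ((word - ONE_BITS) & ~word) & HIGH_BITS;

	if (!bits)
		return 8;
	bits = (bits - 1) & ~bits;	/* keep only bits below the lowest mark */
	return find_zero_sketch(bits >> 7);
}
```

With these definitions, first_zero_byte(0x6261) is 2, which is the length of "ab" when the word was loaded from a NUL-terminated string on a little-endian machine.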
It turns out to work fairly well on at least modern big-core x86
CPUs, because the multiplier is fairly beefy: low latency (3-4 cycles
in the current crop) and fully pipelined.
Even Atom should be 5 cycles and a multiplication result every two
cycles for 64-bit results.
Maybe we don't care, because looking around, the modern ARM and POWER
cores do similarly, but I just wanted to point out that that code does
seem to rely fairly heavily on "everybody has big and pipelined hw
multipliers" for performance.
.. and it's probably true that transistors are cheap, and crypto and
other uses have made CPU designers spend the effort on good
multipliers. I just remember a time when you definitely couldn't rely
on fast multiplies.
Linus