Date: Thu, 3 Apr 2014 21:52:15 +0200
From: Thomas Pornin <pornin@...et.org>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] babbling about trends

On Thu, Apr 03, 2014 at 09:23:24PM +0200, Krisztián Pintér wrote:
> i'm not 100% well informed (=0% informed) about the differences
> between cache and RAM.

Roughly speaking, cache is (usually) SRAM, while main RAM is (usually)
DRAM.

SRAM consists of active circuits; each bit in SRAM is held in a circuit
of six transistors, which has two "stable" configurations and can be
externally read from and written to.

DRAM is more like an array of capacitors, each of them containing a
single bit. DRAM must be refreshed periodically (refresh = "read and
then write back the value"), because the capacitors are very tiny and
tend to leak (and they leak faster at higher temperatures). In earlier
days, the main CPU (or the memory controller) had to do the refresh,
but nowadays RAM chips tend to come with an "auto-refresh" circuit
which handles most of the job.

SRAM is faster (the circuit switches from one stable configuration to
the other in less time) but it draws power continuously, whereas DRAM
draws power only during refresh cycles. Moreover, DRAM is much denser:
typically, an SRAM block will use about 5 or 6 times the area of a DRAM
block holding the same amount of data (six transistors per SRAM bit
versus one transistor and one capacitor per DRAM bit).

It is also technologically hard to put DRAM and logic circuits (like
SRAM or a plain CPU) on the same chip. In a smartphone you will often
see one big "chip" which contains the CPU and the phone RAM, but that
is mostly a big plastic package containing several separate sub-chips.
IBM has a patent for putting DRAM blocks within a CPU; they use it for
the Cell CPU (the one in the PS3).


For L1 cache, you need something really fast for random accesses, so
you use SRAM integrated into the CPU. For L2 cache, it is more of a
tie: L2 cache is read by "lines" (blocks of, say, 32 or 64 bytes) to
populate the L1 cache on demand, and while DRAM latency is high, it can
be optimized for bulk transfers (20 ns to get the first byte, but then
8 or 16 bytes flowing every nanosecond). So some CPU vendors use DRAM
for L2 cache, while others stick with SRAM. For main RAM, DRAM rules
because it is much less expensive.
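
To make those figures concrete, here is a quick C sketch (my own
illustration, not tied to any particular chip; the 64 MiB buffer size
and the iteration count are arbitrary choices). A pointer chase through
a large buffer makes every load depend on the previous one, so you pay
the full DRAM latency on each access; a sequential pass over the same
buffer lets burst transfers and the prefetcher do their job, so only
the bytes-per-nanosecond figure matters:

#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (64u * 1024 * 1024 / sizeof(size_t))  /* 64 MiB, beyond any cache */

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void)
{
    size_t *a = malloc(N * sizeof *a);
    size_t i, j, t, idx, steps = 10000000;
    volatile size_t sum = 0;
    double t0, t1;

    if (a == NULL)
        return 1;

    /* Sattolo's algorithm: build one big random cycle, so that each
     * load depends on the result of the previous one (this defeats
     * prefetching and exposes the raw access latency). */
    for (i = 0; i < N; i++)
        a[i] = i;
    for (i = N - 1; i > 0; i--) {
        j = (size_t)rand() % i;
        t = a[i]; a[i] = a[j]; a[j] = t;
    }

    idx = 0;
    t0 = now_sec();
    for (i = 0; i < steps; i++)
        idx = a[idx];
    t1 = now_sec();
    printf("dependent random loads: %.1f ns each (idx=%zu)\n",
        (t1 - t0) * 1e9 / steps, idx);

    /* Sequential pass over the same buffer: bursts and the hardware
     * prefetcher hide the latency, so throughput is what counts. */
    t0 = now_sec();
    for (i = 0; i < N; i++)
        sum += a[i];
    t1 = now_sec();
    printf("sequential reads: %.2f bytes per ns\n",
        (double)(N * sizeof *a) / ((t1 - t0) * 1e9));

    return 0;
}

(Compile with optimization; on older glibc you may need -lrt for
clock_gettime.) The exact numbers will vary, but the first figure is
typically tens of nanoseconds while the second is several bytes per
nanosecond, which is the latency-versus-bandwidth gap described above.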


> but i dare to bold-guess that cache can grow to eventually eliminate
> the need for anything else.

The last 20 years of technology do not corroborate that idea. The
amount of SRAM available to a given CPU has remained mostly constant
over the years. The current trend appears to be the inclusion of more
DRAM rather than making it faster by converting it to SRAM.


> CPUs don't have to be smart, we have smart compilers instead.

That's the core idea behind RISC CPUs; but it did not really win in the
end (the dominant architecture is a tie between ARM and x86, depending
on how you count). As for parallelism, compilers which automatically
detect opportunities for parallelism have been an active research area
for quite a few years, and they are still not good at it.

Merging the main CPU and a GPU-like architecture would make for a fun
system and I would quite like to see it; but I don't think it would be
a commercial success. Not immediately, at least: there is too much
sequential code to support... I mean, yeah, compilers are all right,
but how about running a PHP Web server on a GPU? It won't be easy.


	--Thomas Pornin
