lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20110809112114.3943.qmail@science.horizon.com>
Date:	9 Aug 2011 07:21:14 -0400
From:	"George Spelvin" <linux@...izon.com>
To:	akpm@...ux-foundation.org, fzago@...temfabricworks.com,
	joakim.tjernlund@...nsmode.se, linux-kernel@...r.kernel.org,
	linux@...izon.com, rpearson@...temfabricworks.com
Subject: Re: [patch v3 6/7] crc32: add-slicing-by-8.diff

While writing up some documentation of this algorithm, I came up with
a potential speedup.  Or, at least, realized why slicing by more than
4 is so much faster than slicing by 4 or less.

Note that the inner loop of the algorithm is as follows:

+#  define DO_CRC8a (tab[7][(q) & 255] ^ \
+		tab[6][(q >> 8) & 255] ^ \
+		tab[5][(q >> 16) & 255] ^ \
+		tab[4][(q >> 24) & 255])
+#  define DO_CRC8b (tab[3][(q) & 255] ^ \
+		tab[2][(q >> 8) & 255] ^ \
+		tab[1][(q >> 16) & 255] ^ \
+		tab[0][(q >> 24) & 255])

+	for (--b; middle_len; --middle_len) {
+		u32 q;
+		q = crc ^ *++b;
+		crc = DO_CRC8a;
+		q = *++b;
+		crc ^= DO_CRC8b;
 	}

Note the data dependencies: DO_CRC8a depends on the
previous crc, which depends on the previous DO_CRC8b.
But DO_CRC8b does not depend on anything except the
input data at *++b.

It would increase parallelism to schedule DO_CRC8b before DO_CRC8a,
to start those loads before the previous crc value is available.

Maybe the compiler and/pr processor can find this parallelism already,
but if not, it might be useful to try reordering it:

#  define DO_CRC8a(x) (tab[7][(x) & 255] ^ \
		tab[6][((x) >> 8) & 255] ^ \
		tab[5][((x) >> 16) & 255] ^ \
		tab[4][((x) >> 24) & 255])
#  define DO_CRC8b(x) (tab[3][(x) & 255] ^ \
		tab[2][((x) >> 8) & 255] ^ \
		tab[1][((x) >> 16) & 255] ^ \
		tab[0][((x) >> 24) & 255])

	for ( ; middle_len; --middle_len, b += 2) {
		u32 q = DO_CRC8b(b[1]);
		crc ^= b[0];
		crc = q ^ DO_CRC8a(crc);
	}
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ