lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1f1d88a6a33f4e5db99544fda965c594@AcuMS.aculab.com>
Date:   Mon, 15 May 2023 10:29:18 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Kent Overstreet' <kent.overstreet@...ux.dev>,
        Eric Biggers <ebiggers@...nel.org>
CC:     Lorenzo Stoakes <lstoakes@...il.com>,
        Christoph Hellwig <hch@...radead.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
        "linux-bcachefs@...r.kernel.org" <linux-bcachefs@...r.kernel.org>,
        Kent Overstreet <kent.overstreet@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Uladzislau Rezki <urezki@...il.com>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>
Subject: RE: [PATCH 07/32] mm: Bring back vmalloc_exec

From: Kent Overstreet
> Sent: 14 May 2023 06:45
...
> dynamically generated unpack:
> rand_insert: 20.0 MiB with 1 threads in    33 sec,  1609 nsec per iter, 607 KiB per sec
> 
> old C unpack:
> rand_insert: 20.0 MiB with 1 threads in    35 sec,  1672 nsec per iter, 584 KiB per sec
> 
> the Eric Biggers special:
> rand_insert: 20.0 MiB with 1 threads in    35 sec,  1676 nsec per iter, 583 KiB per sec
> 
> Tested two versions of your approach, one without a shift value, one
> where we use a shift value to try to avoid unaligned access - second was
> perhaps 1% faster

You won't notice any effect of avoiding unaligned accesses on x86.
I think then get split into 64bit accesses and again on 64 byte
boundaries (that is what I see for uncached access to PCIe).
The kernel won't be doing >64bit and the 'out of order'
pipeline will tend to cover the others (especially since you
get 2 reads/clock).

> so it's not looking good. This benchmark doesn't even hit on
> unpack_key() quite as much as I thought, so the difference is
> significant.

Beware: unless you manage to lock the cpu frequency (which is ~impossible
on some cpu) timings in nanoseconds are pretty useless.
You can use the performance counter to get accurate cycle times
(provided there isn't a cpu switch in the middle of a micro-benchmark).

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ