lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 9 Mar 2020 08:38:31 -0700
From:   Andi Kleen <ak@...ux.intel.com>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     "Kirill A. Shutemov" <kirill@...temov.name>,
        Cannon Matthews <cannonmatthews@...gle.com>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Matthew Wilcox <willy@...radead.org>,
        David Rientjes <rientjes@...gle.com>,
        Greg Thelen <gthelen@...gle.com>,
        Salman Qazi <sqazi@...gle.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, x86@...nel.org
Subject: Re: [PATCH] mm: clear 1G pages with streaming stores on x86


> Gigantic huge pages are a bit different. They are much less dynamic from
> the usage POV in my experience. Micro-optimizations for the first access
> tends to not matter at all as it is usually pre-allocation scenario. On
> the other hand, speeding up the initialization sounds like a good thing
> in general. It will be a single time benefit but if the additional code
> is not hard to maintain then I would be inclined to take it even with
> "artificial" numbers state above. There really shouldn't be other downsides
> except for the code maintenance, right?

There's a cautious tale of the old crappy RAID5 XOR assembler functions which
were optimized a long time ago for the Pentium1, and stayed around,
even though the compiler could actually do a better job.

String instructions are constantly improving in performance (Broadwell is
very old at this point) Most likely over time (and maybe even today
on newer CPUs) you would need much more sophisticated unrolled MOVNTI variants
(or maybe even AVX-*) to be competitive.

The best clear functions may also be different for different CPU
generations.

Using the string instructions has the advantage that all of this is abstracted
from the kernel.

-Andi

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ