Message-ID: <20231011124900.sp22hoxoitrslbia@techsingularity.net>
Date:   Wed, 11 Oct 2023 13:49:00 +0100
From:   Mel Gorman <mgorman@...hsingularity.net>
To:     Huang Ying <ying.huang@...el.com>
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Arjan Van De Ven <arjan@...ux.intel.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        David Hildenbrand <david@...hat.com>,
        Johannes Weiner <jweiner@...hat.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Michal Hocko <mhocko@...e.com>,
        Pavel Tatashin <pasha.tatashin@...een.com>,
        Matthew Wilcox <willy@...radead.org>,
        Christoph Lameter <cl@...ux.com>
Subject: Re: [PATCH 03/10] mm, pcp: reduce lock contention for draining
 high-order pages

On Wed, Sep 20, 2023 at 02:18:49PM +0800, Huang Ying wrote:
> Since commit f26b3fa04611 ("mm/page_alloc: limit number of high-order
> pages on PCP during bulk free"), the PCP (Per-CPU Pageset) is drained
> when it is mostly used for freeing high-order pages, to improve the
> reuse of cache-hot pages between the allocating and freeing CPUs.
> 
> On a system with a small per-CPU data cache, pages shouldn't be cached
> before draining, so that the freed pages stay cache-hot.  But on a
> system with a large per-CPU data cache, more pages can be cached
> before draining to reduce zone lock contention.
> 
> So, in this patch, instead of draining without any caching, "batch"
> pages will be cached in PCP before draining if the per-CPU data cache
> size is more than "4 * batch".
> 
> On a 2-socket Intel server with 128 logical CPUs, with the patch, the
> network bandwidth of the UNIX (AF_UNIX) test case of the lmbench test
> suite with 16-pair processes increases by 72.2%.  The cycles% of
> spinlock contention (mostly for the zone lock) decreases from 45.8% to
> 21.2%.  The number of PCP drains for high-order page
> freeing (free_high) decreases by 89.8%.  The cache miss rate stays at
> 0.3%.
> 
> Signed-off-by: "Huang, Ying" <ying.huang@...el.com>

Acked-by: Mel Gorman <mgorman@...hsingularity.net>

However, the flag should also have been documented to make it clear that
it preserves some pages on the PCP if the cache is large enough. Similar
to the previous patch, it would have been easier to reason about in the
general case if the decision had only been based on the LLC without
having to worry if any intermediate layer has a meaningful impact that
varies across CPU implementations.
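
For illustration, one reading of the policy in the quoted changelog can
be sketched as a small userspace C model. This is not the patch itself:
struct pcp_model, its fields and both helpers are hypothetical names,
and converting the cache size from bytes to pages before comparing it
with 4 * batch is an assumption about units that the changelog does not
spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; these are not kernel symbols. */
struct pcp_model {
	size_t count;      /* pages currently held on the per-CPU list */
	size_t batch;      /* the "batch" value from the changelog, in pages */
	bool keep_batch;   /* stand-in for the flag discussed above */
};

/*
 * Decide once, from the cache size, whether "batch" pages may be kept
 * before draining. Assumption: the cache size is compared in pages.
 */
static void pcp_model_init(struct pcp_model *pcp, size_t batch,
			   size_t cache_bytes, size_t page_bytes)
{
	assert(batch > 0 && page_bytes > 0);
	pcp->count = 0;
	pcp->batch = batch;
	pcp->keep_batch = (cache_bytes / page_bytes) > 4 * batch;
}

/*
 * Called when a run of high-order frees has been detected.
 * Small cache: drain immediately so freed pages stay cache-hot.
 * Large cache: let the list grow to "batch" pages first, which
 * means fewer drains and fewer zone lock acquisitions.
 */
static bool pcp_model_should_drain(const struct pcp_model *pcp)
{
	if (!pcp->keep_batch)
		return true;
	return pcp->count >= pcp->batch;
}
```

With batch = 63 and 4 KiB pages, a 2 MiB cache (512 pages) exceeds
4 * 63 = 252 pages, so the model holds back draining until 63 pages
have accumulated, while a 512 KiB cache (128 pages) drains at once.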

-- 
Mel Gorman
SUSE Labs
