Message-Id: <20170111150940.25d951a121a62e1b7eff6f8d@linux-foundation.org>
Date: Wed, 11 Jan 2017 15:09:40 -0800
From: Andrew Morton <akpm@...ux-foundation.org>
To: Tim Chen <tim.c.chen@...ux.intel.com>
Cc: "Huang, Ying" <ying.huang@...el.com>, dave.hansen@...el.com,
ak@...ux.intel.com, aaron.lu@...el.com, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, Hugh Dickins <hughd@...gle.com>,
Shaohua Li <shli@...nel.org>, Minchan Kim <minchan@...nel.org>,
Rik van Riel <riel@...hat.com>,
Andrea Arcangeli <aarcange@...hat.com>,
"Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
Vladimir Davydov <vdavydov.dev@...il.com>,
Johannes Weiner <hannes@...xchg.org>,
Michal Hocko <mhocko@...nel.org>,
Hillf Danton <hillf.zj@...baba-inc.com>,
Christian Borntraeger <borntraeger@...ibm.com>,
Jonathan Corbet <corbet@....net>
Subject: Re: [PATCH v5 3/9] mm/swap: Split swap cache into 64MB trunks
On Wed, 11 Jan 2017 09:55:13 -0800 Tim Chen <tim.c.chen@...ux.intel.com> wrote:
> This patch improves the scalability of swap out/in by using
> fine-grained locks for the swap cache. In the current kernel, one
> address space is used for each swap device, and in the common
> configuration the number of swap devices is very small (one is
> typical). This causes heavy lock contention on the radix tree of the
> address space when multiple tasks swap out/in concurrently. But in
> fact there is no dependency between pages in the swap cache, so we can
> split each swap device's single shared address space into several
> address spaces to reduce the lock contention. In this patch, the
> shared address space is split into 64MB trunks; 64MB is chosen to
> balance memory usage against the reduction in lock contention.
>
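For reference, a minimal sketch of how such a per-trunk lookup could
work (assuming 4KB pages, so a 64MB trunk covers 1 << 14 swap entries;
the names swapper_spaces and swap_address_space below are illustrative,
not quoted from the patch):

	/*
	 * Sketch only: map a swap entry to one of several per-device
	 * address spaces instead of one shared address space. Assumes
	 * 4KB pages, so a 64MB trunk spans 1 << 14 swap entries.
	 */
	#define SWAP_ADDRESS_SPACE_SHIFT	14	/* 64MB / 4KB */
	#define SWAP_ADDRESS_SPACE_PAGES	(1 << SWAP_ADDRESS_SPACE_SHIFT)

	/* One array of address spaces per swap device (swap type). */
	extern struct address_space *swapper_spaces[];

	/* Pick the address space covering this entry's 64MB trunk. */
	#define swap_address_space(entry)				\
		(&swapper_spaces[swp_type(entry)][swp_offset(entry)	\
			>> SWAP_ADDRESS_SPACE_SHIFT])

Concurrent lookups and insertions for entries in different trunks then
take different radix-tree locks.
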
> The size of struct address_space on x86_64 is 408 bytes, so with this
> patch an extra 6528 bytes (1GB / 64MB = 16 address spaces, 16 * 408
> bytes) of memory is used for every 1GB of swap space.
>
> One address space is still shared by all swap entries within the same
> 64MB trunk. To avoid lock contention during the first round of swap
> space allocation, the order of the swap clusters in the initial free
> cluster list is changed so that consecutive clusters in the list are
> at least 64MB apart in swap space (see the sketch below). After the
> first round of allocation, the swap clusters are expected to be freed
> at random positions, so the lock contention should be reduced
> effectively.
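For reference, a rough sketch of how that initial ordering could be
built (assuming 4KB pages and 1MB swap clusters, so one 64MB trunk
spans 64 clusters; the names below are illustrative, not taken from the
patch):

	/*
	 * Sketch: build the initial free-cluster list so consecutive
	 * entries are one trunk (64MB) apart. Assumes 1MB swap
	 * clusters, so 64 clusters per 64MB trunk.
	 */
	#define CLUSTERS_PER_TRUNK	64

	static void init_free_cluster_order(unsigned int *list,
					    unsigned int nr_clusters)
	{
		unsigned int i, j, k = 0;

		/*
		 * Stride across trunks first, so neighbouring list
		 * entries fall into different 64MB trunks and thus
		 * different address spaces.
		 */
		for (i = 0; i < CLUSTERS_PER_TRUNK; i++)
			for (j = i; j < nr_clusters; j += CLUSTERS_PER_TRUNK)
				list[k++] = j;
	}

With this ordering, early concurrent allocations land in different
trunks and therefore contend on different radix-tree locks.
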
Switching from a single radix-tree to an array of radix-trees to reduce
contention seems a bit hacky. That we can do this and have everything
continue to work tells me that we're simply using an inappropriate data
structure to hold this info.