Message-Id: <20170111150940.25d951a121a62e1b7eff6f8d@linux-foundation.org>
Date: Wed, 11 Jan 2017 15:09:40 -0800
From: Andrew Morton <akpm@...ux-foundation.org>
To: Tim Chen <tim.c.chen@...ux.intel.com>
Cc: "Huang, Ying" <ying.huang@...el.com>, dave.hansen@...el.com,
ak@...ux.intel.com, aaron.lu@...el.com, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, Hugh Dickins <hughd@...gle.com>,
Shaohua Li <shli@...nel.org>, Minchan Kim <minchan@...nel.org>,
Rik van Riel <riel@...hat.com>,
Andrea Arcangeli <aarcange@...hat.com>,
"Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
Vladimir Davydov <vdavydov.dev@...il.com>,
Johannes Weiner <hannes@...xchg.org>,
Michal Hocko <mhocko@...nel.org>,
Hillf Danton <hillf.zj@...baba-inc.com>,
Christian Borntraeger <borntraeger@...ibm.com>,
Jonathan Corbet <corbet@....net>
Subject: Re: [PATCH v5 3/9] mm/swap: Split swap cache into 64MB trunks
On Wed, 11 Jan 2017 09:55:13 -0800 Tim Chen <tim.c.chen@...ux.intel.com> wrote:
> This patch improves the scalability of swap out/in by using
> fine-grained locks for the swap cache. In the current kernel, one
> address space is used for each swap device, and in the common
> configuration the number of swap devices is very small (one is
> typical). This causes heavy lock contention on the radix tree of the
> address space when multiple tasks swap out/in concurrently. But in
> fact there is no dependency between pages in the swap cache, so we can
> split each swap device's single shared address space into several
> address spaces to reduce the lock contention. In this patch, the
> shared address space is split into 64MB trunks; 64MB is chosen to
> balance memory usage against the reduction in lock contention.
>
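For reference, a minimal sketch of how such a per-trunk lookup could
work (assuming 4KB pages, so a 64MB trunk covers 1 << 14 swap entries;
the names swapper_spaces and swap_address_space below are illustrative,
not quoted from the patch):

	/*
	 * Sketch only: map a swap entry to one of several per-device
	 * address spaces instead of one shared address space. Assumes
	 * 4KB pages, so a 64MB trunk spans 1 << 14 swap entries.
	 */
	#define SWAP_ADDRESS_SPACE_SHIFT	14	/* 64MB / 4KB */
	#define SWAP_ADDRESS_SPACE_PAGES	(1 << SWAP_ADDRESS_SPACE_SHIFT)

	/* One array of address spaces per swap device (swap type). */
	extern struct address_space *swapper_spaces[];

	/* Pick the address space covering this entry's 64MB trunk. */
	#define swap_address_space(entry)				\
		(&swapper_spaces[swp_type(entry)][swp_offset(entry)	\
			>> SWAP_ADDRESS_SPACE_SHIFT])

Concurrent lookups and insertions for entries in different trunks then
take different radix-tree locks.
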
> The size of struct address_space on x86_64 is 408 bytes, so with this
> patch an extra 6528 bytes (1GB / 64MB = 16 address spaces, 16 * 408
> bytes) of memory is used for every 1GB of swap space.
>
> One address space is still shared by all swap entries within the same
> 64MB trunk. To avoid lock contention during the first round of swap
> space allocation, the order of the swap clusters in the initial free
> cluster list is changed so that consecutive clusters in the list are
> at least 64MB apart in swap space (see the sketch below). After the
> first round of allocation, the swap clusters are expected to be freed
> at random positions, so the lock contention should be reduced
> effectively.
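For reference, a rough sketch of how that initial ordering could be
built (assuming 4KB pages and 1MB swap clusters, so one 64MB trunk
spans 64 clusters; the names below are illustrative, not taken from the
patch):

	/*
	 * Sketch: build the initial free-cluster list so consecutive
	 * entries are one trunk (64MB) apart. Assumes 1MB swap
	 * clusters, so 64 clusters per 64MB trunk.
	 */
	#define CLUSTERS_PER_TRUNK	64

	static void init_free_cluster_order(unsigned int *list,
					    unsigned int nr_clusters)
	{
		unsigned int i, j, k = 0;

		/*
		 * Stride across trunks first, so neighbouring list
		 * entries fall into different 64MB trunks and thus
		 * different address spaces.
		 */
		for (i = 0; i < CLUSTERS_PER_TRUNK; i++)
			for (j = i; j < nr_clusters; j += CLUSTERS_PER_TRUNK)
				list[k++] = j;
	}

With this ordering, early concurrent allocations land in different
trunks and therefore contend on different radix-tree locks.
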
Switching from a single radix-tree to an array of radix-trees to reduce
contention seems a bit hacky. That we can do this and have everything
continue to work tells me that we're simply using an inappropriate data
structure to hold this info.