Message-ID: <40c02402-ab76-6bd2-5e7d-77fea82e55fe@oracle.com>
Date: Tue, 13 Feb 2018 16:07:19 -0500
From: Daniel Jordan <daniel.m.jordan@...cle.com>
To: Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Cc: aaron.lu@...el.com, ak@...ux.intel.com, Dave.Dice@...cle.com,
dave@...olabs.net, khandual@...ux.vnet.ibm.com,
ldufour@...ux.vnet.ibm.com, mgorman@...e.de, mhocko@...nel.org,
pasha.tatashin@...cle.com, steven.sistare@...cle.com,
yossi.lev@...cle.com
Subject: Re: [RFC PATCH v1 00/13] lru_lock scalability
On 02/08/2018 06:36 PM, Andrew Morton wrote:
> On Wed, 31 Jan 2018 18:04:00 -0500 daniel.m.jordan@...cle.com wrote:
>
>> lru_lock, a per-node* spinlock that protects an LRU list, is one of the
>> hottest locks in the kernel. On some workloads on large machines, it
>> shows up at the top of lock_stat.
>
> Do you have details on which callsites are causing the problem? That
> would permit us to consider other approaches, perhaps.
Sure, there are two paths where we're seeing contention.
In the first one, a pagevec's worth of anonymous pages are added to
various LRUs when the per-cpu pagevec fills up:
/* take an anonymous page fault, eventually end up at... */
handle_pte_fault
  do_anonymous_page
    lru_cache_add_active_or_unevictable
      lru_cache_add
        __lru_cache_add
          __pagevec_lru_add
            pagevec_lru_move_fn
              /* contend on lru_lock */
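To make that concrete, here's a toy model of the add path -- plain C
with pthreads, not kernel code; struct page, struct pagevec,
PAGEVEC_SIZE and lru_lock below are simplified stand-ins for what
mm/swap.c really does.  The point is only the batching/locking shape:
each CPU collects pages locally and takes the shared per-node lock once
per full pagevec, so in a fault-heavy workload every CPU still funnels
into the same lru_lock:

/*
 * Toy model of the add path above -- NOT kernel code; the names are
 * simplified stand-ins for the real structures in mm/swap.c.
 */
#include <pthread.h>

#define PAGEVEC_SIZE 14  /* small batch, like the kernel's pagevec */

struct page {
    struct page *lru_next;  /* stand-in for the real list_head lru */
};

struct pagevec {
    unsigned int nr;
    struct page *pages[PAGEVEC_SIZE];
};

static struct page *node_lru_head;        /* shared per-node LRU list */
static pthread_spinlock_t lru_lock;       /* the contended lock */
static __thread struct pagevec cpu_pvec;  /* per-"CPU" (per-thread) pagevec */

void lru_model_init(void)
{
    pthread_spin_init(&lru_lock, PTHREAD_PROCESS_PRIVATE);
}

/* Drain the local batch onto the shared LRU under a single lock hold. */
static void pagevec_lru_add_model(struct pagevec *pvec)
{
    pthread_spin_lock(&lru_lock);  /* the acquisition that contends */
    for (unsigned int i = 0; i < pvec->nr; i++) {
        pvec->pages[i]->lru_next = node_lru_head;
        node_lru_head = pvec->pages[i];
    }
    pthread_spin_unlock(&lru_lock);
    pvec->nr = 0;
}

/* Called once per faulted page; locks only every PAGEVEC_SIZE calls. */
void lru_cache_add_model(struct page *page)
{
    struct pagevec *pvec = &cpu_pvec;

    pvec->pages[pvec->nr++] = page;
    if (pvec->nr == PAGEVEC_SIZE)
        pagevec_lru_add_model(pvec);
}

The batching already amortizes the lock acquisitions; the problem is
that all CPUs still serialize on the one per-node lock when they drain.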
In the second, one or more pages are removed from an LRU under one hold
of lru_lock:
// userland calls munmap or exit, eventually end up at...
zap_pte_range
  __tlb_remove_page   // returns true because we eventually hit
                      // MAX_GATHER_BATCH_COUNT in tlb_next_batch
  tlb_flush_mmu_free
    free_pages_and_swap_cache
      release_pages
        // contend on lru_lock
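And a second standalone sketch for the release path -- again not kernel
code, only the locking shape of release_pages(): the lock is taken
lazily, the first time we meet a page that's actually on an LRU, and is
then held across the rest of the batch, which is why a big munmap or
exit can sit on lru_lock for many pages at a time:

/*
 * Toy model of the release path above -- NOT kernel code; the struct
 * fields and helpers are simplified stand-ins.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

struct page {
    int refcount;   /* stand-in for the real page reference count */
    bool on_lru;    /* stand-in for PageLRU() */
};

static pthread_spinlock_t lru_lock;  /* the same shared per-node lock */

void release_pages_model(struct page **pages, int nr)
{
    bool locked = false;

    for (int i = 0; i < nr; i++) {
        struct page *page = pages[i];

        if (--page->refcount > 0)
            continue;  /* still referenced elsewhere, leave it alone */

        if (page->on_lru) {
            if (!locked) {
                /* one lock hold covers the whole batch */
                pthread_spin_lock(&lru_lock);
                locked = true;
            }
            page->on_lru = false;  /* stand-in for del_page_from_lru_list() */
        }
        free(page);  /* stand-in for freeing to the page allocator */
    }
    if (locked)
        pthread_spin_unlock(&lru_lock);
}

(IIRC the real release_pages() also drops and re-takes the lock when
the batch crosses into another node's pages, but the single-node case
is the one that contends for us.)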
For broader context, we've run decision support benchmarks where
lru_lock (and zone->lock) show long wait times.  But we're not the only
ones seeing this, judging by these kernel comments:
mm/vmscan.c:
* zone_lru_lock is heavily contended. Some of the functions that
* shrink the lists perform better by taking out a batch of pages
* and working on them outside the LRU lock.
*
* For pagecache intensive workloads, this function is the hottest
* spot in the kernel (apart from copy_*_user functions).
...
static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
include/linux/mmzone.h:
* zone->lock and the [pgdat->lru_lock] are two of the hottest locks in
* the kernel. So add a wild amount of padding here to ensure that they
* fall into separate cachelines. ...
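The vmscan.c comment above is about reclaim rather than our two paths,
but the technique it mentions -- pull a batch off the list under the
lock, then work on the pages with the lock dropped -- looks roughly
like this (another standalone toy model, not the real
isolate_lru_pages()):

/*
 * Sketch of the batch-isolation pattern from the vmscan.c comment --
 * NOT the real isolate_lru_pages(), just the idea: detach a batch
 * while holding lru_lock, do the expensive per-page work outside it.
 */
#include <pthread.h>
#include <stddef.h>

struct page {
    struct page *lru_next;
};

static pthread_spinlock_t lru_lock;
static struct page *node_lru_head;

/* Move up to nr_to_scan pages from the shared LRU onto a private list. */
struct page *isolate_batch_model(unsigned long nr_to_scan)
{
    struct page *batch = NULL;

    pthread_spin_lock(&lru_lock);
    while (nr_to_scan-- && node_lru_head) {
        struct page *page = node_lru_head;

        node_lru_head = page->lru_next;
        page->lru_next = batch;
        batch = page;
    }
    pthread_spin_unlock(&lru_lock);

    return batch;  /* caller works on these without holding lru_lock */
}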
Anyway, if you're seeing this lock in your workloads, I'm interested in
hearing what you're running so we can get more real world data on this.