Message-ID: <40c02402-ab76-6bd2-5e7d-77fea82e55fe@oracle.com>
Date:   Tue, 13 Feb 2018 16:07:19 -0500
From:   Daniel Jordan <daniel.m.jordan@...cle.com>
To:     Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Cc:     aaron.lu@...el.com, ak@...ux.intel.com, Dave.Dice@...cle.com,
        dave@...olabs.net, khandual@...ux.vnet.ibm.com,
        ldufour@...ux.vnet.ibm.com, mgorman@...e.de, mhocko@...nel.org,
        pasha.tatashin@...cle.com, steven.sistare@...cle.com,
        yossi.lev@...cle.com
Subject: Re: [RFC PATCH v1 00/13] lru_lock scalability

On 02/08/2018 06:36 PM, Andrew Morton wrote:
> On Wed, 31 Jan 2018 18:04:00 -0500 daniel.m.jordan@...cle.com wrote:
> 
>> lru_lock, a per-node* spinlock that protects an LRU list, is one of the
>> hottest locks in the kernel.  On some workloads on large machines, it
>> shows up at the top of lock_stat.
> 
> Do you have details on which callsites are causing the problem?  That
> would permit us to consider other approaches, perhaps.

Sure, there are two paths where we're seeing contention.

In the first one, a pagevec's worth of anonymous pages are added to 
various LRUs when the per-cpu pagevec fills up:

   /* take an anonymous page fault, eventually end up at... */
   handle_pte_fault
     do_anonymous_page
       lru_cache_add_active_or_unevictable
         lru_cache_add
           __lru_cache_add
             __pagevec_lru_add
               pagevec_lru_move_fn
                 /* contend on lru_lock */
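
To make the batching concrete, here's a rough userspace analogue of the
add side (a sketch only: per-thread stands in for per-CPU, and the names,
batch size, and list layout are illustrative, not the kernel's exact code):

#include <pthread.h>

#define PAGEVEC_SIZE 15			/* small fixed batch, as in the kernel */

struct page { struct page *lru; };	/* stand-in for struct page */

static pthread_spinlock_t lru_lock;	/* the contended lock */
static struct page *lru_head;		/* shared "LRU list" */

__attribute__((constructor))
static void lru_lock_init(void)
{
	pthread_spin_init(&lru_lock, PTHREAD_PROCESS_PRIVATE);
}

struct pagevec {
	unsigned int nr;
	struct page *pages[PAGEVEC_SIZE];
};

/* per-thread here plays the role of the kernel's per-CPU pagevec */
static _Thread_local struct pagevec lru_add_pvec;

/* Drain the whole batch onto the LRU under a single lock hold. */
static void pagevec_lru_add(struct pagevec *pvec)
{
	pthread_spin_lock(&lru_lock);
	for (unsigned int i = 0; i < pvec->nr; i++) {
		pvec->pages[i]->lru = lru_head;
		lru_head = pvec->pages[i];
	}
	pthread_spin_unlock(&lru_lock);
	pvec->nr = 0;
}

/* Called once per faulted page; touches lru_lock only when the batch fills. */
static void lru_cache_add(struct page *page)
{
	struct pagevec *pvec = &lru_add_pvec;

	pvec->pages[pvec->nr++] = page;
	if (pvec->nr == PAGEVEC_SIZE)
		pagevec_lru_add(pvec);
}

So the lock is taken once per PAGEVEC_SIZE pages instead of once per page,
which is exactly why the contention shows up at the drain in
pagevec_lru_move_fn rather than in the fault path itself.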


In the second, one or more pages are removed from an LRU under one hold 
of lru_lock:

   /* userland calls munmap or exit, eventually end up at... */
   zap_pte_range
     __tlb_remove_page  /* returns true because we eventually hit
                           MAX_GATHER_BATCH_COUNT in tlb_next_batch */
     tlb_flush_mmu_free
       free_pages_and_swap_cache
         release_pages
           /* contend on lru_lock */
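
The notable part on this side is that release_pages() amortizes lru_lock
over many pages while also bounding how long it's held. Roughly, the shape
is (again a simplified sketch; SWAP_CLUSTER_MAX and the helpers here are
stand-ins, not the kernel's code):

#include <pthread.h>

#define SWAP_CLUSTER_MAX 32	/* bound on work done per lock hold */

struct list_head { struct list_head *prev, *next; };
struct page { struct list_head lru; };

static pthread_spinlock_t lru_lock;

__attribute__((constructor))
static void lru_lock_init(void)
{
	pthread_spin_init(&lru_lock, PTHREAD_PROCESS_PRIVATE);
}

/* unlink the page from whatever LRU list it sits on */
static void del_page_from_lru(struct page *page)
{
	page->lru.prev->next = page->lru.next;
	page->lru.next->prev = page->lru.prev;
}

/*
 * Take a whole array of pages off the LRU with one lock acquisition per
 * batch rather than one per page, periodically dropping the lock so a
 * single large munmap/exit can't starve everyone else.
 */
static void release_pages_sketch(struct page **pages, int nr)
{
	int lock_batch = 0;

	pthread_spin_lock(&lru_lock);
	for (int i = 0; i < nr; i++) {
		if (++lock_batch > SWAP_CLUSTER_MAX) {
			pthread_spin_unlock(&lru_lock);
			lock_batch = 0;
			pthread_spin_lock(&lru_lock);
		}
		del_page_from_lru(pages[i]);
	}
	pthread_spin_unlock(&lru_lock);
	/* actually freeing the pages happens after the lock is dropped */
}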


For broader context: we've run decision-support benchmarks where 
lru_lock (and zone->lock) show long wait times. And we're not the only 
ones seeing this, judging by comments in the kernel source:

mm/vmscan.c:
  * zone_lru_lock is heavily contended.  Some of the functions that
  * shrink the lists perform better by taking out a batch of pages
  * and working on them outside the LRU lock.
  *
  * For pagecache intensive workloads, this function is the hottest
  * spot in the kernel (apart from copy_*_user functions).
...
static unsigned long isolate_lru_pages(unsigned long nr_to_scan,


include/linux/mmzone.h:
  * zone->lock and the [pgdat->lru_lock] are two of the hottest locks in
  * the kernel.  So add a wild amount of padding here to ensure that they
  * fall into separate cachelines. ...
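
That padding trick is easy to see outside the kernel, too. A standalone
sketch (assuming a 64-byte cacheline; the kernel gets the real alignment
from per-arch macros rather than hardcoding it):

#include <pthread.h>
#include <stdio.h>
#include <stddef.h>

#define CACHELINE 64	/* assumed line size; arch-dependent in reality */

struct node_locks {
	/*
	 * Force each lock onto its own cacheline so that contention on
	 * one doesn't bounce the line holding the other between CPUs.
	 */
	pthread_spinlock_t zone_lock __attribute__((aligned(CACHELINE)));
	pthread_spinlock_t lru_lock  __attribute__((aligned(CACHELINE)));
};

int main(void)
{
	/* prints offsets 0 and 64: the locks no longer share a line */
	printf("zone_lock at offset %zu, lru_lock at offset %zu\n",
	       offsetof(struct node_locks, zone_lock),
	       offsetof(struct node_locks, lru_lock));
	return 0;
}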


Anyway, if you're seeing this lock in your workloads, I'm interested in 
hearing what you're running so we can get more real-world data on this.
