linux-kernel - Re: [PATCH v7 04/12] mm: multigenerational LRU: groundwork

Message-ID: <CAOUHufYaa+1S_sDmFSPJTkVym75EpFy1yPtXdobTjdGDzmk3Kg@mail.gmail.com>
Date:   Wed, 16 Mar 2022 15:37:06 -0600
From:   Yu Zhao <yuzhao@...gle.com>
To:     Barry Song <21cnbao@...il.com>
Cc:     Konstantin Kharlamov <Hi-Angel@...dex.ru>,
        Michael Larabel <Michael@...haellarabel.com>,
        Andi Kleen <ak@...ux.intel.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        "Aneesh Kumar K . V" <aneesh.kumar@...ux.ibm.com>,
        Jens Axboe <axboe@...nel.dk>,
        Brian Geffon <bgeffon@...gle.com>,
        Catalin Marinas <catalin.marinas@....com>,
        Jonathan Corbet <corbet@....net>,
        Donald Carr <d@...os-reins.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Daniel Byrne <djbyrne@....edu>,
        Johannes Weiner <hannes@...xchg.org>,
        Hillf Danton <hdanton@...a.com>,
        Jan Alexander Steffens <heftig@...hlinux.org>,
        Holger Hoffstätte <holger@...lied-asynchrony.com>,
        Jesse Barnes <jsbarnes@...gle.com>,
        Linux ARM <linux-arm-kernel@...ts.infradead.org>,
        "open list:DOCUMENTATION" <linux-doc@...r.kernel.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>, Mel Gorman <mgorman@...e.de>,
        Michal Hocko <mhocko@...nel.org>,
        Oleksandr Natalenko <oleksandr@...alenko.name>,
        Kernel Page Reclaim v2 <page-reclaim@...gle.com>,
        Rik van Riel <riel@...riel.com>,
        Mike Rapoport <rppt@...nel.org>,
        Sofia Trinh <sofia.trinh@....works>,
        Steven Barrett <steven@...uorix.net>,
        Suleiman Souhlal <suleiman@...gle.com>,
        Shuang Zhai <szhai2@...rochester.edu>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        Will Deacon <will@...nel.org>,
        Matthew Wilcox <willy@...radead.org>,
        "the arch/x86 maintainers" <x86@...nel.org>,
        Huang Ying <ying.huang@...el.com>
Subject: Re: [PATCH v7 04/12] mm: multigenerational LRU: groundwork

On Wed, Mar 16, 2022 at 12:06 AM Barry Song <21cnbao@...il.com> wrote:

<snipped>
> > The cost is not the point; the fairness is:
> >
> > 1) Ramdisk is fair to both LRU algorithms.
> > 2) Zram punishes the LRU algorithm that chooses incompressible pages.
> > IOW, this algorithm needs to compress more pages in order to save the
> > same amount of memory.
>
> I see your point, but my point is that with a higher I/O cost to swap
> pages in and out, more major faults (a lower hit ratio) will
> contribute to the loss of final performance.
>
> So for this particular case, if we move to a real disk as the swap
> device, we might see the same result as with the zRAM I was using,
> since you also reported more page faults.

If we wanted to talk about I/O cost, we would need to consider the
number of writes and the writeback patterns as well. The LRU algorithm
that *unconsciously* picks more clean pages has an advantage because
writes are usually slower than reads. Similarly, the LRU algorithm
that *unconsciously* picks a cluster of cold pages that would later be
faulted in together also has an advantage because sequential reads
are faster than random reads. Do we want to go down this rabbit hole?
I think not. That's exactly why I suggested we focus on fairness.
But, just out of curiosity: MGLRU was faster when swapping to a slow
MMC disk.

# mmc cid read /sys/class/mmc_host/mmc1/mmc1:0001
type: 'MMC'
manufacturer: 'SanDisk-Toshiba Corporation' ''
product: 'DA4064' 1.24400152
serial: 0x00000000
manfacturing date: 2006 aug

# baseline + THP=never
0 records/s
real 872.00 s
user 51.69 s
sys  483.09 s

    13.07%  __memcpy_neon
    11.37%  __pi_clear_page
     9.35%  _raw_spin_unlock_irq
     5.52%  mod_delayed_work_on
     5.17%  _raw_spin_unlock_irqrestore
     3.95%  do_raw_spin_lock
     3.87%  rmqueue_pcplist
     3.60%  local_daif_restore
     3.17%  free_unref_page_list
     2.74%  zap_pte_range
     2.00%  handle_mm_fault
     1.19%  do_anonymous_page

# MGLRU + THP=never
0 records/s
real 821.00 s
user 44.45 s
sys  428.21 s

    13.28%  __memcpy_neon
    12.78%  __pi_clear_page
     9.14%  _raw_spin_unlock_irq
     5.95%  _raw_spin_unlock_irqrestore
     5.08%  mod_delayed_work_on
     4.45%  do_raw_spin_lock
     3.86%  local_daif_restore
     3.81%  rmqueue_pcplist
     3.32%  free_unref_page_list
     2.89%  zap_pte_range
     1.89%  handle_mm_fault
     1.10%  do_anonymous_page

# baseline + THP=madvise
0 records/s
real 1341.00 s
user 68.15 s
sys  681.42 s

    12.33%  __memcpy_neon
    11.78%  _raw_spin_unlock_irq
     8.79%  __pi_clear_page
     7.63%  mod_delayed_work_on
     5.49%  _raw_spin_unlock_irqrestore
     3.23%  local_daif_restore
     3.00%  do_raw_spin_lock
     2.83%  rmqueue_pcplist
     2.21%  handle_mm_fault
     2.00%  zap_pte_range
     1.51%  free_unref_page_list
     1.33%  do_swap_page
     1.17%  do_anonymous_page

# MGLRU + THP=madvise
0 records/s
real 1315.00 s
user 60.59 s
sys  620.56 s

    12.34%  __memcpy_neon
    12.17%  _raw_spin_unlock_irq
     9.33%  __pi_clear_page
     7.33%  mod_delayed_work_on
     6.01%  _raw_spin_unlock_irqrestore
     3.27%  local_daif_restore
     3.23%  do_raw_spin_lock
     2.98%  rmqueue_pcplist
     2.12%  handle_mm_fault
     2.04%  zap_pte_range
     1.65%  free_unref_page_list
     1.27%  do_swap_page
     1.11%  do_anonymous_page
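
For readers who want the four runs above side by side, here is a minimal
sketch that turns the real/user/sys timings into relative reductions for
MGLRU versus baseline. The numbers are copied from the output above; the
names RUNS, reduction_pct and summarize are made up for this illustration
and are not part of any benchmark tooling. Each configuration appears to
have been timed once, so treat the percentages as indicative only.

```python
# Timings in seconds, copied from the four runs quoted above.
RUNS = {
    "THP=never": {
        "baseline": {"real": 872.00, "user": 51.69, "sys": 483.09},
        "MGLRU": {"real": 821.00, "user": 44.45, "sys": 428.21},
    },
    "THP=madvise": {
        "baseline": {"real": 1341.00, "user": 68.15, "sys": 681.42},
        "MGLRU": {"real": 1315.00, "user": 60.59, "sys": 620.56},
    },
}


def reduction_pct(baseline: float, mglru: float) -> float:
    """Percentage by which the MGLRU time is lower than the baseline time."""
    return (baseline - mglru) / baseline * 100.0


def summarize(runs: dict) -> dict:
    """Map each THP setting to its per-metric reduction in percent."""
    summary = {}
    for thp, pair in runs.items():
        summary[thp] = {
            metric: reduction_pct(pair["baseline"][metric], pair["MGLRU"][metric])
            for metric in ("real", "user", "sys")
        }
    return summary


if __name__ == "__main__":
    for thp, row in summarize(RUNS).items():
        cells = ", ".join(f"{metric} -{pct:.2f}%" for metric, pct in row.items())
        print(f"{thp}: {cells}")
```

The sketch only restates the wall-clock and CPU-time figures; it says
nothing about the perf profiles, which look nearly identical between
baseline and MGLRU in both THP settings.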