Date:   Thu, 16 Jun 2022 17:29:19 -0600
From:   Yu Zhao <yuzhao@...gle.com>
To:     Barry Song <21cnbao@...il.com>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Will Deacon <will@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linux-MM <linux-mm@...ck.org>, Andi Kleen <ak@...ux.intel.com>,
        Aneesh Kumar <aneesh.kumar@...ux.ibm.com>,
        Catalin Marinas <catalin.marinas@....com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Hillf Danton <hdanton@...a.com>, Jens Axboe <axboe@...nel.dk>,
        Johannes Weiner <hannes@...xchg.org>,
        Jonathan Corbet <corbet@....net>,
        Matthew Wilcox <willy@...radead.org>,
        Mel Gorman <mgorman@...e.de>,
        Michael Larabel <Michael@...haellarabel.com>,
        Michal Hocko <mhocko@...nel.org>,
        Mike Rapoport <rppt@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Tejun Heo <tj@...nel.org>, Vlastimil Babka <vbabka@...e.cz>,
        LAK <linux-arm-kernel@...ts.infradead.org>,
        Linux Doc Mailing List <linux-doc@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>, x86 <x86@...nel.org>,
        Kernel Page Reclaim v2 <page-reclaim@...gle.com>,
        Brian Geffon <bgeffon@...gle.com>,
        Jan Alexander Steffens <heftig@...hlinux.org>,
        Oleksandr Natalenko <oleksandr@...alenko.name>,
        Steven Barrett <steven@...uorix.net>,
        Suleiman Souhlal <suleiman@...gle.com>,
        Daniel Byrne <djbyrne@....edu>,
        Donald Carr <d@...os-reins.com>,
        Holger Hoffstätte <holger@...lied-asynchrony.com>,
        Konstantin Kharlamov <Hi-Angel@...dex.ru>,
        Shuang Zhai <szhai2@...rochester.edu>,
        Sofia Trinh <sofia.trinh@....works>,
        Vaibhav Jain <vaibhav@...ux.ibm.com>, huzhanyuan@...o.com
Subject: Re: [PATCH v11 07/14] mm: multi-gen LRU: exploit locality in rmap

On Thu, Jun 16, 2022 at 4:33 PM Barry Song <21cnbao@...il.com> wrote:
>
> On Fri, Jun 17, 2022 at 9:56 AM Yu Zhao <yuzhao@...gle.com> wrote:
> >
> > On Wed, Jun 8, 2022 at 4:46 PM Barry Song <21cnbao@...il.com> wrote:
> > >
> > > On Thu, Jun 9, 2022 at 3:52 AM Linus Torvalds
> > > <torvalds@...ux-foundation.org> wrote:
> > > >
> > > > On Tue, Jun 7, 2022 at 5:43 PM Barry Song <21cnbao@...il.com> wrote:
> > > > >
> > > > > Given we used to have a flush for clear pte young in LRU, right now we are
> > > > > moving to nop in almost all cases for the flush unless the address becomes
> > > > > young exactly after look_around and before ptep_clear_flush_young_notify.
> > > > > It means we are actually dropping flush. So the question is,  were we
> > > > > overcautious? we actually don't need the flush at all even without mglru?
> > > >
> > > > We stopped flushing the TLB on A bit clears on x86 back in 2014.
> > > >
> > > > See commit b13b1d2d8692 ("x86/mm: In the PTE swapout page reclaim case
> > > > clear the accessed bit instead of flushing the TLB").
> > >
> > > This is true for x86, RISC-V, powerpc and s390, but it is not true for
> > > most platforms.
> > >
> > > There was an attempt to do the same thing in arm64:
> > > https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1793830.html
> > > but arm64 still sent a nosync tlbi and depended on a deferred dsb:
> > > https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1794484.html
> >
> > Barry, you've already answered your own question.
> >
> > Without commit 07509e10dcc7 arm64: pgtable: Fix pte_accessible():
> >    #define pte_accessible(mm, pte)        \
> >   -       (mm_tlb_flush_pending(mm) ? pte_present(pte) : pte_valid_young(pte))
> >   +       (mm_tlb_flush_pending(mm) ? pte_present(pte) : pte_valid(pte))
> >
> > You missed all TLB flushes for PTEs that have gone through
> > ptep_test_and_clear_young() on the reclaim path. But most of the time,
> > you got away with it, only occasional app crashes:
> > https://lore.kernel.org/r/CAGsJ_4w6JjuG4rn2P=d974wBOUtXUUnaZKnx+-G6a8_mSROa+Q@mail.gmail.com/
> >
> > Why?
>
> Yes. On the arm64 platform, ptep_test_and_clear_young() without flush
> can cause random apps to crash.
> ptep_test_and_clear_young() + flush won't have this kind of crash though.
> But after applying commit 07509e10dcc7 ("arm64: pgtable: Fix
> pte_accessible()"), on arm64, ptep_test_and_clear_young() without flush
> won't cause apps to crash.
>
> ptep_test_and_clear_young(), with flush, without commit 07509e10dcc7:   OK
> ptep_test_and_clear_young(), without flush, with commit 07509e10dcc7:   OK
> ptep_test_and_clear_young(), without flush, without commit 07509e10dcc7:   CRASH

I agree -- my question was rhetorical :)

I was trying to imply this logic:
1. We cleared the A-bit in PTEs with ptep_test_and_clear_young().
2. We missed the TLB flush for those PTEs on the reclaim path, i.e.,
   case 3 (cases 1 and 2 guarantee flushes).
3. We saw crashes, but only occasionally.

If the TLB had cached those PTEs, we would have seen crashes more
often, which contradicts our observation. So the conclusion is that the
TLB didn't cache them most of the time, meaning that flushing the TLB
just for the sake of the A-bit isn't necessary.
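
To make the three cases concrete, here is a minimal toy model in C. It
is only a sketch, not kernel code: every toy_* name is hypothetical,
the PTE is reduced to two bits, no flush is assumed to be pending
(mm_tlb_flush_pending() false), the page is assumed not to be touched
again between the A-bit clear and the unmap, and the unmap step is
modelled on the generic ptep_clear_flush(), which flushes only when
pte_accessible() is true. The tlb_cached parameter is a knob for the
one thing we cannot observe directly: whether the TLB still held the
translation.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy PTE: only the bits that matter for this argument. */
struct toy_pte {
	bool valid;	/* hardware may walk and cache this entry */
	bool young;	/* the accessed (A) bit */
};

/* pte_accessible() before commit 07509e10dcc7, no flush pending:
 * an old (A-bit clear) PTE is assumed not to be in the TLB. */
static bool toy_accessible_old(struct toy_pte pte)
{
	return pte.valid && pte.young;
}

/* pte_accessible() after the commit: any valid PTE may be cached. */
static bool toy_accessible_new(struct toy_pte pte)
{
	return pte.valid;
}

/*
 * Returns true if a stale TLB entry can survive the unmap.
 *   flush_on_clear: the A-bit clear was followed by a TLB flush
 *   fixed:          commit 07509e10dcc7 is applied
 *   tlb_cached:     the TLB held the translation when the A-bit was cleared
 */
static bool toy_stale_after_unmap(bool flush_on_clear, bool fixed,
				  bool tlb_cached)
{
	struct toy_pte pte = { .valid = true, .young = true };
	bool in_tlb = tlb_cached;
	bool accessible;

	/* Step 1: ptep_test_and_clear_young(), optionally with a flush. */
	pte.young = false;
	if (flush_on_clear)
		in_tlb = false;

	/* Step 2: unmap; flush only if the PTE is considered accessible. */
	accessible = fixed ? toy_accessible_new(pte) : toy_accessible_old(pte);
	pte.valid = false;
	if (accessible)
		in_tlb = false;

	return in_tlb;
}

/* The three cases from the thread, plus the "got away with it" case. */
static void toy_check_cases(void)
{
	assert(!toy_stale_after_unmap(true,  false, true));  /* flush, no fix */
	assert(!toy_stale_after_unmap(false, true,  true));  /* no flush, fix */
	assert( toy_stale_after_unmap(false, false, true));  /* neither: stale */
	assert(!toy_stale_after_unmap(false, false, false)); /* TLB did not cache */
}
```

Calling toy_check_cases() from a main() exercises all four rows. In
this model only the last two rows differ, and they differ only in
tlb_cached, which is the point of the argument above: occasional
rather than frequent crashes suggest the TLB rarely held those
entries.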

> do you think it is safe to totally remove the flush code even for
> the original LRU?

Affirmative, based not only on my word but also on third-party evidence:
1. Your (indirect) observation
2. Alexander's benchmark:
https://lore.kernel.org/r/BYAPR12MB271295B398729E07F31082A7CFAA0@BYAPR12MB2712.namprd12.prod.outlook.com/
3. The fundamental hardware limitation in terms of the TLB scalability
(Fig. 1): https://www.usenix.org/legacy/events/osdi02/tech/full_papers/navarro/navarro.pdf
