lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 3 Aug 2022 23:42:39 -0700
From:   Nadav Amit <nadav.amit@...il.com>
To:     Peter Xu <peterx@...hat.com>
Cc:     LKML <linux-kernel@...r.kernel.org>, Linux MM <linux-mm@...ck.org>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Andi Kleen <andi.kleen@...el.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Hugh Dickins <hughd@...gle.com>,
        Huang Ying <ying.huang@...el.com>,
        "Kirill A . Shutemov" <kirill@...temov.name>,
        Vlastimil Babka <vbabka@...e.cz>,
        David Hildenbrand <david@...hat.com>
Subject: Re: [PATCH 2/2] mm: Remember young bit for page migrations

On Aug 3, 2022, at 9:45 AM, Peter Xu <peterx@...hat.com> wrote:

> On Wed, Aug 03, 2022 at 12:42:54AM -0700, Nadav Amit wrote:
>> On Aug 2, 2022, at 6:21 PM, Peter Xu <peterx@...hat.com> wrote:
>> 
>> On the negative side, I am not sure whether other archs, that might require
>> a TLB flush for resetting the access-bit, and the overhead of doing atomic
>> operation to clear the access-bit, would not induce more overhead than they
>> would save.
> 
> I think your proposal makes sense and looks clean, maybe even cleaner than
> the new max_swapfile_size() approach (and definitely nicer than the old one
> of mine). It's just that I still want this to happen even without page idle
> enabled - at least Fedora doesn't have page idle enabled by default.  I'm
> not sure whether it'll be worth it to define Young bit just for this (note:
> iiuc we don't need Idle bit in this case, but only the Young bit).
> 
> The other thing is whether there's other side effect of losing pte level
> granularity of young bit, since right after we merge them into the page
> flags, then that granule is lost.  So far I don't worry a lot on the tlb
> flush overhead, but hopefully nothing else we missed.

I agree with your arguments. I missed the fact that page young bit is only
defined if PAGE_IDLE is defined. So unless it makes sense in general to have
all pages marked as accessed if the page is young, adding the bit would
cause additional overheads, especially for 32-bit systems.

I also agree that the solution that you provided would improve page-idle
behavior. However, while not being wrong, users who try to clear page-idle
indications would not succeed doing so for pages that are undergoing a
migration.

There are some additional implications that I do not remember that you or
anyone else (including me) mentioned, specifically for MADV_COLD and
MADV_FREE. You may want to teach madvise_cold_or_pageout_pte_range() and
madvise_free_pte_range() to deal with these new type of entries.

On the other hand, madvise is already “broken” today in that regard, since
IIUC, it does not even clear PageReferenced (on MADV_COLD) for migrated
pages.

>> One more unrelated point - note that remove_migration_pte() would always set
>> a clean PTE even when the old one was dirty…
> 
> Correct.  Say it in another way, at least initial writes perf will still
> suffer after migration on x86.
> 
> Dirty bit is kind of different in this case so I didn't yet try to cover
> it.  E.g., we won't lose it even without this patchset but consolidates it
> into PageDirty already or it'll be a bug.
> 
> I think PageDirty could be cleared during migration procedure, if so we
> could be wrongly applying the dirty bit to the recovered pte.  I thought
> about this before posting this series, but I hesitated on adding dirty bit
> altogether with it at least in these initial versions since dirty bit may
> need some more justifications.
> 
> Please feel free to share any further thoughts on the dirty bit.

I fully understand that the dirty-bit can stay for a different patch(-set).
But I do not see a problem in setting the dirty-bit if the PTE is mapped as
writable, since anyhow the page can be dirties at any given moment
afterwards without the kernel involvement.

If you are concerned that the page will be written back in between and the
PTE would be marked as dirty unnecessarily, you can keep the bit only if the
both PageDirty and a new "swap entry dirty-bit” are set.


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ