Message-ID: <YVI195OZ7t3i3n6t@t490s>
Date: Mon, 27 Sep 2021 17:21:59 -0400
From: Peter Xu <peterx@...hat.com>
To: Hugh Dickins <hughd@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org,
Andrea Arcangeli <aarcange@...hat.com>,
Liam Howlett <liam.howlett@...cle.com>,
Mike Rapoport <rppt@...ux.vnet.ibm.com>,
Yang Shi <shy828301@...il.com>,
David Hildenbrand <david@...hat.com>,
"Kirill A . Shutemov" <kirill@...temov.name>,
Jerome Glisse <jglisse@...hat.com>,
Alistair Popple <apopple@...dia.com>,
Miaohe Lin <linmiaohe@...wei.com>,
Matthew Wilcox <willy@...radead.org>,
Axel Rasmussen <axelrasmussen@...gle.com>
Subject: Re: [PATCH v4 1/4] mm/shmem: Unconditionally set pte dirty in
mfill_atomic_install_pte
Hi, Hugh,
On Thu, Sep 23, 2021 at 08:56:33PM -0700, Hugh Dickins wrote:
> I'm not going to NAK this, but you and I have different ideas of
> "very nice cleanups". Generally, you appear (understandably) to be
> trying to offload pieces of work from your larger series, but often
> I don't see the sense of them, here in isolation anyway.
>
> Is this a safe transformation of the existing code? Yes, I believe so
> (at least until someone adds some PTESAN checker which looks to see
> if any ptes are dirty in vmas to which user never had write access).
> But it took quite a lot of lawyering to arrive at that conclusion.
I get your point there, but I'm skeptical that there will ever be a tool
called PTESAN that asserts VM_WRITE for pte_dirty.
After we noticed the arm64 implementation of pte_mkdirty() last time, I've
already stopped tying pte dirty to VM_WRITE or pte_write().
As I said before, that's quite natural when I think "the uffd way", because
uffd can easily install a page read-only while the page is dirty. I think
you'll answer that with "we should mark the page dirty instead" in this case,
as you stated below. I also agree. However, if we see pte_dirty as a major way
to track dirty data, and it eventually gets folded into PageDirty anyway, then
I don't think it matters a great deal to us whether we set the pte or the page
dirty, does it?
>
> Is this a cleanup? No, it's a dirtyup.
>
> shmem_mfill_atomic_pte() does SetPageDirty (before unlocking page)
> because that's where the page contents are made dirty. You could
> criticise it for doing SetPageDirty even in the zeropage case:
> yes, we've been lazy there; but that's a different argument.
>
> If someone is faulting this page into a read-only vma, it's
> surprising to make the pte dirty there. What would be most correct
> would be to keep the SetPageDirty in shmem_mfill_atomic_pte()
> (with or without zeropage optimization), and probably SetPageDirty
> in some other places in mm/userfaultfd.c (I didn't look where) when
> the page is filled with supplied data, and mfill_atomic_install_pte()
> only do that pte_mkdirty() when it's serving a FAULT_FLAG_WRITE.
That's a good point, and yeah, if we can unconditionally mark PageDirty that
would be great too. I think what bothered me most in the past was that the
condition for setting dirty is too complicated: I have debugged two cases
where we should have applied the dirty bit but forgot to, and each of them
took me a few days or more to figure out, thanks to my awkward debugging
skills.
Then I thought: why not do it the other way around, if in 99% of the cases the
pages are dirty in real systems anyway? Say, let's set dirty unconditionally,
and if there is ever a negative effect from having some page dirty (which I
still doubt), we'll track that down from a "degraded" performance result. That
way we convert some hard-to-debug data corruption issues into "oh, previously
this program ran at speed 100x, now it runs at 99x, why did I lose 1% of
performance?" I even highly doubt that will come true: for the uffd case
(which is the only case I modified in this patch), I can hardly tell how many
people would want to use the mappings read-only, and how much they would
suffer from that extra dirty bit or PageDirty.
That's why I'd really like this patch to go in: I want to save time for myself,
and for anyone else who would otherwise be fighting another lost-dirty issue.
>
> I haven't looked again (I have a pile of mails to respond to!),
> but when I looked before I think I found that the vmf flags are
> not available to the userfaultfd ioctler. If so, then it would
> be more appropriate to just leave the mkdirty to the hardware on
> return from fault (except - and again I cannot spend time researching
> this - perhaps I'm too x86-centric, and there are other architectures
> on which the software *must* do the mkdirty fixup to avoid refaulting
> forever - though probably userfaultfd state would itself prevent that).
If it's based on the fact that we'll set PageDirty for file-backed pages, then
it looks okay, but I'm not sure.
One thing to mention is that pte_mkdirty() also sets soft-dirty by nature. I'm
imagining a program that is soft-dirty tracked and somehow uses UFFDIO_COPY
as its major data filler (so the task itself may not write to the page directly,
hence HW won't set the dirty bit there). With pte_mkdirty, the other userspace
tracker using soft-dirty can still detect this, while with PageDirty I believe
it can't. From that POV, I'm not sure whether I can say that proactively
doing pte_mkdirty is the safer approach just in case such a use case exists, as
I can't say it's illegal. In that sense pte_dirty is a superset of PageDirty,
not vice versa.
>
> But you seem to think that doing the dirtying in an unnatural place
> helps somehow; and for all I know, that may be so in your larger
> series, though this change certainly raises suspicions of that.
>
> I'm sorry to be so discouraging, but you have asked for my opinion,
> and here at last you have it. Not a NAK, but no enthusiasm at all.
Thanks a lot for still looking at these patches; even if most of the comments
are negative and come a bit late, I still appreciate your time.
As you mentioned, you're busy with many things, and I'm aware of that. And
that's really what this patch wants to achieve too: to save time for everyone.
My point is that "maintaining a 100% accurate dirty bit is not worth it
here".
Thanks,
--
Peter Xu