Message-Id: <20200921211744.24758-1-peterx@redhat.com>
Date: Mon, 21 Sep 2020 17:17:39 -0400
From: Peter Xu <peterx@...hat.com>
To: linux-mm@...ck.org, linux-kernel@...r.kernel.org
Cc: Jason Gunthorpe <jgg@...pe.ca>,
Andrew Morton <akpm@...ux-foundation.org>,
Jan Kara <jack@...e.cz>, Michal Hocko <mhocko@...e.com>,
Kirill Tkhai <ktkhai@...tuozzo.com>,
Kirill Shutemov <kirill@...temov.name>,
Hugh Dickins <hughd@...gle.com>, Peter Xu <peterx@...hat.com>,
Christoph Hellwig <hch@....de>,
Andrea Arcangeli <aarcange@...hat.com>,
John Hubbard <jhubbard@...dia.com>,
Oleg Nesterov <oleg@...hat.com>,
Leon Romanovsky <leonro@...dia.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Jann Horn <jannh@...gle.com>
Subject: [PATCH 0/5] mm: Break COW for pinned pages during fork()

I'm finally posting this as a formal series because the change set keeps
growing, and because the issues discussed so far have made it clearer what we
need to do and how.

This series is largely inspired by the earlier discussion on the list [1],
which started with Jason's report of the rdma test failure. Linus proposed
the solution, which looks like a very nice way to avoid breaking userspace
apps that have not been using MADV_DONTFORK properly. More information can be
found in that thread.
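
For readers who have not used it, here is a minimal userspace sketch of what
"using MADV_DONTFORK properly" can look like. It is only an illustration, not
code from this series: the function name dontfork_demo() is made up, and the
buffer merely stands in for memory that a driver would have pinned for DMA.

```c
#define _GNU_SOURCE
#include <assert.h>     /* for callers that want to assert on the result */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo: mark a buffer MADV_DONTFORK, fork(), and check that the
 * parent still sees its own data afterwards.
 * Returns 0 on success, -1 on any failure.
 */
int dontfork_demo(void)
{
	long page = sysconf(_SC_PAGESIZE);
	if (page <= 0)
		return -1;
	size_t len = (size_t)page;

	unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
				  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	/* Ask the kernel not to map this range in children created by fork(). */
	if (madvise(buf, len, MADV_DONTFORK) != 0) {
		munmap(buf, len);
		return -1;
	}
	memset(buf, 0xab, len);

	pid_t pid = fork();
	if (pid < 0) {
		munmap(buf, len);
		return -1;
	}
	if (pid == 0) {
		/* Child: the range is not mapped here, so it must not touch buf. */
		_exit(0);
	}

	int status;
	if (waitpid(pid, &status, 0) < 0) {
		munmap(buf, len);
		return -1;
	}

	/* Parent: the range was never shared with the child, so writes stay in place. */
	buf[0] = 0x5a;
	int ok = (buf[0] == 0x5a && buf[len - 1] == 0xab);

	munmap(buf, len);
	return ok ? 0 : -1;
}
```

An application that registers buffers for DMA would issue the madvise() call
before any fork(); the series here is about not breaking the ones that don't.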

I believe the initial plan was to consider merging something like this for
rc7/rc8. However, I'm no longer sure, because the code change in
copy_pte_range() is probably larger than expected and so carries some risk.
I'll leave this question to the reviewers...

I tested it myself by calling fork() after vfio had pinned a bunch of device
pages, and verified that the new pte copy logic works as expected, at least in
the most general path. I have not tested the thp case yet because, afaict,
vfio does not support thp-backed dma pages. Luckily, the pmd/pud thp patch is
much more straightforward than the pte one, so hopefully it can be verified
directly by code review plus some heavier-weight rdma tests.

  Patch 1:   Introduce mm.has_pinned (as a single patch, as suggested by Jason)
  Patch 2-3: Some slight rework of the copy_page_range() path as preparation
  Patch 4:   Early cow solution for pte copy for pinned pages
  Patch 5:   Same as above, but for thp (pmd/pud)
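
To make the idea behind patches 1 and 4 easier to picture, below is a toy
userspace model of the decision "should fork() copy this page right away
instead of sharing it for COW?". It is not the patch code. Every name in it
(toy_mm, toy_page, TOY_PIN_BIAS, the toy_* helpers) is hypothetical, and the
"a pin adds a large bias to the refcount" heuristic is an assumption of this
sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for this sketch; none of these are kernel definitions. */
#define TOY_PIN_BIAS 1024	/* assumed: each pin adds a large bias to the refcount */

struct toy_mm {
	bool has_pinned;	/* set once this address space has ever pinned a page */
};

struct toy_page {
	int refcount;
};

/* Pinning marks the mm and bumps the page refcount by the bias. */
static void toy_pin_page(struct toy_mm *mm, struct toy_page *page)
{
	mm->has_pinned = true;
	page->refcount += TOY_PIN_BIAS;
}

/* May report false positives for heavily shared pages, never false negatives. */
static bool toy_page_maybe_pinned(const struct toy_page *page)
{
	return page->refcount >= TOY_PIN_BIAS;
}

/*
 * Decide whether fork() should copy the page for the child immediately,
 * so that the parent keeps the page that the device is using.
 */
static bool toy_fork_needs_early_copy(const struct toy_mm *src_mm,
				      const struct toy_page *page,
				      bool cow_mapping)
{
	if (!cow_mapping)
		return false;	/* shared mappings are never COWed */
	if (!src_mm->has_pinned)
		return false;	/* cheap check: this mm never pinned anything */
	return toy_page_maybe_pinned(page);
}
```

The per-mm flag is what keeps the common case cheap in this model: a process
that never pinned anything skips the per-page check entirely.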

The hugetlbfs fix is still missing, but as planned, that's not urgent, so we
can work on it afterwards. Comments are greatly welcome.

Thanks.

Peter Xu (5):
  mm: Introduce mm_struct.has_pinned
  mm/fork: Pass new vma pointer into copy_page_range()
  mm: Rework return value for copy_one_pte()
  mm: Do early cow for pinned pages during fork() for ptes
  mm/thp: Split huge pmds/puds if they're pinned when fork()

 include/linux/mm.h       |   2 +-
 include/linux/mm_types.h |  10 ++
 kernel/fork.c            |   3 +-
 mm/gup.c                 |   6 ++
 mm/huge_memory.c         |  26 +++++
 mm/memory.c              | 226 +++++++++++++++++++++++++++++++++++----
 6 files changed, 248 insertions(+), 25 deletions(-)

--
2.26.2