Date:   Thu, 31 Aug 2017 01:01:25 +0200
From:   Andrea Arcangeli <aarcange@...hat.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Jérôme Glisse <jglisse@...hat.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-mm <linux-mm@...ck.org>,
        Dan Williams <dan.j.williams@...el.com>,
        Ross Zwisler <ross.zwisler@...ux.intel.com>,
        Bernhard Held <berny156@....de>,
        Adam Borowski <kilobyte@...band.pl>,
        Radim Krčmář <rkrcmar@...hat.com>,
        Wanpeng Li <kernellwp@...il.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Takashi Iwai <tiwai@...e.de>,
        Nadav Amit <nadav.amit@...il.com>,
        Mike Galbraith <efault@....de>,
        "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
        axie <axie@....com>, Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH 02/13] mm/rmap: update to new mmu_notifier semantic

On Wed, Aug 30, 2017 at 02:53:38PM -0700, Linus Torvalds wrote:
> On Wed, Aug 30, 2017 at 9:52 AM, Andrea Arcangeli <aarcange@...hat.com> wrote:
> >
> > I pointed out in earlier email ->invalidate_range can only be
> > implemented (as mutually exclusive alternative to
> > ->invalidate_range_start/end) by secondary MMUs that share the very
> > same pagetables with the core linux VM of the primary MMU, and those
> > invalidate_range are already called by
> > __mmu_notifier_invalidate_range_end.
> 
> I have to admit that I didn't notice that fact - that we are already
> in the situation that
> invalidate_range is called by the range_end() notifier.
> 
> I agree that that should simplify all the code, and means that we
> don't have to worry about the few cases that already implemented only
> the "invalidate_page()" and "invalidate_range()" cases.
> 
> So I think that simplifies Jérôme's patch further - once you have put
> the range_start/end() cases around the inner loop, you can just drop
> the invalidate_page() things entirely.
> 
> > So this conversion from invalidate_page to invalidate_range looks
> > superfluous and the final mmu_notifier_invalidate_range_end should be
> > enough.
> 
> Yes. I missed the fact that we already called range() from range_end().
> 
> That said, the double call shouldn't hurt correctness, and it's
> "closer" to old behavior for those people who only did the range/page
> ones, so I wonder if we can keep Jérôme's patch in its current state
> for 4.13.

Yes, the double call doesn't hurt correctness. Keeping it in its
current state is safer if anything, so I have no objection to it, other
than that I'd like to optimize it further if possible, but that can be
done later.

We're already running the double call in various fast paths too, in
fact, and the rmap walk isn't the fastest path that would be doing such
a double call, so it's not a major concern.

Also not a bug, but one further (but more obviously safe) enhancement
I would like is to restrict those rmap invalidation ranges to
PAGE_SIZE << compound_order(page) instead of PMD_SIZE/PMD_MASK.

+	/*
+	 * We have to assume the worse case ie pmd for invalidation. Note that
+	 * the page can not be free in this function as call of try_to_unmap()
+	 * must hold a reference on the page.
+	 */
+	end = min(vma->vm_end, (start & PMD_MASK) + PMD_SIZE);
+	mmu_notifier_invalidate_range_start(vma->vm_mm, start, end);

We don't need to invalidate 2MB of secondary MMU mappings surrounding
a 4KB page just to swap out a 4KB page. split_huge_page can't run while
holding the rmap locks, so compound_order(page) is safe to use there.

It can also be optimized incrementally later.

> Because I still want to release 4.13 this weekend, despite this
> upheaval. Otherwise I'll have timing problems during the next merge
> window.
> 
> Andrea, do you otherwise agree with the whole series as is?

I only wish we had more time to test Jerome's patchset, but I sure
agree in principle and I don't see regressions in it.

The callouts to ->invalidate_page seem to have diminished over time
(for the various reasons we know), so if we don't use it for the fast
paths, using it only in the rmap walk slow paths probably wasn't
providing much performance benefit.

Thanks,
Andrea
