linux-kernel - Re: [PATCH v3 3/8] mm/rmap: Split try_to_munlock from try_to

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20210301161049.GJ4247@nvidia.com>
Date:   Mon, 1 Mar 2021 12:10:49 -0400
From:   Jason Gunthorpe <jgg@...dia.com>
To:     Alistair Popple <apopple@...dia.com>
CC:     <linux-mm@...ck.org>, <nouveau@...ts.freedesktop.org>,
        <bskeggs@...hat.com>, <akpm@...ux-foundation.org>,
        <linux-doc@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
        <dri-devel@...ts.freedesktop.org>, <jhubbard@...dia.com>,
        <rcampbell@...dia.com>, <jglisse@...hat.com>, <hch@...radead.org>,
        <daniel@...ll.ch>
Subject: Re: [PATCH v3 3/8] mm/rmap: Split try_to_munlock from try_to_unmap

On Fri, Feb 26, 2021 at 06:18:27PM +1100, Alistair Popple wrote:
> The behaviour of try_to_unmap_one() is difficult to follow because it
> performs different operations based on a fairly large set of flags used
> in different combinations.
> 
> TTU_MUNLOCK is one such flag. However it is exclusively used by
> try_to_munlock() which specifies no other flags. Therefore rather than
> overload try_to_unmap_one() with unrelated behaviour split this out into
> it's own function and remove the flag.
> 
> Signed-off-by: Alistair Popple <apopple@...dia.com>
> 
> 
> Given the comments on not needing to hold mmap_lock it was not 100% clear
> to me if it is safe to check vma->vma_flags & VM_LOCKED and if re-checking
> under the ptl was significant. I left the extra check in case it was, but
> it seems one of the checks is redunant as either the first check is racey
> or the second check is unneccsary.

The rmap doesn't hold the mmap_lock so I think both of these cases are
racey.

eg 

apply_vma_lock_flags()

	vma = find_vma(current->mm, start);
	if (!vma || vma->vm_start > start)
		return -ENOMEM;

	prev = vma->vm_prev;
	if (start > vma->vm_start)
		prev = vma;

	for (nstart = start ; ; ) {
		vm_flags_t newflags = vma->vm_flags & VM_LOCKED_CLEAR_MASK;

		newflags |= flags;
 [...]
mlock_fixup()
	/*
	 * vm_flags is protected by the mmap_lock held in write mode.
	 * It's okay if try_to_unmap_one unmaps a page just after we
	 * set VM_LOCKED, populate_vma_page_range will bring it back.
	 */

	if (lock)
		vma->vm_flags = newflags;
	else
               	vma->vm_flags &= VM_LOCKED_CLEAR_MASK;

Which is only done under the mmap_sem

> +static bool try_to_munlock_one(struct page *page, struct vm_area_struct *vma,
> +		     unsigned long address, void *arg)
> +{
> +	struct page_vma_mapped_walk pvmw = {
> +		.page = page,
> +		.vma = vma,
> +		.address = address,
> +	};
> +	bool ret = true;
> +
> +	/* munlock has nothing to gain from examining un-locked vmas */
> +	if (!(vma->vm_flags & VM_LOCKED))
> +		return true;

The mmap_sem can't be obtained in the rmap walkers due to lock
ordering, the various rmap locks are nested under the mmap_sem

So, when reading data that is not locked it should be written as:

   READ_ONCE(vma->vm_flags) & VM_LOCKED

> +	while (page_vma_mapped_walk(&pvmw)) {
> +		/*
> +		 * If the page is mlock()d, we cannot swap it out.
> +		 * If it's recently referenced (perhaps page_referenced
> +		 * skipped over this mm) then we should reactivate it.
> +		 */
> +		if (vma->vm_flags & VM_LOCKED) {

And since we write the data without holding the PTLs this looks
pointless, unless there is some other VM_LOCKED manipulation

Jason