linux-kernel - Re: [PATCH v6 5/8] mm: Device exclusive memory access

Message-ID: <6616451.iqfUG9VtI1@nvdebian>
Date:   Mon, 22 Mar 2021 21:27:46 +1100
From:   Alistair Popple <apopple@...dia.com>
To:     Christoph Hellwig <hch@...radead.org>
CC:     <linux-mm@...ck.org>, <nouveau@...ts.freedesktop.org>,
        <bskeggs@...hat.com>, <akpm@...ux-foundation.org>,
        <linux-doc@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
        <kvm-ppc@...r.kernel.org>, <dri-devel@...ts.freedesktop.org>,
        <jhubbard@...dia.com>, <rcampbell@...dia.com>,
        <jglisse@...hat.com>, <jgg@...dia.com>, <daniel@...ll.ch>,
        <willy@...radead.org>
Subject: Re: [PATCH v6 5/8] mm: Device exclusive memory access

On Monday, 15 March 2021 6:42:45 PM AEDT Christoph Hellwig wrote:
> > +Not all devices support atomic access to system memory. To support atomic
> > +operations to a shared virtual memory page such a device needs access to that
> > +page which is exclusive of any userspace access from the CPU. The
> > +``make_device_exclusive_range()`` function can be used to make a memory range
> > +inaccessible from userspace.
> 
> s/Not all devices/Some devices/ ?

I will reword this. What I was trying to convey is that devices may have 
features which allow for atomics to be implemented with SW assistance.

> >  static inline int mm_has_notifiers(struct mm_struct *mm)
> > @@ -528,7 +534,17 @@ static inline void mmu_notifier_range_init_migrate(
> >  {
> >  	mmu_notifier_range_init(range, MMU_NOTIFY_MIGRATE, flags, vma, mm,
> >  				start, end);
> > -	range->migrate_pgmap_owner = pgmap;
> > +	range->owner = pgmap;
> > +}
> > +
> > +static inline void mmu_notifier_range_init_exclusive(
> > +			struct mmu_notifier_range *range, unsigned int flags,
> > +			struct vm_area_struct *vma, struct mm_struct *mm,
> > +			unsigned long start, unsigned long end, void *owner)
> > +{
> > +	mmu_notifier_range_init(range, MMU_NOTIFY_EXCLUSIVE, flags, vma, mm,
> > +				start, end);
> > +	range->owner = owner;
> 
> Maybe just replace mmu_notifier_range_init_migrate with a
> mmu_notifier_range_init_owner helper that takes the owner but does
> not hard code a type?

Ok. That does result in a function which takes a fair number of arguments, but 
I guess that's no worse than multiple functions hard coding the different 
types and it does result in less code overall.
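
For illustration only, here is a rough userspace sketch of what such a
helper might look like. The types below are simplified stand-ins rather than
the kernel definitions, the vma and mm arguments are dropped to keep the mock
small, and the final signature may well differ:

```c
#include <assert.h>
#include <stddef.h>

/* Mock event types; the kernel enum has more entries. */
enum mmu_notifier_event {
	MMU_NOTIFY_MIGRATE,
	MMU_NOTIFY_EXCLUSIVE,
};

/* Simplified stand-in for struct mmu_notifier_range. */
struct mmu_notifier_range {
	enum mmu_notifier_event event;
	unsigned int flags;
	unsigned long start;
	unsigned long end;
	void *owner;
};

/* Mock of the generic initialiser (no vma/mm in this sketch). */
static inline void mmu_notifier_range_init(struct mmu_notifier_range *range,
			enum mmu_notifier_event event, unsigned int flags,
			unsigned long start, unsigned long end)
{
	assert(range != NULL);
	range->event = event;
	range->flags = flags;
	range->start = start;
	range->end = end;
	range->owner = NULL;
}

/*
 * One helper that takes both the event and the owner, instead of one
 * wrapper per event type with the event hard coded.
 */
static inline void mmu_notifier_range_init_owner(
			struct mmu_notifier_range *range,
			enum mmu_notifier_event event, unsigned int flags,
			unsigned long start, unsigned long end, void *owner)
{
	mmu_notifier_range_init(range, event, flags, start, end);
	range->owner = owner;
}
```

Callers that previously used the migrate variant would then pass
MMU_NOTIFY_MIGRATE and the pgmap as the owner, and the exclusive path would
pass MMU_NOTIFY_EXCLUSIVE with its own owner pointer.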

> >  		}
> > +	} else if (is_device_exclusive_entry(entry)) {
> > +		page = pfn_swap_entry_to_page(entry);
> > +
> > +		get_page(page);
> > +		rss[mm_counter(page)]++;
> > +
> > +		if (is_writable_device_exclusive_entry(entry) &&
> > +		    is_cow_mapping(vm_flags)) {
> > +			/*
> > +			 * COW mappings require pages in both
> > +			 * parent and child to be set to read.
> > +			 */
> > +			entry = make_readable_device_exclusive_entry(
> > +							swp_offset(entry));
> > +			pte = swp_entry_to_pte(entry);
> > +			if (pte_swp_soft_dirty(*src_pte))
> > +				pte = pte_swp_mksoft_dirty(pte);
> > +			if (pte_swp_uffd_wp(*src_pte))
> > +				pte = pte_swp_mkuffd_wp(pte);
> > +			set_pte_at(src_mm, addr, src_pte, pte);
> > +		}
> 
> Just cosmetic, but I wonder if should factor this code block into
> a little helper.

In that case there are arguably other bits of this function which should
be refactored into helpers as well. Unless you feel strongly about it I would
like to leave this as is and put together a future series to fix this and a
couple of other areas I've noticed that could do with some refactoring/clean
ups.

> > +
> > +static bool try_to_protect_one(struct page *page, struct vm_area_struct *vma,
> > +			unsigned long address, void *arg)
> > +{
> > +	struct mm_struct *mm = vma->vm_mm;
> > +	struct page_vma_mapped_walk pvmw = {
> > +		.page = page,
> > +		.vma = vma,
> > +		.address = address,
> > +	};
> > +	struct ttp_args *ttp = (struct ttp_args *) arg;
> 
> This cast should not be needed.
> 
> > +	return ttp.valid && (!page_mapcount(page) ? true : false);
> 
> This can be simplified to:
> 
> 	return ttp.valid && !page_mapcount(page);
> 
> > +	npages = get_user_pages_remote(mm, start, npages,
> > +				       FOLL_GET | FOLL_WRITE | FOLL_SPLIT_PMD,
> > +				       pages, NULL, NULL);
> > +	for (i = 0; i < npages; i++, start += PAGE_SIZE) {
> > +		if (!trylock_page(pages[i])) {
> > +			put_page(pages[i]);
> > +			pages[i] = NULL;
> > +			continue;
> > +		}
> > +
> > +		if (!try_to_protect(pages[i], mm, start, arg)) {
> > +			unlock_page(pages[i]);
> > +			put_page(pages[i]);
> > +			pages[i] = NULL;
> > +		}
> 
> Should the trylock_page go into try_to_protect to simplify the loop
> a little?  Also I wonder if we need make_device_exclusive_range or
> should just open code the get_user_pages_remote + try_to_protect
> loop in the callers, as that might allow them to also deduct other
> information about the found pages.

This function has evolved over time and putting the trylock_page into
try_to_protect does simplify things nicely. I'm not sure what other
information a caller could deduce through open coding, but I guess in
some circumstances it might be possible for callers to skip
get_user_pages_remote(), which might be a future improvement.

The main reason it looks like this was simply to keep it looking fairly 
similar to how hmm_range_fault() and migrate_vma() are used with an array of 
pages (or pfns) which are filled out from the given address range.
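
As a rough illustration of the simplified loop, here is a userspace sketch
with the trylock folded into try_to_protect(). Everything in it is a mock:
struct page, the "protectable" flag standing in for the result of the rmap
walk, the locking helpers and the make_exclusive() wrapper are all
hypothetical stand-ins, not the kernel code or the eventual patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock page; "protectable" is a hypothetical stand-in for the rmap walk result. */
struct page {
	bool locked;
	int refcount;
	bool protectable;
};

/* Mock locking and refcount helpers. */
static bool trylock_page(struct page *page)
{
	if (page->locked)
		return false;
	page->locked = true;
	return true;
}

static void unlock_page(struct page *page)
{
	page->locked = false;
}

static void put_page(struct page *page)
{
	page->refcount--;
}

/*
 * Returns true with the page left locked on success. On failure any lock
 * taken here is dropped, so the caller only has to drop its reference.
 */
static bool try_to_protect(struct page *page)
{
	if (!trylock_page(page))
		return false;
	if (!page->protectable) {
		unlock_page(page);
		return false;
	}
	return true;
}

/* Simplified caller loop: a single failure path per page. */
static int make_exclusive(struct page **pages, int npages)
{
	int i, nr = 0;

	for (i = 0; i < npages; i++) {
		if (!try_to_protect(pages[i])) {
			put_page(pages[i]);
			pages[i] = NULL;
			continue;
		}
		nr++;
	}
	return nr;
}
```

The point of the sketch is only that the two failure branches in the
original loop collapse into one once the lock is taken and released inside
try_to_protect().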
 
> Otherwise looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@....de>
> 

Thanks.