lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200820125855.GL1902@quack2.suse.cz>
Date:   Thu, 20 Aug 2020 14:58:55 +0200
From:   Jan Kara <jack@...e.cz>
To:     Vivek Goyal <vgoyal@...hat.com>
Cc:     linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-nvdimm@...ts.01.org, virtio-fs@...hat.com, miklos@...redi.hu,
        stefanha@...hat.com, dgilbert@...hat.com, dan.j.williams@...el.com,
        Jan Kara <jack@...e.cz>,
        Vishal L Verma <vishal.l.verma@...el.com>,
        "Weiny, Ira" <ira.weiny@...el.com>
Subject: Re: [PATCH v3 02/18] dax: Create a range version of
 dax_layout_busy_page()

On Wed 19-08-20 18:19:40, Vivek Goyal wrote:
> virtiofs device has a range of memory which is mapped into file inodes
> using dax. This memory is mapped in qemu on host and maps different
> sections of real file on host. Size of this memory is limited
> (determined by administrator) and depending on filesystem size, we will
> soon reach a situation where all the memory is in use and we need to
> reclaim some.
> 
> As part of reclaim process, we will need to make sure that there are
> no active references to pages (taken by get_user_pages()) on the memory
> range we are trying to reclaim. I am planning to use
> dax_layout_busy_page() for this. But in current form this is per inode
> and scans through all the pages of the inode.
> 
> We want to reclaim only a portion of memory (say 2MB page). So we want
> to make sure that only that 2MB range of pages do not have any
> references  (and don't want to unmap all the pages of inode).
> 
> Hence, create a range version of this function named
> dax_layout_busy_page_range() which can be used to pass a range which
> needs to be unmapped.
> 
> Cc: Dan Williams <dan.j.williams@...el.com>
> Cc: linux-nvdimm@...ts.01.org
> Cc: Jan Kara <jack@...e.cz>
> Cc: Vishal L Verma <vishal.l.verma@...el.com>
> Cc: "Weiny, Ira" <ira.weiny@...el.com>
> Signed-off-by: Vivek Goyal <vgoyal@...hat.com>
> ---
>  fs/dax.c            | 29 +++++++++++++++++++++++------
>  include/linux/dax.h |  6 ++++++
>  2 files changed, 29 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index 95341af1a966..ddd705251d9f 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -559,7 +559,7 @@ static void *grab_mapping_entry(struct xa_state *xas,
>  }
>  
>  /**
> - * dax_layout_busy_page - find first pinned page in @mapping
> + * dax_layout_busy_page_range - find first pinned page in @mapping
>   * @mapping: address space to scan for a page with ref count > 1

Please document additional function arguments in the kernel-doc comment.

Otherwise the patch looks good so feel free to add:

Reviewed-by: Jan Kara <jack@...e.cz>

after fixing this nit.

								Honza

>   *
>   * DAX requires ZONE_DEVICE mapped pages. These pages are never
> @@ -572,13 +572,19 @@ static void *grab_mapping_entry(struct xa_state *xas,
>   * establishment of new mappings in this address_space. I.e. it expects
>   * to be able to run unmap_mapping_range() and subsequently not race
>   * mapping_mapped() becoming true.
> + *
> + * Partial pages are included. If 'end' is LLONG_MAX, pages in the range
> + * from 'start' to end of the file are inluded.
>   */
> -struct page *dax_layout_busy_page(struct address_space *mapping)
> +struct page *dax_layout_busy_page_range(struct address_space *mapping,
> +					loff_t start, loff_t end)
>  {
> -	XA_STATE(xas, &mapping->i_pages, 0);
>  	void *entry;
>  	unsigned int scanned = 0;
>  	struct page *page = NULL;
> +	pgoff_t start_idx = start >> PAGE_SHIFT;
> +	pgoff_t end_idx;
> +	XA_STATE(xas, &mapping->i_pages, start_idx);
>  
>  	/*
>  	 * In the 'limited' case get_user_pages() for dax is disabled.
> @@ -589,6 +595,11 @@ struct page *dax_layout_busy_page(struct address_space *mapping)
>  	if (!dax_mapping(mapping) || !mapping_mapped(mapping))
>  		return NULL;
>  
> +	/* If end == LLONG_MAX, all pages from start to till end of file */
> +	if (end == LLONG_MAX)
> +		end_idx = ULONG_MAX;
> +	else
> +		end_idx = end >> PAGE_SHIFT;
>  	/*
>  	 * If we race get_user_pages_fast() here either we'll see the
>  	 * elevated page count in the iteration and wait, or
> @@ -596,15 +607,15 @@ struct page *dax_layout_busy_page(struct address_space *mapping)
>  	 * against is no longer mapped in the page tables and bail to the
>  	 * get_user_pages() slow path.  The slow path is protected by
>  	 * pte_lock() and pmd_lock(). New references are not taken without
> -	 * holding those locks, and unmap_mapping_range() will not zero the
> +	 * holding those locks, and unmap_mapping_pages() will not zero the
>  	 * pte or pmd without holding the respective lock, so we are
>  	 * guaranteed to either see new references or prevent new
>  	 * references from being established.
>  	 */
> -	unmap_mapping_range(mapping, 0, 0, 0);
> +	unmap_mapping_pages(mapping, start_idx, end_idx - start_idx + 1, 0);
>  
>  	xas_lock_irq(&xas);
> -	xas_for_each(&xas, entry, ULONG_MAX) {
> +	xas_for_each(&xas, entry, end_idx) {
>  		if (WARN_ON_ONCE(!xa_is_value(entry)))
>  			continue;
>  		if (unlikely(dax_is_locked(entry)))
> @@ -625,6 +636,12 @@ struct page *dax_layout_busy_page(struct address_space *mapping)
>  	xas_unlock_irq(&xas);
>  	return page;
>  }
> +EXPORT_SYMBOL_GPL(dax_layout_busy_page_range);
> +
> +struct page *dax_layout_busy_page(struct address_space *mapping)
> +{
> +	return dax_layout_busy_page_range(mapping, 0, LLONG_MAX);
> +}
>  EXPORT_SYMBOL_GPL(dax_layout_busy_page);
>  
>  static int __dax_invalidate_entry(struct address_space *mapping,
> diff --git a/include/linux/dax.h b/include/linux/dax.h
> index 6904d4e0b2e0..9016929db4c6 100644
> --- a/include/linux/dax.h
> +++ b/include/linux/dax.h
> @@ -141,6 +141,7 @@ int dax_writeback_mapping_range(struct address_space *mapping,
>  		struct dax_device *dax_dev, struct writeback_control *wbc);
>  
>  struct page *dax_layout_busy_page(struct address_space *mapping);
> +struct page *dax_layout_busy_page_range(struct address_space *mapping, loff_t start, loff_t end);
>  dax_entry_t dax_lock_page(struct page *page);
>  void dax_unlock_page(struct page *page, dax_entry_t cookie);
>  #else
> @@ -171,6 +172,11 @@ static inline struct page *dax_layout_busy_page(struct address_space *mapping)
>  	return NULL;
>  }
>  
> +static inline struct page *dax_layout_busy_page_range(struct address_space *mapping, pgoff_t start, pgoff_t nr_pages)
> +{
> +	return NULL;
> +}
> +
>  static inline int dax_writeback_mapping_range(struct address_space *mapping,
>  		struct dax_device *dax_dev, struct writeback_control *wbc)
>  {
> -- 
> 2.25.4
> 
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ