lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4b599782-3512-a177-c5b5-c562a22886c7@redhat.com>
Date:   Mon, 24 Apr 2023 06:41:38 +0300
From:   Mika Penttilä <mpenttil@...hat.com>
To:     Lorenzo Stoakes <lstoakes@...il.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org,
        Andrew Morton <akpm@...ux-foundation.org>
Cc:     Jason Gunthorpe <jgg@...pe.ca>, Jens Axboe <axboe@...nel.dk>,
        Matthew Wilcox <willy@...radead.org>,
        Dennis Dalessandro <dennis.dalessandro@...nelisnetworks.com>,
        Leon Romanovsky <leon@...nel.org>,
        Christian Benvenuti <benve@...co.com>,
        Nelson Escobar <neescoba@...co.com>,
        Bernard Metzler <bmt@...ich.ibm.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Mark Rutland <mark.rutland@....com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Jiri Olsa <jolsa@...nel.org>,
        Namhyung Kim <namhyung@...nel.org>,
        Ian Rogers <irogers@...gle.com>,
        Adrian Hunter <adrian.hunter@...el.com>,
        Bjorn Topel <bjorn@...nel.org>,
        Magnus Karlsson <magnus.karlsson@...el.com>,
        Maciej Fijalkowski <maciej.fijalkowski@...el.com>,
        Jonathan Lemon <jonathan.lemon@...il.com>,
        "David S . Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Christian Brauner <brauner@...nel.org>,
        Richard Cochran <richardcochran@...il.com>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Jesper Dangaard Brouer <hawk@...nel.org>,
        John Fastabend <john.fastabend@...il.com>,
        linux-fsdevel@...r.kernel.org, linux-perf-users@...r.kernel.org,
        netdev@...r.kernel.org, bpf@...r.kernel.org
Subject: Re: [PATCH] mm/gup: disallow GUP writing to file-backed mappings by
 default


Hi,


On 22.4.2023 16.37, Lorenzo Stoakes wrote:
> It isn't safe to write to file-backed mappings as GUP does not ensure that
> the semantics associated with such a write are performed correctly, for
> instance filesystems which rely upon write-notify will not be correctly
> notified.
> 
> There are exceptions to this - shmem and hugetlb mappings are (in effect)
> anonymous mappings by other names so we do permit this operation in these
> cases.
> 
> In addition, if no pinning takes place (neither FOLL_GET nor FOLL_PIN is
> specified and neither flags gets implicitly set) then no writing can occur
> so we do not perform the check in this instance.
> 
> This is an important exception, as populate_vma_page_range() invokes
> __get_user_pages() in this way (and thus so does __mm_populate(), used by
> MAP_POPULATE mmap() and mlock() invocations).
> 
> There are GUP users within the kernel that do nevertheless rely upon this
> behaviour, so we introduce the FOLL_ALLOW_BROKEN_FILE_MAPPING flag to
> explicitly permit this kind of GUP access.
> 
> This is required in order to not break userspace in instances where the
> uAPI might permit file-mapped addresses - a number of RDMA users require
> this for instance, as do the process_vm_[read/write]v() system calls,
> /proc/$pid/mem, ptrace and SDT uprobes. Each of these callers have been
> updated to use this flag.
> 
> Making this change is an important step towards a more reliable GUP, and
> explicitly indicates which callers might encouter issues moving forward.
> 
> Suggested-by: Jason Gunthorpe <jgg@...pe.ca>
> Signed-off-by: Lorenzo Stoakes <lstoakes@...il.com>
> ---
>   drivers/infiniband/hw/qib/qib_user_pages.c |  3 +-
>   drivers/infiniband/hw/usnic/usnic_uiom.c   |  2 +-
>   drivers/infiniband/sw/siw/siw_mem.c        |  3 +-
>   fs/proc/base.c                             |  3 +-
>   include/linux/mm_types.h                   |  8 +++++
>   kernel/events/uprobes.c                    |  3 +-
>   mm/gup.c                                   | 36 +++++++++++++++++++++-
>   mm/memory.c                                |  3 +-
>   mm/process_vm_access.c                     |  2 +-
>   net/xdp/xdp_umem.c                         |  2 +-
>   10 files changed, 56 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c
> index f693bc753b6b..b9019dad8008 100644
> --- a/drivers/infiniband/hw/qib/qib_user_pages.c
> +++ b/drivers/infiniband/hw/qib/qib_user_pages.c
> @@ -110,7 +110,8 @@ int qib_get_user_pages(unsigned long start_page, size_t num_pages,
>   	for (got = 0; got < num_pages; got += ret) {
>   		ret = pin_user_pages(start_page + got * PAGE_SIZE,
>   				     num_pages - got,
> -				     FOLL_LONGTERM | FOLL_WRITE,
> +				     FOLL_LONGTERM | FOLL_WRITE |
> +				     FOLL_ALLOW_BROKEN_FILE_MAPPING,
>   				     p + got, NULL);
>   		if (ret < 0) {
>   			mmap_read_unlock(current->mm);
> diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c
> index 2a5cac2658ec..33cf79b248a9 100644
> --- a/drivers/infiniband/hw/usnic/usnic_uiom.c
> +++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
> @@ -85,7 +85,7 @@ static int usnic_uiom_get_pages(unsigned long addr, size_t size, int writable,
>   				int dmasync, struct usnic_uiom_reg *uiomr)
>   {
>   	struct list_head *chunk_list = &uiomr->chunk_list;
> -	unsigned int gup_flags = FOLL_LONGTERM;
> +	unsigned int gup_flags = FOLL_LONGTERM | FOLL_ALLOW_BROKEN_FILE_MAPPING;
>   	struct page **page_list;
>   	struct scatterlist *sg;
>   	struct usnic_uiom_chunk *chunk;
> diff --git a/drivers/infiniband/sw/siw/siw_mem.c b/drivers/infiniband/sw/siw/siw_mem.c
> index f51ab2ccf151..bc3e8c0898e5 100644
> --- a/drivers/infiniband/sw/siw/siw_mem.c
> +++ b/drivers/infiniband/sw/siw/siw_mem.c
> @@ -368,7 +368,8 @@ struct siw_umem *siw_umem_get(u64 start, u64 len, bool writable)
>   	struct mm_struct *mm_s;
>   	u64 first_page_va;
>   	unsigned long mlock_limit;
> -	unsigned int foll_flags = FOLL_LONGTERM;
> +	unsigned int foll_flags =
> +		FOLL_LONGTERM | FOLL_ALLOW_BROKEN_FILE_MAPPING;
>   	int num_pages, num_chunks, i, rv = 0;
>   
>   	if (!can_do_mlock())
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index 96a6a08c8235..3e3f5ea9849f 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -855,7 +855,8 @@ static ssize_t mem_rw(struct file *file, char __user *buf,
>   	if (!mmget_not_zero(mm))
>   		goto free;
>   
> -	flags = FOLL_FORCE | (write ? FOLL_WRITE : 0);
> +	flags = FOLL_FORCE | FOLL_ALLOW_BROKEN_FILE_MAPPING |
> +		(write ? FOLL_WRITE : 0);
>   
>   	while (count > 0) {
>   		size_t this_len = min_t(size_t, count, PAGE_SIZE);
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 3fc9e680f174..e76637b4c78f 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -1185,6 +1185,14 @@ enum {
>   	FOLL_PCI_P2PDMA = 1 << 10,
>   	/* allow interrupts from generic signals */
>   	FOLL_INTERRUPTIBLE = 1 << 11,
> +	/*
> +	 * By default we disallow write access to known broken file-backed
> +	 * memory mappings (i.e. anything other than hugetlb/shmem
> +	 * mappings). Some code may rely upon being able to access this
> +	 * regardless for legacy reasons, thus we provide a flag to indicate
> +	 * this.
> +	 */
> +	FOLL_ALLOW_BROKEN_FILE_MAPPING = 1 << 12,
>   
>   	/* See also internal only FOLL flags in mm/internal.h */
>   };
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 59887c69d54c..ec330d3b0218 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -373,7 +373,8 @@ __update_ref_ctr(struct mm_struct *mm, unsigned long vaddr, short d)
>   		return -EINVAL;
>   
>   	ret = get_user_pages_remote(mm, vaddr, 1,
> -			FOLL_WRITE, &page, &vma, NULL);
> +				    FOLL_WRITE | FOLL_ALLOW_BROKEN_FILE_MAPPING,
> +				    &page, &vma, NULL);
>   	if (unlikely(ret <= 0)) {
>   		/*
>   		 * We are asking for 1 page. If get_user_pages_remote() fails,
> diff --git a/mm/gup.c b/mm/gup.c
> index 1f72a717232b..68d5570c0bae 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -959,16 +959,46 @@ static int faultin_page(struct vm_area_struct *vma,
>   	return 0;
>   }
>   
> +/*
> + * Writing to file-backed mappings using GUP is a fundamentally broken operation
> + * as kernel write access to GUP mappings may not adhere to the semantics
> + * expected by a file system.
> + *
> + * In most instances we disallow this broken behaviour, however there are some
> + * exceptions to this enforced here.
> + */
> +static inline bool can_write_file_mapping(struct vm_area_struct *vma,
> +					  unsigned long gup_flags)
> +{
> +	struct file *file = vma->vm_file;
> +
> +	/* If we aren't pinning then no problematic write can occur. */
> +	if (!(gup_flags & (FOLL_GET | FOLL_PIN)))
> +		return true;
> +
> +	/* Special mappings should pose no problem. */
> +	if (!file)
> +		return true;
> +
> +	/* Has the caller explicitly indicated this case is acceptable? */
> +	if (gup_flags & FOLL_ALLOW_BROKEN_FILE_MAPPING)
> +		return true;
> +
> +	/* shmem and hugetlb mappings do not have problematic semantics. */
> +	return vma_is_shmem(vma) || is_file_hugepages(file);
> +}
> +
>   static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
>   {
>   	vm_flags_t vm_flags = vma->vm_flags;
>   	int write = (gup_flags & FOLL_WRITE);
>   	int foreign = (gup_flags & FOLL_REMOTE);
> +	bool vma_anon = vma_is_anonymous(vma);
>   
>   	if (vm_flags & (VM_IO | VM_PFNMAP))
>   		return -EFAULT;
>   
> -	if (gup_flags & FOLL_ANON && !vma_is_anonymous(vma))
> +	if ((gup_flags & FOLL_ANON) && !vma_anon)
>   		return -EFAULT;
>   
>   	if ((gup_flags & FOLL_LONGTERM) && vma_is_fsdax(vma))
> @@ -978,6 +1008,10 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
>   		return -EFAULT;
>   
>   	if (write) {
> +		if (!vma_anon &&
> +		    WARN_ON_ONCE(!can_write_file_mapping(vma, gup_flags)))
> +			return -EFAULT;
> +
>   		if (!(vm_flags & VM_WRITE)) {
>   			if (!(gup_flags & FOLL_FORCE))
>   				return -EFAULT;
> diff --git a/mm/memory.c b/mm/memory.c
> index 146bb94764f8..e3d535991548 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5683,7 +5683,8 @@ int access_process_vm(struct task_struct *tsk, unsigned long addr,
>   	if (!mm)
>   		return 0;
>   
> -	ret = __access_remote_vm(mm, addr, buf, len, gup_flags);
> +	ret = __access_remote_vm(mm, addr, buf, len,
> +				 gup_flags | FOLL_ALLOW_BROKEN_FILE_MAPPING);
>   
>   	mmput(mm);
>   
> diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
> index 78dfaf9e8990..ef126c08e89c 100644
> --- a/mm/process_vm_access.c
> +++ b/mm/process_vm_access.c
> @@ -81,7 +81,7 @@ static int process_vm_rw_single_vec(unsigned long addr,
>   	ssize_t rc = 0;
>   	unsigned long max_pages_per_loop = PVM_MAX_KMALLOC_PAGES
>   		/ sizeof(struct pages *);
> -	unsigned int flags = 0;
> +	unsigned int flags = FOLL_ALLOW_BROKEN_FILE_MAPPING;
>   
>   	/* Work out address and page range required */
>   	if (len == 0)
> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> index 02207e852d79..b93cfcaccb0d 100644
> --- a/net/xdp/xdp_umem.c
> +++ b/net/xdp/xdp_umem.c
> @@ -93,7 +93,7 @@ void xdp_put_umem(struct xdp_umem *umem, bool defer_cleanup)
>   
>   static int xdp_umem_pin_pages(struct xdp_umem *umem, unsigned long address)
>   {
> -	unsigned int gup_flags = FOLL_WRITE;
> +	unsigned int gup_flags = FOLL_WRITE | FOLL_ALLOW_BROKEN_FILE_MAPPING;
>   	long npgs;
>   	int err;
>   

Not sure about this in general, but seemss at least ptrace 
(ptrace_access_vm()) seems to be broken here..


--Mika


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ