lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <87sgdvbnj8.fsf@x220.int.ebiederm.org>
Date:   Mon, 13 Jul 2020 07:35:39 -0500
From:   ebiederm@...ssion.com (Eric W. Biederman)
To:     Richard Weinberger <richard@....at>
Cc:     linux-kernel@...r.kernel.org, tj@...nel.org,
        gregkh@...uxfoundation.org, dan.j.williams@...el.com
Subject: Re: [PATCH] [RFC] kernfs: Allow vm_ops->close() if VMA is never split

Richard Weinberger <richard@....at> writes:

> 10 years ago commit a6849fa1f7d7 ("sysfs: Fail bin file mmap if vma close is implemented.")
> removed support for vm_ops->close() for mmap on sysfs.
> As far I understand the reason is that due to the wrapping in kernfs
> every VMA split operation needs to be tracked to call vm_ops->close()
> for all fragments. This is not feasible with reasonable effort.
>
> Since commit 31383c6865a5 ("mm, hugetlbfs: introduce ->split() to vm_operations_struct")
> we can get notified as soon a VMA is split, this can help to relax the restriction.
> So I propose to allow having a custom close under the condition that a
> VMA cannot get split.

The existence of split may help but your patch does not address the
problem.

The difficult to handle case is not split.  Simply wrapping
vma->vm_ops->open and vma->vm_ops->close is enough to handle split.  The
challenge is when the kernel revokes acess to a kernfs file.  Revokes
happen when the module supporting a kernfs file is removed, or when a
piece of hardware is removed.

The code path for that revocation looks like:
	kernfs_remove or kernfs_remove_self
		__kernfs_remove
			kernfs_drain
        			kernfs_drain_open_files
                			unmap_mapping_range


The fundamental problem is that at the time of unmap_mapping_range the
code if it is to implement vma->vm_ops->close properly needs to call
close on all of the vmas.  The code can not wait longer as after
kernfs_remove returns the kernfs code makes the guarantee that the code
implementing vma->vm_ops->close has been removed from the kernel.

The technical challenge is how to implement walking the list of vmas and
call the vma->vops->close method on all of them at revoke time.  Last
I looked taking the locks that are necessary to do that is not safe to
do when holding the locks that are held in the kernfs_remove path.

A lot has changed in kernfs and the mm subsystem since I examined it,
so it might be worth looking at this again.

So in short the problem is that the code does not exist to call close
when the kernfs file is revoked.  Your patch below does not implement
that code.

Eric

>
> Signed-off-by: Richard Weinberger <richard@....at>
> ---
>  fs/kernfs/file.c | 42 ++++++++++++++++++++++++++++++++++--------
>  1 file changed, 34 insertions(+), 8 deletions(-)
>
> diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
> index 06b342d8462b..82b09e72acff 100644
> --- a/fs/kernfs/file.c
> +++ b/fs/kernfs/file.c
> @@ -347,6 +347,38 @@ static void kernfs_vma_open(struct vm_area_struct *vma)
>  	kernfs_put_active(of->kn);
>  }
>  
> +static void kernfs_vma_close(struct vm_area_struct *vma)
> +{
> +	struct file *file = vma->vm_file;
> +	struct kernfs_open_file *of = kernfs_of(file);
> +
> +	if (!of->vm_ops)
> +		return;
> +
> +	if (!kernfs_get_active(of->kn))
> +		return;
> +
> +	if (of->vm_ops->close)
> +		of->vm_ops->close(vma);
> +
> +	kernfs_put_active(of->kn);
> +}
> +
> +static int kernfs_vma_split(struct vm_area_struct *vma, unsigned long addr)
> +{
> +	struct file *file = vma->vm_file;
> +	struct kernfs_open_file *of = kernfs_of(file);
> +
> +	/*
> +	 * We cannot keep track of split operations,
> +	 * so refuse splitting a VMA if someone uses close.
> +	 */
> +	if (of->vm_ops && of->vm_ops->close)
> +		return -EINVAL;
> +
> +	return 0;
> +}
> +
>  static vm_fault_t kernfs_vma_fault(struct vm_fault *vmf)
>  {
>  	struct file *file = vmf->vma->vm_file;
> @@ -457,6 +489,8 @@ static struct mempolicy *kernfs_vma_get_policy(struct vm_area_struct *vma,
>  
>  static const struct vm_operations_struct kernfs_vm_ops = {
>  	.open		= kernfs_vma_open,
> +	.close		= kernfs_vma_close,
> +	.split		= kernfs_vma_split,
>  	.fault		= kernfs_vma_fault,
>  	.page_mkwrite	= kernfs_vma_page_mkwrite,
>  	.access		= kernfs_vma_access,
> @@ -505,14 +539,6 @@ static int kernfs_fop_mmap(struct file *file, struct vm_area_struct *vma)
>  	if (of->mmapped && of->vm_ops != vma->vm_ops)
>  		goto out_put;
>  
> -	/*
> -	 * It is not possible to successfully wrap close.
> -	 * So error if someone is trying to use close.
> -	 */
> -	rc = -EINVAL;
> -	if (vma->vm_ops && vma->vm_ops->close)
> -		goto out_put;
> -
>  	rc = 0;
>  	of->mmapped = true;
>  	of->vm_ops = vma->vm_ops;

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ