[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <87sgdvbnj8.fsf@x220.int.ebiederm.org>
Date: Mon, 13 Jul 2020 07:35:39 -0500
From: ebiederm@...ssion.com (Eric W. Biederman)
To: Richard Weinberger <richard@....at>
Cc: linux-kernel@...r.kernel.org, tj@...nel.org,
gregkh@...uxfoundation.org, dan.j.williams@...el.com
Subject: Re: [PATCH] [RFC] kernfs: Allow vm_ops->close() if VMA is never split
Richard Weinberger <richard@....at> writes:
> 10 years ago commit a6849fa1f7d7 ("sysfs: Fail bin file mmap if vma close is implemented.")
> removed support for vm_ops->close() for mmap on sysfs.
> As far I understand the reason is that due to the wrapping in kernfs
> every VMA split operation needs to be tracked to call vm_ops->close()
> for all fragments. This is not feasible with reasonable effort.
>
> Since commit 31383c6865a5 ("mm, hugetlbfs: introduce ->split() to vm_operations_struct")
> we can get notified as soon a VMA is split, this can help to relax the restriction.
> So I propose to allow having a custom close under the condition that a
> VMA cannot get split.
The existence of split may help but your patch does not address the
problem.
The difficult to handle case is not split. Simply wrapping
vma->vm_ops->open and vma->vm_ops->close is enough to handle split. The
challenge is when the kernel revokes acess to a kernfs file. Revokes
happen when the module supporting a kernfs file is removed, or when a
piece of hardware is removed.
The code path for that revocation looks like:
kernfs_remove or kernfs_remove_self
__kernfs_remove
kernfs_drain
kernfs_drain_open_files
unmap_mapping_range
The fundamental problem is that at the time of unmap_mapping_range the
code if it is to implement vma->vm_ops->close properly needs to call
close on all of the vmas. The code can not wait longer as after
kernfs_remove returns the kernfs code makes the guarantee that the code
implementing vma->vm_ops->close has been removed from the kernel.
The technical challenge is how to implement walking the list of vmas and
call the vma->vops->close method on all of them at revoke time. Last
I looked taking the locks that are necessary to do that is not safe to
do when holding the locks that are held in the kernfs_remove path.
A lot has changed in kernfs and the mm subsystem since I examined it,
so it might be worth looking at this again.
So in short the problem is that the code does not exist to call close
when the kernfs file is revoked. Your patch below does not implement
that code.
Eric
>
> Signed-off-by: Richard Weinberger <richard@....at>
> ---
> fs/kernfs/file.c | 42 ++++++++++++++++++++++++++++++++++--------
> 1 file changed, 34 insertions(+), 8 deletions(-)
>
> diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
> index 06b342d8462b..82b09e72acff 100644
> --- a/fs/kernfs/file.c
> +++ b/fs/kernfs/file.c
> @@ -347,6 +347,38 @@ static void kernfs_vma_open(struct vm_area_struct *vma)
> kernfs_put_active(of->kn);
> }
>
> +static void kernfs_vma_close(struct vm_area_struct *vma)
> +{
> + struct file *file = vma->vm_file;
> + struct kernfs_open_file *of = kernfs_of(file);
> +
> + if (!of->vm_ops)
> + return;
> +
> + if (!kernfs_get_active(of->kn))
> + return;
> +
> + if (of->vm_ops->close)
> + of->vm_ops->close(vma);
> +
> + kernfs_put_active(of->kn);
> +}
> +
> +static int kernfs_vma_split(struct vm_area_struct *vma, unsigned long addr)
> +{
> + struct file *file = vma->vm_file;
> + struct kernfs_open_file *of = kernfs_of(file);
> +
> + /*
> + * We cannot keep track of split operations,
> + * so refuse splitting a VMA if someone uses close.
> + */
> + if (of->vm_ops && of->vm_ops->close)
> + return -EINVAL;
> +
> + return 0;
> +}
> +
> static vm_fault_t kernfs_vma_fault(struct vm_fault *vmf)
> {
> struct file *file = vmf->vma->vm_file;
> @@ -457,6 +489,8 @@ static struct mempolicy *kernfs_vma_get_policy(struct vm_area_struct *vma,
>
> static const struct vm_operations_struct kernfs_vm_ops = {
> .open = kernfs_vma_open,
> + .close = kernfs_vma_close,
> + .split = kernfs_vma_split,
> .fault = kernfs_vma_fault,
> .page_mkwrite = kernfs_vma_page_mkwrite,
> .access = kernfs_vma_access,
> @@ -505,14 +539,6 @@ static int kernfs_fop_mmap(struct file *file, struct vm_area_struct *vma)
> if (of->mmapped && of->vm_ops != vma->vm_ops)
> goto out_put;
>
> - /*
> - * It is not possible to successfully wrap close.
> - * So error if someone is trying to use close.
> - */
> - rc = -EINVAL;
> - if (vma->vm_ops && vma->vm_ops->close)
> - goto out_put;
> -
> rc = 0;
> of->mmapped = true;
> of->vm_ops = vma->vm_ops;
Powered by blists - more mailing lists