[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CA+EHjTyphrouY1FV2NQOBLDG81JYhiHFGBNKjT1K2j+pVNij+A@mail.gmail.com>
Date: Mon, 3 Oct 2022 08:33:13 +0100
From: Fuad Tabba <tabba@...gle.com>
To: "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>
Cc: Chao Peng <chao.p.peng@...ux.intel.com>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
linux-fsdevel@...r.kernel.org, linux-api@...r.kernel.org,
linux-doc@...r.kernel.org, qemu-devel@...gnu.org,
Paolo Bonzini <pbonzini@...hat.com>,
Jonathan Corbet <corbet@....net>,
Sean Christopherson <seanjc@...gle.com>,
Vitaly Kuznetsov <vkuznets@...hat.com>,
Wanpeng Li <wanpengli@...cent.com>,
Jim Mattson <jmattson@...gle.com>,
Joerg Roedel <joro@...tes.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
x86@...nel.org, "H . Peter Anvin" <hpa@...or.com>,
Hugh Dickins <hughd@...gle.com>,
Jeff Layton <jlayton@...nel.org>,
"J . Bruce Fields" <bfields@...ldses.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Shuah Khan <shuah@...nel.org>, Mike Rapoport <rppt@...nel.org>,
Steven Price <steven.price@....com>,
"Maciej S . Szmigiero" <mail@...iej.szmigiero.name>,
Vlastimil Babka <vbabka@...e.cz>,
Vishal Annapurve <vannapurve@...gle.com>,
Yu Zhang <yu.c.zhang@...ux.intel.com>, luto@...nel.org,
jun.nakajima@...el.com, dave.hansen@...el.com, ak@...ux.intel.com,
david@...hat.com, aarcange@...hat.com, ddutile@...hat.com,
dhildenb@...hat.com, Quentin Perret <qperret@...gle.com>,
Michael Roth <michael.roth@....com>, mhocko@...e.com,
Muchun Song <songmuchun@...edance.com>, wei.w.wang@...el.com
Subject: Re: [PATCH v8 1/8] mm/memfd: Introduce userspace inaccessible memfd
Hi
On Fri, Sep 30, 2022 at 5:23 PM Kirill A . Shutemov
<kirill.shutemov@...ux.intel.com> wrote:
>
> On Fri, Sep 30, 2022 at 05:14:00PM +0100, Fuad Tabba wrote:
> > Hi,
> >
> > <...>
> >
> > > diff --git a/mm/memfd_inaccessible.c b/mm/memfd_inaccessible.c
> > > new file mode 100644
> > > index 000000000000..2d33cbdd9282
> > > --- /dev/null
> > > +++ b/mm/memfd_inaccessible.c
> > > @@ -0,0 +1,219 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +#include "linux/sbitmap.h"
> > > +#include <linux/memfd.h>
> > > +#include <linux/pagemap.h>
> > > +#include <linux/pseudo_fs.h>
> > > +#include <linux/shmem_fs.h>
> > > +#include <uapi/linux/falloc.h>
> > > +#include <uapi/linux/magic.h>
> > > +
> > > +struct inaccessible_data {
> > > + struct mutex lock;
> > > + struct file *memfd;
> > > + struct list_head notifiers;
> > > +};
> > > +
> > > +static void inaccessible_notifier_invalidate(struct inaccessible_data *data,
> > > + pgoff_t start, pgoff_t end)
> > > +{
> > > + struct inaccessible_notifier *notifier;
> > > +
> > > + mutex_lock(&data->lock);
> > > + list_for_each_entry(notifier, &data->notifiers, list) {
> > > + notifier->ops->invalidate(notifier, start, end);
> > > + }
> > > + mutex_unlock(&data->lock);
> > > +}
> > > +
> > > +static int inaccessible_release(struct inode *inode, struct file *file)
> > > +{
> > > + struct inaccessible_data *data = inode->i_mapping->private_data;
> > > +
> > > + fput(data->memfd);
> > > + kfree(data);
> > > + return 0;
> > > +}
> > > +
> > > +static long inaccessible_fallocate(struct file *file, int mode,
> > > + loff_t offset, loff_t len)
> > > +{
> > > + struct inaccessible_data *data = file->f_mapping->private_data;
> > > + struct file *memfd = data->memfd;
> > > + int ret;
> > > +
> > > + if (mode & FALLOC_FL_PUNCH_HOLE) {
> > > + if (!PAGE_ALIGNED(offset) || !PAGE_ALIGNED(len))
> > > + return -EINVAL;
> > > + }
> > > +
> > > + ret = memfd->f_op->fallocate(memfd, mode, offset, len);
> >
> > I think that shmem_file_operations.fallocate is only set if
> > CONFIG_TMPFS is enabled (shmem.c). Should there be a check at
> > initialization that fallocate is set, or maybe a config dependency, or
> > can we count on it always being enabled?
>
> It is already there:
>
> config MEMFD_CREATE
> def_bool TMPFS || HUGETLBFS
>
> And we reject inaccessible memfd_create() for HUGETLBFS.
>
> But if we go with a separate syscall, yes, we need the dependency.
I missed that, thanks.
>
> > > + inaccessible_notifier_invalidate(data, offset, offset + len);
> > > + return ret;
> > > +}
> > > +
> >
> > <...>
> >
> > > +void inaccessible_register_notifier(struct file *file,
> > > + struct inaccessible_notifier *notifier)
> > > +{
> > > + struct inaccessible_data *data = file->f_mapping->private_data;
> > > +
> > > + mutex_lock(&data->lock);
> > > + list_add(¬ifier->list, &data->notifiers);
> > > + mutex_unlock(&data->lock);
> > > +}
> > > +EXPORT_SYMBOL_GPL(inaccessible_register_notifier);
> >
> > If the memfd wasn't marked as inaccessible, or more generally
> > speaking, if the file isn't a memfd_inaccessible file, this ends up
> > accessing an uninitialized pointer for the notifier list. Should there
> > be a check for that here, and have this function return an error if
> > that's not the case?
>
> I think it is "don't do that" category. inaccessible_register_notifier()
> caller has to know what file it operates on, no?
The thing is, you could oops the kernel from userspace. For that, all
you have to do is a memfd_create without the MFD_INACCESSIBLE,
followed by a KVM_SET_USER_MEMORY_REGION using that as the private_fd.
I ran into this using my port of this patch series to arm64.
Cheers,
/fuad
> --
> Kiryl Shutsemau / Kirill A. Shutemov
Powered by blists - more mailing lists