Message-ID: <20230320190836.z2rqrhybke3egiuu@amd.com>
Date: Mon, 20 Mar 2023 14:08:36 -0500
From: Michael Roth <michael.roth@....com>
To: "Nikunj A. Dadhania" <nikunj@....com>
CC: Chao Peng <chao.p.peng@...ux.intel.com>, <kvm@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>,
<linux-fsdevel@...r.kernel.org>, <linux-arch@...r.kernel.org>,
<linux-api@...r.kernel.org>, <linux-doc@...r.kernel.org>,
<qemu-devel@...gnu.org>, Paolo Bonzini <pbonzini@...hat.com>,
Jonathan Corbet <corbet@....net>,
Sean Christopherson <seanjc@...gle.com>,
Vitaly Kuznetsov <vkuznets@...hat.com>,
Wanpeng Li <wanpengli@...cent.com>,
Jim Mattson <jmattson@...gle.com>,
Joerg Roedel <joro@...tes.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Arnd Bergmann <arnd@...db.de>,
Naoya Horiguchi <naoya.horiguchi@....com>,
Miaohe Lin <linmiaohe@...wei.com>, <x86@...nel.org>,
"H . Peter Anvin" <hpa@...or.com>, Hugh Dickins <hughd@...gle.com>,
Jeff Layton <jlayton@...nel.org>,
"J . Bruce Fields" <bfields@...ldses.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Shuah Khan <shuah@...nel.org>, Mike Rapoport <rppt@...nel.org>,
Steven Price <steven.price@....com>,
"Maciej S . Szmigiero" <mail@...iej.szmigiero.name>,
Vlastimil Babka <vbabka@...e.cz>,
Vishal Annapurve <vannapurve@...gle.com>,
Yu Zhang <yu.c.zhang@...ux.intel.com>,
"Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
<luto@...nel.org>, <jun.nakajima@...el.com>,
<dave.hansen@...el.com>, <ak@...ux.intel.com>, <david@...hat.com>,
<aarcange@...hat.com>, <ddutile@...hat.com>, <dhildenb@...hat.com>,
Quentin Perret <qperret@...gle.com>, <tabba@...gle.com>,
<mhocko@...e.com>, <wei.w.wang@...el.com>
Subject: Re: [PATCH v10 1/9] mm: Introduce memfd_restricted system call to
create restricted user memory
On Thu, Feb 16, 2023 at 03:21:21PM +0530, Nikunj A. Dadhania wrote:
>
> > +static struct file *restrictedmem_file_create(struct file *memfd)
> > +{
> > + struct restrictedmem_data *data;
> > + struct address_space *mapping;
> > + struct inode *inode;
> > + struct file *file;
> > +
> > + data = kzalloc(sizeof(*data), GFP_KERNEL);
> > + if (!data)
> > + return ERR_PTR(-ENOMEM);
> > +
> > + data->memfd = memfd;
> > + mutex_init(&data->lock);
> > + INIT_LIST_HEAD(&data->notifiers);
> > +
> > + inode = alloc_anon_inode(restrictedmem_mnt->mnt_sb);
> > + if (IS_ERR(inode)) {
> > + kfree(data);
> > + return ERR_CAST(inode);
> > + }
>
> alloc_anon_inode() uses new_pseudo_inode() to get the inode. As per the comment, new inode
> is not added to the superblock s_inodes list.
Another issue somewhat related to alloc_anon_inode() is that the shmem code
in some cases assumes the inode struct was allocated via shmem_alloc_inode(),
which allocates a struct shmem_inode_info: a wrapper that embeds struct inode
alongside additional fields for things like spinlocks.

These additional fields don't get allocated/initialized in the case of
restrictedmem, so when restrictedmem_getattr() passes the inode on to the
shmem handler, the handler ends up touching memory that was never set up
(e.g. the spinlock), which can cause a crash.
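
To make the failure mode concrete, here is a minimal userspace sketch (not
kernel code). All demo_* names and the field layouts are hypothetical
stand-ins; the only assumption carried over from the kernel is that the shmem
wrapper embeds the inode and is recovered from it by container_of()-style
pointer arithmetic. It checks whether the wrapper's lock field falls inside
the memory that was actually allocated:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct inode; the fields are illustrative only. */
struct demo_inode {
	unsigned long i_ino;
	long long i_size;
};

/* Hypothetical stand-in for struct shmem_inode_info: extra state plus an
 * embedded inode. */
struct demo_shmem_inode_info {
	int lock;			/* stands in for the spinlock */
	unsigned long alloced;
	struct demo_inode vfs_inode;	/* embedded inode */
};

/* Same arithmetic as container_of(inode, struct demo_shmem_inode_info,
 * vfs_inode), done on integers so no out-of-bounds pointer is formed. */
static uintptr_t demo_info_addr(const struct demo_inode *inode)
{
	return (uintptr_t)inode -
	       offsetof(struct demo_shmem_inode_info, vfs_inode);
}

/* Returns 1 if the wrapper's lock field lies inside [base, base + size). */
static int demo_lock_in_allocation(const struct demo_inode *inode,
				   const void *base, size_t size)
{
	uintptr_t lock = demo_info_addr(inode) +
			 offsetof(struct demo_shmem_inode_info, lock);
	uintptr_t start = (uintptr_t)base;

	return lock >= start && lock + sizeof(int) <= start + size;
}

/* Inode allocated as part of the wrapper (the case the shmem code expects). */
int demo_wrapped_case(void)
{
	struct demo_shmem_inode_info *info = calloc(1, sizeof(*info));
	int ok;

	assert(info != NULL);
	ok = demo_lock_in_allocation(&info->vfs_inode, info, sizeof(*info));
	free(info);
	return ok;
}

/* Bare inode with no wrapper around it (analogous to the restrictedmem case). */
int demo_bare_case(void)
{
	struct demo_inode *inode = calloc(1, sizeof(*inode));
	int ok;

	assert(inode != NULL);
	ok = demo_lock_in_allocation(inode, inode, sizeof(*inode));
	free(inode);
	return ok;
}
```

Calling demo_wrapped_case() should return 1 and demo_bare_case() should
return 0: for the bare inode, the computed lock address lands before the start
of the allocation, so a handler that takes that lock is operating on whatever
happens to be in adjacent memory.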
For instance, the following trace was seen when executing 'sudo lsof' while a
process/guest was running with an open memfd FD:
[24393.121409] general protection fault, probably for non-canonical address 0xfe9fb182fea3f077: 0000 [#1] PREEMPT SMP NOPTI
[24393.133546] CPU: 2 PID: 590073 Comm: lsof Tainted: G E 6.1.0-rc4-upm10b-host-snp-v8b+ #4
[24393.144125] Hardware name: AMD Corporation ETHANOL_X/ETHANOL_X, BIOS RXM1009B 05/14/2022
[24393.153150] RIP: 0010:native_queued_spin_lock_slowpath+0x3a3/0x3e0
[24393.160049] Code: f3 90 41 8b 04 24 85 c0 74 ea eb f4 c1 ea 12 83 e0 03 83 ea 01 48 c1 e0 05 48 63 d2 48 05 00 41 04 00 48 03 04 d5 e0 ea 8b 82 <48> 89 18 8b 43 08 85 c0 75 09 f3 90 8b 43 08 85 c0 74 f7 48 8b 13
[24393.181004] RSP: 0018:ffffc9006b6a3cf8 EFLAGS: 00010086
[24393.186832] RAX: fe9fb182fea3f077 RBX: ffff889fcc144100 RCX: 0000000000000000
[24393.194793] RDX: 0000000000003ffe RSI: ffffffff827acde9 RDI: ffffc9006b6a3cdf
[24393.202751] RBP: ffffc9006b6a3d20 R08: 0000000000000001 R09: 0000000000000000
[24393.210710] R10: 0000000000000000 R11: 000000000000ffff R12: ffff888179fa50e0
[24393.218670] R13: ffff889fcc144100 R14: 00000000000c0000 R15: 00000000000c0000
[24393.226629] FS: 00007f9440f45400(0000) GS:ffff889fcc100000(0000) knlGS:0000000000000000
[24393.235692] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[24393.242101] CR2: 000055c55a9cf088 CR3: 0008000220e9c003 CR4: 0000000000770ee0
[24393.250059] PKRU: 55555554
[24393.253073] Call Trace:
[24393.255797] <TASK>
[24393.258133] do_raw_spin_lock+0xc4/0xd0
[24393.262410] _raw_spin_lock_irq+0x50/0x70
[24393.266880] ? shmem_getattr+0x4c/0xf0
[24393.271060] shmem_getattr+0x4c/0xf0
[24393.275044] restrictedmem_getattr+0x34/0x40
[24393.279805] vfs_getattr_nosec+0xbd/0xe0
[24393.284178] vfs_getattr+0x37/0x50
[24393.287971] vfs_statx+0xa0/0x150
[24393.291668] vfs_fstatat+0x59/0x80
[24393.295462] __do_sys_newstat+0x35/0x70
[24393.299739] __x64_sys_newstat+0x16/0x20
[24393.304111] do_syscall_64+0x3b/0x90
[24393.308098] entry_SYSCALL_64_after_hwframe+0x63/0xcd
As a workaround we've been doing the following, but it's probably not the
proper fix:
https://github.com/AMDESE/linux/commit/0378116b5c4e373295c9101727f2cb5112d6b1f4
-Mike