Message-ID: <20190402191702.akrg64fqtqrnu7tr@cs.cmu.edu>
Date: Tue, 2 Apr 2019 15:17:03 -0400
From: Jan Harkes <jaharkes@...cmu.edu>
To: Waiman Long <longman@...hat.com>
Cc: Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Alexander Viro <viro@...iv.linux.org.uk>,
Pedro Cuadra Chamorro <pcuadrac@...cmu.edu>,
linux-kernel@...r.kernel.org
Subject: Re: fs/coda oops bisected to (925b9cd1b8) "locking/rwsem: Make owner
store task pointer of last owni
On Sun, Mar 31, 2019 at 03:13:47PM -0400, Jan Harkes wrote:
> On Sun, Mar 31, 2019 at 02:14:13PM -0400, Waiman Long wrote:
> > One possibility is that there is a previous reference to the memory
> > currently occupied by the spinlock. If the memory location is previously
> > part of a rwsem structure and someone is still using it, you may get
> > memory corruption.
>
> Ah, I hadn't even thought of that possibility. Good, it will open up
First of all, I have to thank you for your original patch because
otherwise I probably would never have discovered that something was
seriously wrong. Your patch made the problem visible.
I ended up changing 'owner' to '_RET_IP_' and dumping the value of the
clobbered Coda inode spinlock and the surrounding memory. The 'culprit'
turned out to be ext4_filemap_fault, but despite it being in ext4, this
is still a Coda-specific problem.
Effectively Coda overlays other filesystems' inodes for mmap, but
vma->vm_file still points at Coda's file. So when we use file_inode()
in ext4_filemap_fault we end up with the Coda inode instead of the ext4
inode, and when trying to grab ext4's i_mmap_sem we really just
scribble over the memory region that happens to contain the Coda inode
spinlock. A fix is to use vm_file->f_mapping->host instead of
file_inode(vm_file).
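
To make the mismatch concrete, here is a minimal userspace sketch, not
kernel code. The structs are cut-down stand-ins that keep only the fields
involved, and inode_init(), file_open(), overlay_for_mmap() and
mapping_host() are hypothetical names made up for the illustration. It
assumes the overlay works by pointing the Coda file's f_mapping at the
host inode's mapping while f_inode stays untouched.

```c
#include <assert.h>

/* Userspace stand-ins for the kernel structures; only the fields needed here. */
enum fs_kind { FS_CODA, FS_EXT4 };

struct inode;

struct address_space {
	struct inode *host;		/* inode that owns this page cache */
};

struct inode {
	enum fs_kind kind;		/* which filesystem allocated this inode */
	struct address_space i_data;	/* the inode's own mapping */
};

struct file {
	struct inode *f_inode;		/* inode the file was opened on */
	struct address_space *f_mapping;
};

/* Hypothetical helper: an inode starts out hosting its own mapping. */
void inode_init(struct inode *inode, enum fs_kind kind)
{
	inode->kind = kind;
	inode->i_data.host = inode;
}

/* Hypothetical helper: a freshly opened file uses its inode's mapping. */
void file_open(struct file *file, struct inode *inode)
{
	file->f_inode = inode;
	file->f_mapping = &inode->i_data;
}

/*
 * Assumed model of the overlay: the Coda file borrows the host inode's
 * mapping for mmap, but f_inode keeps pointing at the Coda inode.
 */
void overlay_for_mmap(struct file *coda_file, struct inode *host_inode)
{
	coda_file->f_mapping = &host_inode->i_data;
}

/* Same idea as the kernel's file_inode(): the inode the file was opened on. */
struct inode *file_inode(const struct file *file)
{
	return file->f_inode;
}

/* The lookup used in the patch: the inode that owns the mapping. */
struct inode *mapping_host(const struct file *file)
{
	return file->f_mapping->host;
}
```

After overlay_for_mmap(), file_inode() still hands back the Coda inode
while mapping_host() returns the ext4 one, which is the inode whose
private fields the ext4 fault handlers expect to find.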
Of course everyone looks at ext4 as a canonical example, so this problem
has spread pretty much everywhere and I'm wondering how best to resolve
it. The options I see:
- change file_inode() to follow file->f_mapping->host
  would fix most places, but maybe f_mapping is not always guaranteed to
  point at a usable place?
- change Coda's mmap to replace vma->vm_file with the host file
  we'd probably no longer get notified when the last reference to the
  host file goes away, so we'd call coda_release and notify userspace on
  close() even when there are still active mmap regions.
- fix every in-tree file system to use vma->vm_file->f_mapping->host.
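
As a rough illustration of the first option and its open question, a
lookup could prefer f_mapping->host and fall back to f_inode when the
mapping is not usable. This is only a userspace sketch under that
assumption: file_host_inode() is a hypothetical name, not an existing
kernel function, and the structs are reduced to the fields involved.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down userspace stand-ins for the kernel structures. */
struct inode {
	int ino;			/* placeholder field for the sketch */
};

struct address_space {
	struct inode *host;		/* inode that owns this mapping, may be NULL */
};

struct file {
	struct inode *f_inode;		/* inode the file was opened on */
	struct address_space *f_mapping;
};

/*
 * Hypothetical helper: prefer the inode that owns the mapping and fall
 * back to f_inode if f_mapping or its host is not set.
 */
struct inode *file_host_inode(const struct file *file)
{
	if (file->f_mapping && file->f_mapping->host)
		return file->f_mapping->host;
	return file->f_inode;
}
```

Whether such a fallback is the right behaviour for every caller is
exactly the uncertainty raised in the first bullet; the sketch only shows
what the lookup order would be.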
Jan
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 69d65d49837b..122d691d3eda 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -284,7 +284,7 @@ static vm_fault_t ext4_dax_huge_fault(struct vm_fault *vmf,
vm_fault_t result;
int retries = 0;
handle_t *handle = NULL;
- struct inode *inode = file_inode(vmf->vma->vm_file);
+ struct inode *inode = vmf->vma->vm_file->f_mapping->host;
struct super_block *sb = inode->i_sb;
/*
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index b54b261ded36..62a0025ce7f8 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -6211,7 +6211,7 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf)
int err;
vm_fault_t ret;
struct file *file = vma->vm_file;
- struct inode *inode = file_inode(file);
+ struct inode *inode = file->f_mapping->host;
struct address_space *mapping = inode->i_mapping;
handle_t *handle;
get_block_t *get_block;
@@ -6302,7 +6302,7 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf)
vm_fault_t ext4_filemap_fault(struct vm_fault *vmf)
{
- struct inode *inode = file_inode(vmf->vma->vm_file);
+ struct inode *inode = vmf->vma->vm_file->f_mapping->host;
vm_fault_t ret;
down_read(&EXT4_I(inode)->i_mmap_sem);