Open Source and information security mailing list archives
 
Date:	Wed, 29 Sep 2010 14:33:45 +1000
From:	Benjamin Herrenschmidt <benh@...nel.crashing.org>
To:	Nick Piggin <npiggin@...nel.dk>,
	Trond Myklebust <trond.myklebust@....uio.no>
Cc:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Al Viro <viro@...IV.linux.org.uk>,
	linux-fsdevel@...r.kernel.org
Subject: Odd NFS related SIGBUS (& possible fix)

Hi Nick, Trond!

I've been tracking down a problem on a heavily SMP machine here, where
running the LTP "mmapstress01" test across 64 CPUs with /tmp on NFS causes
some of the test instances to SIGBUS.

The test itself is a relatively boring mmap+fork hammering test.
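For anyone unfamiliar with the test, the general shape of an mmap+fork
hammering test can be sketched in userspace C roughly as below. This is a
much-simplified illustration, not LTP's actual mmapstress01 source; the
helper name mmap_fork_hammer() and its parameters are hypothetical. Pointing
`path` at a file on an NFS mount should exercise nfs_vm_page_mkwrite(),
since the first write to each page of a shared writable mapping is a write
fault.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, loosely modelled on an mmap+fork stress test.
 * Maps `npages` pages of `path` MAP_SHARED, forks `nchildren` children,
 * and has child c dirty pages c, c + nchildren, c + 2*nchildren, ...
 * Returns the number of problems seen (children that did not exit
 * normally, e.g. killed by SIGBUS, plus pages with unexpected contents),
 * or -1 on setup failure. */
static int mmap_fork_hammer(const char *path, int nchildren, int npages)
{
    if (nchildren <= 0 || npages <= 0)
        return -1;

    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = pagesz * (size_t)npages;
    int failures = 0;
    int status;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }
    unsigned char *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, 0);
    close(fd); /* the mapping keeps its own reference to the file */
    if (map == MAP_FAILED)
        return -1;

    for (int c = 0; c < nchildren; c++) {
        pid_t pid = fork();
        if (pid < 0) {
            failures++; /* could not spawn every child */
            break;
        }
        if (pid == 0) {
            /* Child: the first write to each shared page is a write
             * fault, which is where ->page_mkwrite() is called on
             * filesystems that provide it. */
            for (int p = c; p < npages; p += nchildren)
                memset(map + (size_t)p * pagesz, c + 1, pagesz);
            _exit(0);
        }
    }

    /* Reap all children; anything other than a clean exit counts. */
    while (wait(&status) > 0)
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failures++;

    /* MAP_SHARED: the children's writes must be visible here. */
    for (int p = 0; p < npages; p++)
        if (map[(size_t)p * pagesz] != (unsigned char)(p % nchildren + 1))
            failures++;

    munmap(map, len);
    return failures;
}
```

On a local filesystem this should report no problems; the failure mode
described here is that on NFS, with enough parallelism, some children get
killed by SIGBUS instead.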

What I've tracked down so far is that the SIGBUS seems to come from this
check in nfs_vm_page_mkwrite():

	mapping = page->mapping;
	if (mapping != dentry->d_inode->i_mapping)
		goto out_unlock;

which then ends up at

	return VM_FAULT_SIGBUS;

Now, while I understand the validity of that test if the mapping has indeed
-changed-, in the case I'm hitting it has merely been invalidated.

I.e., page->mapping is NULL because something in NFS decided to go through
one of the gazillion code paths that invalidate mappings (in this case,
due to an mtime change on the server).
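To make the "invalidated rather than changed" distinction concrete, here is
a deliberately simplified userspace model of mtime-based revalidation.
Everything in it (struct cache_page, struct cache_inode,
revalidate_on_mtime()) is hypothetical, and the real NFS client code is far
more involved. The only point is that invalidation detaches cached pages by
setting their mapping to NULL, rather than moving them to some other
mapping.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-ins for address_space, page and inode. */
struct cache_mapping { int id; };

struct cache_page {
    struct cache_mapping *mapping; /* NULL once dropped from the cache */
};

struct cache_inode {
    struct cache_mapping i_mapping;
    struct timespec cached_mtime;  /* mtime last seen from the server */
    struct cache_page *pages;
    size_t npages;
};

/* If the server's mtime differs from the cached one, treat the cached
 * data as stale: detach every page (mapping = NULL) and remember the new
 * mtime. Returns true if the cache was invalidated. */
static bool revalidate_on_mtime(struct cache_inode *inode,
                                struct timespec server_mtime)
{
    if (server_mtime.tv_sec == inode->cached_mtime.tv_sec &&
        server_mtime.tv_nsec == inode->cached_mtime.tv_nsec)
        return false;

    for (size_t i = 0; i < inode->npages; i++)
        inode->pages[i].mapping = NULL;
    inode->cached_mtime = server_mtime;
    return true;
}
```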

Now, I think -this- root cause is bogus and will need some separate
debugging, but regardless, I don't see why at this stage page_mkwrite()
should cause a SIGBUS if the file has changed on the server, since AFAIK
we have pushed out our dirty mappings, so all that tells us is that we
raced with the cache invalidation while the struct page wasn't locked.

So I'm wondering whether the right solution in that case would be to
replay the fault instead.
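The decision being proposed amounts to a three-way classification of what
page_mkwrite() finds after taking the page lock. The sketch below is a
hypothetical userspace model: enum mkwrite_action and classify_mkwrite()
are illustrative names only, and the model says nothing about which
VM_FAULT_* code a real implementation should return to get the retry.

```c
#include <stddef.h>

/* Hypothetical outcomes for the mapping check in page_mkwrite(). */
enum mkwrite_action {
    MKWRITE_PROCEED, /* page still belongs to this file: carry on */
    MKWRITE_RETRY,   /* page was invalidated (mapping == NULL): replay the fault */
    MKWRITE_SIGBUS   /* page now belongs to some other mapping: fail */
};

/* page_mapping is assumed to have been read with the page locked;
 * inode_mapping is the file's own mapping. */
static enum mkwrite_action classify_mkwrite(const void *page_mapping,
                                            const void *inode_mapping)
{
    if (page_mapping == inode_mapping)
        return MKWRITE_PROCEED;
    if (page_mapping == NULL)
        return MKWRITE_RETRY;
    return MKWRITE_SIGBUS;
}
```

Today's code effectively folds MKWRITE_RETRY into MKWRITE_SIGBUS; the
suggestion is to keep the two apart.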

Now, I initially thought about returning 0 and hitting the following
code path in __do_fault(), but...

				if (unlikely(!(tmp & VM_FAULT_LOCKED))) {
					lock_page(page);
					if (!page->mapping) {
						ret = 0; /* retry the fault */
						unlock_page(page);
						goto unwritable_page;
					}

... I'm not too happy about it, and I'll need Nick's insight here. The thing
is that to get there, I need to unlock the page first. That means page->mapping
can change, and thus may no longer be NULL by the time we get there, in which
case it doesn't sound right at all to move on and make the page writable, which
is what the code would do. Or am I missing something?
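The concern can be illustrated with a small sequential model: once the page
lock has been dropped and retaken, a check that only rejects a NULL mapping
would accept a page that has meanwhile been attached to a different mapping,
whereas comparing against the mapping the fault started with would not.
struct fault_page, recheck_not_null() and recheck_same_mapping() are
hypothetical names; this illustrates the argument as stated, not whether the
kernel actually allows such a reassignment while the fault holds a reference
on the page.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct address_space and struct page. */
struct fault_mapping { int id; };
struct fault_page { struct fault_mapping *mapping; };

/* Shape of the quoted __do_fault() re-check: only a NULL mapping
 * causes the fault to be retried. */
static bool recheck_not_null(const struct fault_page *page)
{
    return page->mapping != NULL;
}

/* Stricter alternative: after relocking, the page must still belong to
 * the mapping the fault started with. */
static bool recheck_same_mapping(const struct fault_page *page,
                                 const struct fault_mapping *orig)
{
    return page->mapping == orig;
}
```

For a page that started in mapping `a` and ended up in mapping `b` before
the lock was retaken, recheck_not_null() returns true while
recheck_same_mapping(page, &a) returns false.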

So my preferred fix, if I'm indeed right and this is a real bug, would be
to do something in nfs_vm_page_mkwrite() along the lines of:

 	lock_page(page);
 	mapping = page->mapping;
-	if (mapping != dentry->d_inode->i_mapping)
+	if (mapping != dentry->d_inode->i_mapping) {
+		if (!mapping)
+			ret = 0;
 		goto out_unlock;
+	}

Or am I missing something?

Now, regarding the other bug: unless Trond already has an idea, I think I'll
start a separate email thread once I've collected more data. I -think- NFS
invalidates the mapping because it sees a server mtime that is more recent
than the inode's, but nothing on the server should be touching those files,
so I suspect we get confused somewhere in the kernel. I don't know why yet
(the code paths inside NFS aren't obvious to me at this stage).

Cheers,
Ben.

