linux-ext4 - Confused by commit 56a76f [fs: fix page_mkwrite error cases in core code and btrfs]

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <20101230232936.GA931@mail.oracle.com>
Date:	Thu, 30 Dec 2010 15:29:36 -0800
From:	Joel Becker <Joel.Becker@...cle.com>
To:	Nick Piggin <npiggin@...e.de>
Cc:	linux-fsdevel@...r.kernel.org, linux-ext4@...r.kernel.org,
	ocfs2-devel@....oracle.com, wengang@...server1.us.oracle.com
Subject: Confused by commit 56a76f [fs: fix page_mkwrite error cases in
 core code and btrfs]

Nick,
	While visiting some issues in ocfs2_page_mkwrite(), I realized
that we're returning 0 to mean "Please try again Mr VM", but the caller
of page_mkwrite() now expects VM_FAULT_NOPAGE.  This is all due to
56a76f [fs: fix page_mkwrite error cases in core code and btrfs].
	In the comment for 56a76f, you state that btrfs is the example
but all other filesystems need to be fixed...yet ocfs2, ext4, and gfs2
continue to return 0 or VM_FAULT_SIGBUS.
	I guess this continues to work because we return the page
unlocked (thus triggering the "if (unlikely(!(tmp & VM_FAULT_LOCKED)))"
logic).  Was this expected to continue to work, or do you consider these
filesystems broken until they are updated with the new return codes?
	Back to the ocfs2 issue.  Am I correctly reading the current
code that we can safely throw away the page passed in to page_mkwrite()
if a pagecache flush has dropped it?  Currently, we presume that the
page passed in must be the one we make writable.  We make a quick check
of page->mapping == inode->i_mapping, returning 0 (for "try again")
immediately if that's false.  But once we get into the meat of our
locking and finally lock the page for good, we assume mapping==i_mapping
still holds.  That obviously breaks when the pagecache gets truncated.
At this late stage, we -EINVAL (clearly wrong).
	It looks hard to lock the page for good high up at our first
check of the mapping.  Wengang has proposed to simply ignore the page
passed into page_mkwrite() and just find_or_create_page() the sucker at
the place we're ready to consider it stable.  As I see it, the two
places that call page_mkwrite() are going to revalidate the pte anyway,
and they'll find the newly created page if find_or_create_page() gets a
different.  Is that safe behavior?

Joel

-- 

Life's Little Instruction Book #3

	"Watch a sunrise at least once a year."

Joel Becker
Senior Development Manager
Oracle
E-mail: joel.becker@...cle.com
Phone: (650) 506-8127
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html