linux-kernel - Re: [FYI] tux3: Core changes

Message-ID: <20150810124525.GC3768@quack.suse.cz>
Date:	Mon, 10 Aug 2015 14:45:25 +0200
From:	Jan Kara <jack@...e.cz>
To:	OGAWA Hirofumi <hirofumi@...l.parknet.co.jp>
Cc:	Jan Kara <jack@...e.cz>, Daniel Phillips <daniel@...nq.net>,
	David Lang <david@...g.hm>, Rik van Riel <riel@...hat.com>,
	tux3@...3.org, linux-kernel@...r.kernel.org,
	linux-fsdevel@...r.kernel.org
Subject: Re: [FYI] tux3: Core changes

On Sun 09-08-15 22:42:42, OGAWA Hirofumi wrote:
> Jan Kara <jack@...e.cz> writes:
> 
> > I'm not sure which ENOSPC issue you are speaking about, BTW. Can you
> > please elaborate?
> 
> 1. GUP simulates a page fault and prepares to modify the page
> 2. writeback clears dirty and makes the PTE read-only
> 3. snapshot/reflink makes the block COW

I assume by point 3. you mean that snapshot / reflink happens now and thus
the page / block is marked as COW. Am I right?

> 4. the driver that called GUP modifies the page and dirties it without simulating a page fault

OK, but this doesn't hit ENOSPC because as you correctly write in point 4.,
the page gets modified without triggering another page fault so COW for the
modified page isn't triggered. Modified page contents will be in both the
original and the reflinked file, won't it?

And I agree that the fact that the snapshotted file's original contents can
still get modified is a bug, and one which is difficult to fix.
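
The four steps above can be replayed in a toy model. This is only a sketch:
the Block and Page classes and every function name below are hypothetical
stand-ins, not kernel code, and the model collapses page cache contents and
on-disk block contents into a single object.

```python
from dataclasses import dataclass


@dataclass
class Block:
    data: bytes
    shared: bool = False  # set once a snapshot/reflink also refers to this block


@dataclass
class Page:
    block: Block
    dirty: bool = False
    pte_writable: bool = False


def write_fault(page):
    """Simulated write fault: break COW if needed, then allow writes."""
    if page.block.shared:
        # A private copy is allocated here; in a real filesystem this is
        # the allocation that could fail with ENOSPC.
        page.block = Block(page.block.data)
    page.pte_writable = True
    page.dirty = True


def writeback(page):
    """Writeback clears the dirty state and write-protects the PTE."""
    page.dirty = False
    page.pte_writable = False


def reflink(page):
    """Mark the block COW and return it as seen by the reflinked file."""
    page.block.shared = True
    return page.block


def gup_write(page, data):
    """Driver writes through a reference taken earlier; no fault is taken."""
    page.block.data = data
    page.dirty = True


def run_race():
    page = Page(Block(b"old"))
    write_fault(page)        # 1. GUP simulates the fault up front
    writeback(page)          # 2. writeback cleans and write-protects
    clone = reflink(page)    # 3. block becomes COW
    gup_write(page, b"new")  # 4. driver modifies without a new fault
    return page.block.data, clone.data, page.block is clone
```

In this model run_race() reports the new data in both files and shows that
they still share one block: COW was never broken, so no allocation happened
and nothing could return ENOSPC.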

> >> If your claim were that strange logic is already widely used and, of
> >> course, we can't simply break it because of compatibility, I would be
> >> able to agree. But your claim sounds like that logic is sane and
> >> well-designed behavior. So I disagree.
> >
> > To me the rule: "Do not detach a page from a radix tree if it has an elevated
> > refcount unless explicitly requested by a syscall" looks like a sane one.
> > Yes.
> >
> >> > And frankly I fail to see why you and Daniel care so much about this
> >> > corner case because from performance POV it's IMHO a non-issue and you
> >> > bother with page forking because of performance, don't you?
> >> 
> >> We should first try to penalize the corner case path instead of the
> >> normal path. Penalizing the normal path to allow the corner case path
> >> is basically insane.
> >>
> >> Making the normal path faster and more reliable is what we are trying to do.
> >
> > An elevated refcount of a page is in my opinion a corner case path. That's
> > why I think that penalizing that case by waiting for IO instead of forking
> > is an acceptable cost for the improved compatibility & maintainability of
> > the code.
> 
> What is an "elevated refcount"? How does it differ from a normal refcount?
> Are you saying "refcount >= specified threshold + waitq/wakeup" or
> such? If so, it is not a path. It is a state. IOW, some groups may
> not hit it much, but some groups may hit it a lot, on the normal path.

Yes, by "elevated refcount" I meant refcount > 2 (one for pagecache, one for
your code inspecting the page).
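
As a rough sketch of the rule being discussed (wait for IO when the refcount
is elevated, fork the page otherwise), assuming the baseline of two references
named above. The function names and the returned strings are hypothetical and
only illustrate the decision, not the tux3 implementation.

```python
# Baseline references: one held by the page cache, one by the code
# that is currently inspecting the page.
BASELINE_REFS = 2


def has_elevated_refcount(refcount):
    """True if someone besides the page cache and the inspector holds a ref."""
    return refcount > BASELINE_REFS


def choose_action(refcount, under_writeback):
    """Pick what a writer does with a page it wants to modify."""
    if not under_writeback:
        return "modify in place"
    if has_elevated_refcount(refcount):
        # Penalized corner case: somebody else may still touch this page,
        # so do not detach it; wait for the IO to finish instead.
        return "wait for IO"
    return "fork page"
```

With these assumptions, choose_action(2, True) forks the page, while
choose_action(3, True) waits for IO.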

> So it sounds like yet another "stable page", i.e. unpredictable
> performance. (BTW, recalling "stable pages", I noticed that "stable pages"
> would not provide stable page data for that logic either.)
> 
> Well, assuming "elevated refcount == threshold + waitq/wakeup", IMO it
> is not attractive. Rather, as a design choice it is the last option,
> to be used only if there are no others.

I agree the performance will be less predictable and that is not good. But
changing what is visible in the file when writeback races with GUP is a
worse problem to me.

Maybe GUP could mark the pages it took a reference to, so that we could
trigger the slow behavior only for them (Peter Zijlstra proposed in [1] an
infrastructure so that pages pinned by get_user_pages() would be properly
accounted, and then we could use PG_mlocked and an elevated refcount as a more
reliable indication of pages that need special handling).
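
A minimal sketch of that idea, assuming a per-page marker set by GUP. The
TrackedPage class, the gup_pinned flag and the helper names are hypothetical;
the flag only loosely stands in for the accounting proposed in [1].

```python
from dataclasses import dataclass


@dataclass
class TrackedPage:
    refcount: int = 1         # the page cache's own reference
    gup_pinned: bool = False  # hypothetical marker set when GUP pins the page


def get_ref(page):
    """Ordinary, short-lived reference (e.g. the code inspecting the page)."""
    page.refcount += 1


def gup_pin(page):
    """Reference taken via GUP: counted and marked as a pin."""
    page.refcount += 1
    page.gup_pinned = True


def needs_slow_path(page, inspector_refs=1):
    """Slow path only for marked pages that also hold extra references."""
    baseline = 1 + inspector_refs  # page cache + inspecting code
    return page.gup_pinned and page.refcount > baseline
```

In this sketch a page whose refcount is raised only by ordinary references
stays on the fast path, and only a page that GUP has marked takes the slow
one.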

								Honza

[1] http://thread.gmane.org/gmane.linux.kernel.mm/117679
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR