Message-ID: <20161030173135.GC17039@quack2.suse.cz>
Date:   Sun, 30 Oct 2016 18:31:35 +0100
From:   Jan Kara <jack@...e.cz>
To:     "Kirill A. Shutemov" <kirill@...temov.name>
Cc:     Jan Kara <jack@...e.cz>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Theodore Ts'o <tytso@....edu>,
        Andreas Dilger <adilger.kernel@...ger.ca>,
        Jan Kara <jack@...e.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Hugh Dickins <hughd@...gle.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Dave Hansen <dave.hansen@...el.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Matthew Wilcox <willy@...radead.org>,
        Ross Zwisler <ross.zwisler@...ux.intel.com>,
        linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        linux-block@...r.kernel.org
Subject: Re: [PATCHv3 17/41] filemap: handle huge pages in
 filemap_fdatawait_range()

On Mon 24-10-16 14:36:25, Kirill A. Shutemov wrote:
> On Thu, Oct 13, 2016 at 03:18:02PM +0200, Jan Kara wrote:
> > On Thu 13-10-16 15:08:44, Kirill A. Shutemov wrote:
> > > On Thu, Oct 13, 2016 at 11:44:41AM +0200, Jan Kara wrote:
> > > > On Thu 15-09-16 14:54:59, Kirill A. Shutemov wrote:
> > > > > We writeback whole huge page a time.
> > > > 
> > > > This is one of the things I don't understand. Firstly I didn't see where
> > > > changes of writeback like this would happen (maybe they come later).
> > > > Secondly I'm not sure why e.g. writeback should behave atomically wrt huge
> > > > pages. Is this because radix-tree multiorder entry tracks dirtiness for us
> > > > at that granularity?
> > > 
> > > We track dirty/writeback on per-compound pages: meaning we have one
> > > dirty/writeback flag for whole compound page, not on every individual
> > > 4k subpage. The same story for radix-tree tags.
> > > 
> > > > BTW, can you also explain why do we need multiorder entries? What do
> > > > they solve for us?
> > > 
> > > It helps us having coherent view on tags in radix-tree: no matter which
> > > index we refer from the range huge page covers we will get the same
> > > answer on which tags set.
> > 
> > OK, understand that. But why do we need a coherent view? For which purposes
> > exactly do we care that it is not just a bunch of 4k pages that happen to
> > be physically contiguous and thus can be mapped in one PMD?
> 
> My understanding is that things like PageDirty() should be handled on the
> same granularity as PAGECACHE_TAG_DIRTY, otherwise things can go horribly
> wrong...

Yeah, I agree with that. My question was rather aimed in this direction:
Why don't we keep PageDirty and PAGECACHE_TAG_DIRTY at page granularity?
Why do we push all this to happen only in the head page?
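
To make the two options concrete, here is a minimal userspace sketch (not
kernel code) of what each granularity would mean for writeback. The struct,
the function names, and the 512-subpage size (a 2 MiB huge page made of 4k
pages, as on x86-64) are assumptions chosen for illustration only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 2 MiB huge page built from 4 KiB base pages. */
#define SUBPAGES_PER_HUGE 512

/* Hypothetical model of one huge page carrying both kinds of state. */
struct model_huge_page {
	bool head_dirty;                    /* one flag for the whole page */
	bool sub_dirty[SUBPAGES_PER_HUGE];  /* one flag per 4k subpage */
};

/* Head-page scheme: dirtying any subpage dirties the whole huge page. */
static void dirty_head(struct model_huge_page *hp, size_t idx)
{
	assert(idx < SUBPAGES_PER_HUGE);
	hp->head_dirty = true;
}

/* Per-page scheme: only the touched subpage becomes dirty. */
static void dirty_sub(struct model_huge_page *hp, size_t idx)
{
	assert(idx < SUBPAGES_PER_HUGE);
	hp->sub_dirty[idx] = true;
}

/* Number of 4k pages writeback has to write under the head-page scheme. */
static size_t writeback_pages_head(const struct model_huge_page *hp)
{
	return hp->head_dirty ? SUBPAGES_PER_HUGE : 0;
}

/* Number of 4k pages writeback has to write under the per-page scheme. */
static size_t writeback_pages_sub(const struct model_huge_page *hp)
{
	size_t n = 0;

	for (size_t i = 0; i < SUBPAGES_PER_HUGE; i++)
		n += hp->sub_dirty[i];
	return n;
}
```

In this model a single 4k store makes writeback_pages_head() report the
whole huge page, while writeback_pages_sub() reports one page.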

In your cover letter for the latest version (BTW, thanks for expanding the
explanations there) you write:
  - head page (the first subpage) on LRU represents whole huge page;
  - head page's flags represent state of whole huge page (with few
    exceptions);
  - mm can't migrate subpages of the compound page individually;

So the fact that the flags of a head page represent the flags of each
individual page is the decision that I'm questioning, at least for the
PageDirty and PageWriteback flags. I'm asking because, frankly, I don't like
the series much. IMHO too many places need to know about huge pages and
things will get broken frequently. And from a filesystem POV I don't really
see why a filesystem should care about huge pages *at all*. Sure, functions
allocating pages into the page cache need to care, and functions mapping
pages into page tables need to care. But nobody else should need to be aware
that we are playing some huge page games... At least that is my idea of how
things ought to work ;)

Your solution seems to go more in the direction where we have two
different sizes of pages in the system and everyone has to cope with it.
But I'd also note that you go only halfway there - e.g. page lookup
functions still work with subpages, some places still use PAGE_SIZE &
page->index, ... - so the result is a strange mix.
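
A minimal userspace sketch (not kernel code) of the mix I mean: lookup
returns the 4k subpage and the caller works with its index, but the dirty
state has to be resolved through the head page. All names here are
hypothetical, and the 512-subpage size is an assumption:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 2 MiB huge page built from 4 KiB base pages. */
#define MODEL_NR_SUBPAGES 512

/* Hypothetical stand-in for a page descriptor. */
struct model_page {
	size_t index;             /* file offset in 4k units */
	bool dirty;               /* only meaningful on the head page */
	struct model_page *head;  /* the head page points to itself */
};

struct model_compound {
	struct model_page pages[MODEL_NR_SUBPAGES];
};

static void model_compound_init(struct model_compound *c, size_t first_index)
{
	for (size_t i = 0; i < MODEL_NR_SUBPAGES; i++) {
		c->pages[i].index = first_index + i;
		c->pages[i].dirty = false;
		c->pages[i].head = &c->pages[0];
	}
}

/* Lookup hands back the subpage covering the index, not the head page. */
static struct model_page *model_find_page(struct model_compound *c,
					  size_t index)
{
	size_t first = c->pages[0].index;

	if (index < first || index >= first + MODEL_NR_SUBPAGES)
		return NULL;
	return &c->pages[index - first];
}

/* Flag helpers redirect to the head page, whichever subpage is passed. */
static void model_set_dirty(struct model_page *page)
{
	assert(page && page->head);
	page->head->dirty = true;
}

static bool model_page_dirty(const struct model_page *page)
{
	assert(page && page->head);
	return page->head->dirty;
}
```

So a caller that dirties the subpage at one index sees every other index
in the range as dirty too, while the subpage's own flag field stays unused.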

So what are the reasons for having pages forming a huge page bound so
tightly?


								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR