lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20161027084803.GA19743@quack2.suse.cz>
Date:   Thu, 27 Oct 2016 10:48:03 +0200
From:   Jan Kara <jack@...e.cz>
To:     Johannes Weiner <hannes@...xchg.org>
Cc:     Jan Kara <jack@...e.cz>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Dave Jones <davej@...emonkey.org.uk>,
        linux-mm <linux-mm@...ck.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        kernel-team <kernel-team@...com>
Subject: Re: [PATCH 0/5] mm: workingset: radix tree subtleties & single-page
 file refaults

On Wed 26-10-16 14:15:09, Johannes Weiner wrote:
> On Wed, Oct 26, 2016 at 11:21:07AM +0200, Jan Kara wrote:
> > On Mon 24-10-16 14:47:39, Johannes Weiner wrote:
> > > From: Johannes Weiner <hannes@...xchg.org>
> > > Date: Mon, 17 Oct 2016 09:00:04 -0400
> > > Subject: [PATCH] mm: workingset: restore single-page file refault tracking
> > > 
> > > Currently, we account shadow entries in the page cache in the upper
> > > bits of the radix_tree_node->count, behind the back of the radix tree
> > > implementation. Because the radix tree code has no awareness of them,
> > > we have to prevent shadow entries from going through operations where
> > > the tree implementation relies on or modifies node->count: extending
> > > and shrinking the tree from and to a single direct root->rnode entry.
> > > 
> > > As a consequence, we cannot store shadow entries for files that only
> > > have index 0 populated, and thus cannot detect refaults from them,
> > > which in turn degrades the thrashing compensation in LRU reclaim.
> > > 
> > > Another consequence is that we rely on subtleties throughout the radix
> > > tree code, such as the node->count != 1 check in the shrinking code,
> > > which is meant to exclude multi-entry nodes but also skips nodes with
> > > only one shadow entry since they are accounted in the upper bits. This
> > > is error prone, and has in fact caused the bug fixed in d3798ae8c6f3
> > > ("mm: filemap: don't plant shadow entries without radix tree node").
> > > 
> > > To fix this, this patch moves the shadow counter from the upper bits
> > > of node->count into the new node->exceptional counter, where all
> > > exceptional entries are explicitely tracked by the radix tree.
> > > node->count then counts all tree entries again, including shadows.
> > > 
> > > Switching from a magic node->count to accounting exceptional entries
> > > natively in the radix tree code removes the fragile subtleties
> > > mentioned above. It also allows us to store shadow entries for
> > > single-page files again, as the radix tree recognizes exceptional
> > > entries when extending the tree from the root->rnode singleton, and
> > > thus restore refault detection and thrashing compensation for them.
> > 
> > I like this solution.
> 
> Thanks Jan.
> 
> > Just one suggestion: I think radix_tree_replace_slot() can now do
> > the node counter update on its own and that would save us having to
> > do quite a bit of accounting outside of the radix tree code itself
> > and it would be less prone to bugs (forgotten updates of a
> > counter). What do you think?
> 
> This would be nice indeed, but it's bigger surgery. We need the node
> in the context of existing users that do slot lookup and replacement,
> which is easier for individual lookups, and harder for gang lookups
> (e.g. drivers/sh/intc/virq.c::intc_subgroup_map). And they'd all get
> more complicated, AFAICS, without even using exceptional entries.

Hum, I agree. But actually looking at e.g. the usage of
radix_tree_replace_slot() in mm/khugepaged.c I think this is even buggy
when replacing a slot referencing a page with NULL without updating
node->count. It is in the error bail-out path so I'm not surprised we did
not stumble over it.

So a relatively easy solution looks like: Create new function like
radix_tree_replace_node_slot() taking both node & slot and updating the
accounting information as needed. Make this function WARN if node is NULL
and accounting information should be updated. Make original
radix_tree_replace_slot() a wrapper around radix_tree_replace_node_slot()
passing NULL as node.

This should provide safe accounting info updates without forcing all users
to work with node pointers...

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ