lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20110712162431.75bfe77b.akpm@linux-foundation.org>
Date:	Tue, 12 Jul 2011 16:24:31 -0700
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	Hugh Dickins <hughd@...gle.com>
Cc:	linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH 1/12] radix_tree: exceptional entries and indices

<tries to remember what this is all about>

l 2011 15:56:14 -0700 (PDT)
Hugh Dickins <hughd@...gle.com> wrote:

> On Sat, 18 Jun 2011, Andrew Morton wrote:
> > On Fri, 17 Jun 2011 17:13:38 -0700 (PDT) Hugh Dickins <hughd@...gle.com> wrote:
> > > On Fri, 17 Jun 2011, Andrew Morton wrote:
> > > > On Tue, 14 Jun 2011 03:42:27 -0700 (PDT)
> > > > Hugh Dickins <hughd@...gle.com> wrote:
> > > > 
> > > > > The low bit of a radix_tree entry is already used to denote an indirect
> > > > > pointer, for internal use, and the unlikely radix_tree_deref_retry() case.
> > > > > Define the next bit as denoting an exceptional entry, and supply inline
> > > > > functions radix_tree_exception() to return non-0 in either unlikely case,
> > > > > and radix_tree_exceptional_entry() to return non-0 in the second case.
> > > > 
> > > > Yes, the RADIX_TREE_INDIRECT_PTR hack is internal-use-only, and doesn't
> > > > operate on (and hence doesn't corrupt) client-provided items.
> > > > 
> > > > This patch uses bit 1 and uses it against client items, so for
> > > > practical purpoese it can only be used when the client is storing
> > > > addresses.  And it needs new APIs to access that flag.
> > > > 
> > > > All a bit ugly.  Why not just add another tag for this?  Or reuse an
> > > > existing tag if the current tags aren't all used for these types of
> > > > pages?
> > > 
> > > I couldn't see how to use tags without losing the "lockless" lookups:
> > 
> > So lockless pagecache broke the radix-tree tag-versus-item coherency as
> > well as the address_space nrpages-vs-radix-tree coherency.
> 
> I don't think that remark is fair to lockless pagecache at all.  If we
> want the scalability advantage of lockless lookup, yes, we don't have
> strict coherency with tagging at that time.  But those places that need
> to worry about that coherency, can lock to do so.

Nobody thought about these issues, afaik.  Things have broken and the
code has become significantly more complex/fragile.

Does the locking in mapping_tagged() make any sense?

> > Isn't it fun learning these things.
> > 
> > > because the tag is a separate bit from the entry itself, unless you're
> > > under tree_lock, there would be races when changing from page pointer
> > > to swap entry or back, when slot was updated but tag not or vice versa.
> > 
> > So...  take tree_lock?
> 
> I wouldn't call that an improvement...

I wouldn't call the proposed changes to radix-tree.c an improvement,
either.  It's an expedient, once-off, single-caller hack.

If the cost of adding locking is negligible then that is a superior fix.

> > What effect does that have?
> 
> ... but admit I have not measured: I rather assume that if we now change
> tmpfs from lockless to locked lookup, someone else will soon come up with
> the regression numbers.
> 
> > It'd better be
> > "really bad", because this patchset does nothing at all to improve core
> > MM maintainability :(
> 
> I was aiming to improve shmem.c maintainability; and you have good grounds
> to accuse me of hurting shmem.c maintainability when I highmem-ized the
> swap vector nine years ago.
> 
> I was not aiming to improve core MM maintainability, nor to harm it.
> I am extending the use to which the radix-tree can be put, but is that
> so bad?

I find it hard to believe that this wart added to the side of the
radix-tree code will find any other users.  And the wart spreads
contagion into core filemap pagecache lookup.

It's pretty nasty stuff.  Please, what is a better way of doing all this?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ