linux-kernel - Re: [PATCH 2/3] NFS: lock the readdir page while it is in use


Message-ID: <AANLkTi=p8mF2=G8TPi7TciEYS8cyDNp8OG+ksrF7b-vf@mail.gmail.com>
Date:	Tue, 30 Nov 2010 21:06:00 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Trond Myklebust <Trond.Myklebust@...app.com>
Cc:	Nick Bowler <nbowler@...iptictech.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	linux-nfs@...r.kernel.org
Subject: Re: [PATCH 2/3] NFS: lock the readdir page while it is in use

On Tue, Nov 30, 2010 at 8:29 PM, Trond Myklebust
<Trond.Myklebust@...app.com> wrote:
>
> I'm not worried about other readdir calls invalidating the page. My
> concern is rather about the VM memory reclaimers ejecting the page from
> the page cache, and calling nfs_readdir_clear_array while we're
> referencing the page.

I think you're making a fundamental mistake here, and you're confused
by a much deeper problem.

The thing is, the ".releasepage" callback gets called _before_ the
page actually gets removed from the page cache, and there is no
guarantee that it will always be removed at all!

In fact, anybody holding a reference to it will result in
page_freeze_refs() not successfully clearing all the refs, and that in
turn will abort the actual freeing of the page. So while you hold the
page count, your page will NOT be freed. Guaranteed.
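
A minimal userspace sketch of the refcount-freeze idea described above, not the kernel's actual code. The `toy_` names are hypothetical, and the expected count is simplified to "the references the reclaimer knows about". The point it illustrates: the freeze only succeeds when the count is exactly the expected value, so any extra holder makes it fail and the page stays where it is.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: only a reference count. */
struct toy_page {
    atomic_int count;
};

/* Take an extra reference, as a reader of the page would. */
static void toy_get_page(struct toy_page *p)
{
    atomic_fetch_add(&p->count, 1);
}

/* Drop a reference previously taken with toy_get_page(). */
static void toy_put_page(struct toy_page *p)
{
    int old = atomic_fetch_sub(&p->count, 1);
    assert(old > 0); /* dropping a reference nobody held is a bug */
}

/*
 * Model of the freeze step: atomically replace the count with 0, but
 * only if it is exactly 'expected'. If anybody else holds a reference,
 * the compare-and-swap fails and the count is left untouched.
 */
static bool toy_freeze_refs(struct toy_page *p, int expected)
{
    return atomic_compare_exchange_strong(&p->count, &expected, 0);
}
```

With a count of 1 for the cache itself plus one reader, `toy_freeze_refs(&p, 1)` fails; once the reader calls `toy_put_page()`, the same call succeeds.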

But it is true that the ".releasepage()" function may be called. So if
your NFS release callback ends up invalidating the data on that page,
that page lock thing will make a difference, yes.

But at the same time, are you sure that you are able to then handle
the case of that page still existing in the page cache and being used
afterwards? Looking at the code, it doesn't look that way to me.

So I think you're confused, and the NFS code totally incorrectly
thinks that ".releasepage" is something that happens at the last use
of the page. It simply is not so. In fact, you seem to return 0, which
I think means "failure to release", so the VM will just mark it busy
again afterwards.

Now, I think you do have a few options:

 - keep the current model. BUT! In the page cache release function
(nfs_readdir_clear_array), make sure that you also clear the
up-to-date bit, so that the page gets read back in since it no longer
contains any valid information. And return success for the
"releasepage" operation.

Alternatively:

 - introduce a callback for the case of the page actually being gone
from the page cache, which gets called _after_ the removal.

which seems to be what you really want, since for you the releasepage
thing is about releasing the data structures associated with that
cache. So you don't want to worry about the page lock, and you don't
want to worry about the case of "maybe it won't get released at all
after this because somebody still holds a ref-count".
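
The two options can be contrasted with a small userspace model. This is a sketch under stated assumptions, not kernel code: all `toy_` and `opt` names are hypothetical, `freepage` is only a placeholder name for the proposed after-removal callback, and the reclaim logic is reduced to "call releasepage, then remove the page only if nobody else holds a reference".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_page;

struct toy_ops {
    /* Called BEFORE removal; nonzero means "ok to release". */
    int (*releasepage)(struct toy_page *);
    /* Placeholder for the proposed callback, called AFTER removal. */
    void (*freepage)(struct toy_page *);
};

struct toy_page {
    int refs;          /* 1 == only the page cache holds it */
    bool uptodate;     /* page contents are valid */
    bool in_cache;     /* still present in the (toy) page cache */
    bool array_valid;  /* stands in for the readdir array on the page */
    const struct toy_ops *ops;
};

/* Option 1: tear down the data, clear up-to-date, report success. */
static int opt1_releasepage(struct toy_page *p)
{
    p->array_valid = false;
    p->uptodate = false; /* forces a re-read if the page survives */
    return 1;
}

/* Option 2: tear down only once the page is really gone. */
static void opt2_freepage(struct toy_page *p)
{
    assert(!p->in_cache);
    p->array_valid = false;
}

static const struct toy_ops opt1_ops = { .releasepage = opt1_releasepage };
static const struct toy_ops opt2_ops = { .freepage = opt2_freepage };

/* Simplified reclaim attempt; returns true if the page was removed. */
static bool toy_reclaim(struct toy_page *p)
{
    if (p->ops->releasepage && !p->ops->releasepage(p))
        return false;  /* release refused */
    if (p->refs != 1)
        return false;  /* someone holds a ref: page stays in the cache */
    p->refs = 0;
    p->in_cache = false;
    if (p->ops->freepage)
        p->ops->freepage(p);
    return true;
}
```

In this model, a failed reclaim under option 1 leaves a page that is still cached but no longer up to date, so the next user has to read it back in. Under option 2 a failed reclaim leaves the page and its data untouched, and the teardown runs only when the removal has really happened.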

> As far as I can see, the only way to protect against that is to lock the
> page, perform the usual tests and then release the page lock when we're
> done...

I think one of us is confused. And it's possible that it is me. It's
been a _long_ time since I looked at that code, and I may well be
missing something. But I get the strong feeling you're mis-using
'.releasepage' and confused about the semantics.

                 Linus "maybe confused myself" Torvalds