lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1436440655.2709.8.camel@pluto.fritz.box>
Date:	Thu, 09 Jul 2015 19:17:35 +0800
From:	Ian Kent <raven@...maw.net>
To:	Al Viro <viro@...IV.linux.org.uk>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	"J. Bruce Fields" <bfields@...ldses.org>,
	linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: [RFC] freeing unliked file indefinitely delayed

On Wed, 2015-07-08 at 02:42 +0100, Al Viro wrote:
> 	Normally opening a file, unlinking it and then closing will have
> the inode freed upon close() (provided that it's not otherwise busy and
> has no remaining links, of course).  However, there's one case where that
> does *not* happen.  Namely, if you open it by fhandle with cold dcache,
> then unlink() and close().
> 
> 	In normal case you get d_delete() in unlink(2) notice that dentry
> is busy and unhash it; on the final dput() it will be forcibly evicted from
> dcache, triggering iput() and inode removal.  In this case, though, we end
> up with *two* dentries - disconnected (created by open-by-fhandle) and
> regular one (used by unlink()).  The latter will have its reference to inode
> dropped just fine, but the former will not - it's considered hashed (it
> is on the ->s_anon list), so it will stay around until the memory pressure
> will finally do it in.  As the result, we have the final iput() delayed
> indefinitely.  It's trivial to reproduce -
> 
> #define _GNU_SOURCE
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> 
> void flush_dcache(void)
> {
>         system("mount -o remount,rw /");
> }
> 
> static char buf[20 * 1024 * 1024];
> 
> main()
> {
>         int fd;
>         union { 
>                 struct file_handle f;
>                 char buf[MAX_HANDLE_SZ];
>         } x;
>         int m;
> 
>         x.f.handle_bytes = sizeof(x);
>         chdir("/root");
>         mkdir("foo", 0700);
>         fd = open("foo/bar", O_CREAT | O_RDWR, 0600);
>         close(fd);
>         name_to_handle_at(AT_FDCWD, "foo/bar", &x.f, &m, 0);
>         flush_dcache();
>         fd = open_by_handle_at(AT_FDCWD, &x.f, O_RDWR);
>         unlink("foo/bar");
>         write(fd, buf, sizeof(buf));
>         system("df .");			/* 20Mb eaten */
>         close(fd);
>         system("df .");			/* should've freed those 20Mb */
>         flush_dcache();
>         system("df .");			/* should be the same as #2 */
> }
> 
> will spit out something like
> Filesystem     1K-blocks   Used Available Use% Mounted on
> /dev/root         322023 303843      1131 100% /
> Filesystem     1K-blocks   Used Available Use% Mounted on
> /dev/root         322023 303843      1131 100% /
> Filesystem     1K-blocks   Used Available Use% Mounted on
> /dev/root         322023 283282     21692  93% /
> - inode gets freed only when dentry is finally evicted (here we trigger
> than by remount; normally it would've happened in response to memory
> pressure hell knows when).
> 
> IMO it's a bug.  Between the close() and final flush_dcache() the file has
> no surviving links, is *not* busy, won't show up in fuser/lsof/whatnot
> output, and yet it's still not freed.  I'm not saying that it's realistically
> exploitable (albeit with nfsd it might be), but it's a very unpleasant
> self-LART.
> 
> FWIW, my prefered fix would be simply to have the final dput() treat
> disconnected dentries as "kill on sight"; checking for i_nlink of the
> inode, as Bruce suggested several years ago, will *not* work, simply
> because having another link to that file and unlinking it after close
> will reproduce the situation - disconnected dentry sticks around in
> dcache past its final dput() and past the last unlink() of our file.
> Theoretically it might cause an overhead for nfsd (no_subtree_check v3
> export might see more d_alloc()/d_free(); icache retention will still
> prevent constant rereading the inode from disk).  _IF_ that proves to
> be noticable, we might need to come up with something more elaborate
> (e.g. have unlink() and rename() kick disconnected aliases out if the link
> count has reached 0), but it's more complex and needs careful ananlysis
> of correctness - we need to prove that there's no way to miss the link
> count reaching 0.  I would prefer to treat all disconnected as unhashed
> for dcache retention purposes - it's simpler and less brittle.  Comments?
> I mean something like this:

Al, help me out here, I'm struggling to understand where these dentrys
come from (apart from your reproducer).

For example, on the heavily patched 2.6.32 kernel where this was first
seen the problem dentry is annoymous, refcount 0, and unhashed.

But the dentrys that will most likely face summary execution will be
hashed, such as was the case on that 2.6.32 kernel at dput().

Doesn't that mean that something dropped the dentry after the dput(),
that will now also free the dentry, that took the refcount to 0?

Ian

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ