lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LSU.2.11.1902150159100.5680@eggly.anvils>
Date:   Fri, 15 Feb 2019 02:38:40 -0800 (PST)
From:   Hugh Dickins <hughd@...gle.com>
To:     "Darrick J. Wong" <darrick.wong@...cle.com>
cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Matej Kupljen <matej.kupljen@...il.com>,
        linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        linux-mm@...ck.org, hughd@...gle.com
Subject: Re: tmpfs inode leakage when opening file with O_TMP_FILE

On Thu, 14 Feb 2019, Darrick J. Wong wrote:
> [cc the shmem maintainer and the mm list]

Yup, thanks - Matej also did so the day after sending to linux-kernel.

> 
> On Thu, Feb 14, 2019 at 03:44:02PM -0800, Andrew Morton wrote:
> > (cc linux-fsdevel)

Okay, thanks, but a tmpfs peculiarity we think.

> > 
> > On Mon, 11 Feb 2019 15:18:11 +0100 Matej Kupljen <matej.kupljen@...il.com> wrote:
> > 
> > > Hi,
> > > 
> > > it seems that when opening file on file system that is mounted on
> > > tmpfs with the O_TMPFILE flag and using linkat call after that, it
> > > uses 2 inodes instead of 1.
> > > 
> > > This is simple test case:
> > > 
> > > #include <sys/types.h>
> > > #include <sys/stat.h>
> > > #include <fcntl.h>
> > > #include <unistd.h>
> > > #include <string.h>
> > > #include <stdio.h>
> > > #include <stdlib.h>
> > > #include <linux/limits.h>
> > > #include <errno.h>
> > > 
> > > #define TEST_STRING     "Testing\n"
> > > 
> > > #define TMP_PATH        "/tmp/ping/"
> > > #define TMP_FILE        "file.txt"
> > > 
> > > 
> > > int main(int argc, char* argv[])
> > > {
> > >         char path[PATH_MAX];
> > >         int fd;
> > >         int rc;
> > > 
> > >         fd = open(TMP_PATH, __O_TMPFILE | O_RDWR,
> > >                         S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP |
> > > S_IROTH | S_IWOTH);
> > > 
> > >         rc = write(fd, TEST_STRING, strlen(TEST_STRING));
> > > 
> > >         snprintf(path, PATH_MAX,  "/proc/self/fd/%d", fd);
> > >         linkat(AT_FDCWD, path, AT_FDCWD, TMP_PATH TMP_FILE, AT_SYMLINK_FOLLOW);
> > >         close(fd);
> > > 
> > >         return 0;
> > > }
> > > 
> > > I have checked indoes with "df -i" tool. The first inode is used when
> > > the call to open is executed and the second one when the call to
> > > linkat is executed.
> > > It is not decreased when close is executed.
> > > 
> > > I have also tested this on an ext4 mounted fs and there only one inode is used.
> > > 
> > > I tested this on:
> > > $ cat /etc/lsb-release
> > > DISTRIB_ID=Ubuntu
> > > DISTRIB_RELEASE=18.04
> > > DISTRIB_CODENAME=bionic
> > > DISTRIB_DESCRIPTION="Ubuntu 18.04.1 LTS"
> > > 
> > > $ uname -a
> > > Linux Orion 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC
> > > 2018 x86_64 x86_64 x86_64 GNU/Linux
> 
> Heh, tmpfs and its weird behavior where each new link counts as a new
> inode because "each new link needs a new dentry, pinning lowmem, and
> tmpfs dentries cannot be pruned until they are unlinked."

That's very much a peculiarity of tmpfs, so agreed: it's what I expect
to be the cause, but I've not actually tracked it through and fixed yet.

> 
> It seems to have this behavior on 5.0-rc6 too:

Yes, it does.

> 
> $ /bin/df -i /tmp ; ./c ; /bin/df -i /tmp
> Filesystem      Inodes IUsed   IFree IUse% Mounted on
> tmp            1019110    17 1019093    1% /tmp
> Filesystem      Inodes IUsed   IFree IUse% Mounted on
> tmp            1019110    19 1019091    1% /tmp
> 
> Probably because shmem_tmpfile -> shmem_get_inode -> shmem_reserve_inode
> which decrements ifree when we create the tmpfile, and then the
> d_tmpfile decrements i_nlink to zero.  Now we have iused=1, nlink=0,
> assuming iused=itotal-ifree like usual.
> 
> Then the linkat call does:
> 
> shmem_link -> shmem_reserve_inode
> 
> which decrements ifree again and increments i_nlink to 1.  Now we have
> iused=2, nlink=1.
> 
> The program exits, which closes the file.  /tmp/ping/file.txt still
> exists and we haven't evicted inodes yet, so nothing much happens.
> 
> But then I added in rm -rf /tmp/ping/file.txt to see what happens.
> shmem_unlink contains this:
> 
> 	if (inode->i_nlink > 1 && !S_ISDIR(inode->i_mode))
> 		shmem_free_inode(inode->i_sb);
> 
> So shmem_iunlink *doesnt* decrement ifree but does drop the nlink, so
> our state is now iused=2, nlink=0.
> 
> Now we evict the inode, which decrements ifree, so iused=1 and the inode
> goes away.  Oops, we just leaked an ifree.
> 
> I /think/ the proper fix is to change shmem_link to decrement ifree only
> if the inode has nonzero nlink, e.g.
> 
> 	/*
> 	 * No ordinary (disk based) filesystem counts links as inodes;
> 	 * but each new link needs a new dentry, pinning lowmem, and
> 	 * tmpfs dentries cannot be pruned until they are unlinked.  If
> 	 * we're linking an O_TMPFILE file into the tmpfs we can skip
> 	 * this because there's still only one link to the inode.
> 	 */
> 	if (inode->i_nlink > 0) {
> 		ret = shmem_reserve_inode(inode->i_sb);
> 		if (ret)
> 			goto out;
> 	}
> 
> Says me who was crawling around poking at O_TMPFILE behavior all morning.

Thanks for the Cc on that patch: I thought at first that you were
coincidentally fixing up Matej's observation, but from its commit
message no.  That work is just a generic cleanup to suit XFS needs,
and won't change the tmpfs behavior one way or the other.

> Not sure if that's right; what happens to the old dentry?

I'm relieved to see your "/think/" above and "Not sure" there :)
Me too.  It is so easy to get these counting things wrong, especially
when distributed between the generic and the specific file system.

I'm not going to attempt a pronouncement until I've had time to
sink properly into it at the weekend, when I'll follow your guide
and work it through - thanks a lot for getting this far, Darrick.

Hugh

> 
> --D
> 
> > > If you need any more information, please let me know.
> > > 
> > > And please CC me when replying, I am not subscribed to the list.
> > > 
> > > Thanks and BR,
> > > Matej

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ