linux-ext4 - Re: Phantom full ext4 root filesystems on 4.1 through 4.14 kernels

Message-ID: <20190123195922.GA16927@twosigma.com>
Date:   Wed, 23 Jan 2019 14:59:22 -0500
From:   Thomas Walker <Thomas.Walker@...sigma.com>
To:     Elana Hashman <Elana.Hashman@...sigma.com>
CC:     "Darrick J. Wong" <darrick.wong@...cle.com>,
        "'tytso@....edu'" <tytso@....edu>,
        "'linux-ext4@...r.kernel.org'" <linux-ext4@...r.kernel.org>
Subject: Re: Phantom full ext4 root filesystems on 4.1 through 4.14 kernels

Unfortunately, this continues to be a persistent problem for us. On another example system:

# uname -a
Linux <hostname> 4.14.67-ts1 #1 SMP Wed Aug 29 13:28:25 UTC 2018 x86_64 GNU/Linux

# df -h /
Filesystem                                              Size  Used Avail Use% Mounted on
/dev/disk/by-uuid/<uuid>                                 50G   46G  1.1G  98% /

# df -hi /
Filesystem                                             Inodes IUsed IFree IUse% Mounted on
/dev/disk/by-uuid/<uuid>                                 3.2M  306K  2.9M   10% /

# du -hsx  /
14G     /

We have confirmed that this is not due to sparse files or to deleted-but-still-open files.
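For readers who want to repeat these checks, here is a minimal sketch. It is not the exact procedure used on these hosts. The function names `phantom_bytes`, `sparse_delta` and `deleted_open_files` are hypothetical, and the sketch assumes GNU coreutils (`df --output`, `du -B1`, `du --apparent-size`) and GNU findutils (`find -lname`, `-printf`). It measures the df/du gap in bytes, compares allocated size with apparent size (sparse files), and lists file descriptors whose target has been unlinked.

```sh
#!/bin/sh
# Sketch only: helper names are hypothetical; assumes GNU coreutils/findutils.

# Bytes df reports as used on the filesystem holding $1, minus the bytes
# du finds walking from $1 without crossing into other filesystems (-x).
# Point it at the mount point itself (e.g. /) for a meaningful number.
phantom_bytes() {
    df_used=$(df -B1 --output=used "$1" | tail -n 1 | tr -d ' ')
    du_used=$(du -sxB1 "$1" 2>/dev/null | cut -f 1)
    echo $((df_used - du_used))
}

# Allocated bytes minus apparent bytes under $1 (same filesystem only).
# A large negative value means sparse files: their apparent size exceeds
# the blocks actually allocated to them.
sparse_delta() {
    alloc=$(du -sxB1 "$1" 2>/dev/null | cut -f 1)
    apparent=$(du -sxB1 --apparent-size "$1" 2>/dev/null | cut -f 1)
    echo $((alloc - apparent))
}

# Open descriptors whose file was unlinked. The kernel shows their target
# as "<path> (deleted)"; such files still hold blocks that df counts but
# du cannot see. Run as root to inspect every process.
deleted_open_files() {
    find /proc/[0-9]*/fd -lname '*(deleted)' -printf '%p -> %l\n' 2>/dev/null
}
```

On a host like the one above, `phantom_bytes /` run as root should come out at roughly the 46G minus 14G difference shown by df and du. An empty result from `deleted_open_files` corresponds to the "no deleted but still open files" finding.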

The most interesting thing that I've been able to find so far is this:

# mount -o remount,ro /
mount: / is busy
# df -h /
Filesystem                                              Size  Used Avail Use% Mounted on
/dev/disk/by-uuid/<uuid>                                 50G   14G   33G  30% /

Something about attempting (and failing) to remount read-only frees up all of the phantom space usage.
Does that spark any ideas for anyone?

I've tried all manner of other things without success: unmounting all of the overlays, killing off virtually all of userspace (dropping to single-user mode), and dropping the page/inode/dentry caches. Nothing short of a reboot seems to give us the space back.

On Wed, Dec 05, 2018 at 11:26:19AM -0500, Elana Hashman wrote:
> Okay, let's take a look at another affected host. I have not drained it, just
> cordoned it, so it's still in Kubernetes service and has running, active pods.
>