Message-ID: <20190123195922.GA16927@twosigma.com>
Date:   Wed, 23 Jan 2019 14:59:22 -0500
From:   Thomas Walker <Thomas.Walker@...sigma.com>
To:     Elana Hashman <Elana.Hashman@...sigma.com>
CC:     "Darrick J. Wong" <darrick.wong@...cle.com>,
        "'tytso@....edu'" <tytso@....edu>,
        "'linux-ext4@...r.kernel.org'" <linux-ext4@...r.kernel.org>
Subject: Re: Phantom full ext4 root filesystems on 4.1 through 4.14 kernels

Unfortunately, this is still a persistent problem for us.  On another example system:

# uname -a
Linux <hostname> 4.14.67-ts1 #1 SMP Wed Aug 29 13:28:25 UTC 2018 x86_64 GNU/Linux

# df -h /
Filesystem                                              Size  Used Avail Use% Mounted on
/dev/disk/by-uuid/<uuid>                                 50G   46G  1.1G  98% /

# df -hi /
Filesystem                                             Inodes IUsed IFree IUse% Mounted on
/dev/disk/by-uuid/<uuid>                                 3.2M  306K  2.9M   10% /

# du -hsx  /
14G     /

We've confirmed it's not due to sparse files or to deleted-but-still-open files (checks along the lines sketched below).
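(For reference, roughly the checks used; the exact invocations here are illustrative. lsof's +L1 selects files with a link count below one, i.e. unlinked but still held open, and comparing du's --apparent-size against its on-disk usage would make any sparse-file skew visible.)

# lsof +L1 /
# du -hsx --apparent-size / ; du -hsx /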

The most interesting thing that I've been able to find so far is this:

# mount -o remount,ro /
mount: / is busy
# df -h /
Filesystem                                              Size  Used Avail Use% Mounted on
/dev/disk/by-uuid/<uuid>                                 50G   14G   33G  30% /

Something about attempting (and failing) to remount read-only frees up all of the phantom space usage.
I'm curious whether that sparks any ideas.
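One thing I plan to capture before and after the remount attempt the next time this recurs (assuming the counter is exposed on this kernel): ext4's delayed-allocation accounting in sysfs, since reserved-but-unwritten blocks would inflate df while staying invisible to du.

# cat /sys/fs/ext4/<dev>/delayed_allocation_blocks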

I've tried all manner of other things without success: unmounting all of the overlays, killing off virtually all of userspace (dropping to single user), and dropping the page/inode/dentry caches.  Nothing else (short of a reboot) seems to give us the space back.
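(For completeness, the cache drop was along these lines, with a sync first so dirty pages don't skew the picture:)

# sync
# echo 3 > /proc/sys/vm/drop_caches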


On Wed, Dec 05, 2018 at 11:26:19AM -0500, Elana Hashman wrote:
> Okay, let's take a look at another affected host. I have not drained it, just
> cordoned it, so it's still in Kubernetes service and has running, active pods.
> 
