linux-ext4 - Re: Java Stop-the-World GC stall induced by FS flush or many large file deletions

Message-ID: <20130912190251.GB28067@thunk.org>
Date:	Thu, 12 Sep 2013 15:02:51 -0400
From:	Theodore Ts'o <tytso@....edu>
To:	Cuong Tran <cuonghuutran@...il.com>
Cc:	linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: Java Stop-the-World GC stall induced by FS flush or many large
 file deletions

Are you absolutely certain your JVM isn't attempting to write to any
files in its GC thread?  Say, to do some kind of logging?  It might be
worth stracing the JVM and correlating the GC stall with any syscalls
that might have been issued from the JVM GC thread.
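
One way to do that correlation, as a rough sketch only: capture a trace
with something like "strace -f -tt -T -o trace.txt -p <jvm-pid>" and
then pick out the slow syscalls per thread.  The line layout assumed in
the regex below (thread id, wall-clock time, call, elapsed seconds in
angle brackets) is what those flags normally produce when writing to a
file, but check it against your own output.  The function name and the
0.1 second threshold are made up for illustration.

```python
import re

# Assumed line shape from `strace -f -tt -T -o trace.txt -p <jvm-pid>`:
#   <tid> <HH:MM:SS.usec> <syscall>(<args>) = <ret> <<elapsed seconds>>
LINE_RE = re.compile(
    r"^(?P<tid>\d+)\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<name>\w+)\(.*\)\s+=\s+.*<(?P<secs>\d+\.\d+)>\s*$"
)


def slow_syscalls(lines, threshold=0.1, tid=None):
    """Return (tid, time, syscall, seconds) for calls slower than threshold.

    Lines that do not match the assumed shape (signals, "<unfinished ...>"
    and "<... resumed>" fragments) are skipped, so a call that strace split
    across two lines is not reported by this sketch.
    """
    hits = []
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue
        line_tid = int(m.group("tid"))
        if tid is not None and line_tid != tid:
            continue
        secs = float(m.group("secs"))
        if secs >= threshold:
            hits.append((line_tid, m.group("time"), m.group("name"), secs))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python slow_syscalls.py trace.txt [tid-of-gc-thread]
    only_tid = int(sys.argv[2]) if len(sys.argv) > 2 else None
    with open(sys.argv[1]) as f:
        for hit in slow_syscalls(f, tid=only_tid):
            print(*hit)
```

Comparing the timestamps it prints against the pause times in the GC
log should show whether the GC thread was sitting in a write or fsync
when the stall happened.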

Especially in the case of the FS Flush, the writeback thread isn't CPU
bound.  It will wait for the writeback to complete, but while it's
waiting, other processes or threads will be allowed to run on the CPU.

Now, if the GC thread tries to do some kind of fs operation which
requires writing to the file system, and the file system is trying to
start a jbd transaction commit, file system operations can block until
all of the jbd handles associated with the previous commit
complete.  If your storage devices are slow, or you are using a
block cgroup to control how much I/O bandwidth a particular cgroup
can use, this can end up causing a priority inversion: a low
priority cgroup takes a while to complete its I/O, which stalls the
jbd commit completion, which in turn causes new ext4 operations to
stall waiting to start a new jbd handle.

So you could have a stall happening, if it's taking a long time for
commits to complete, but it might be completely unrelated to a GC
stall.

If you enable the jbd2_run_stats tracepoint, you can get some
interesting numbers about how long the various phases of the jbd2
commit are taking.
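
As a sketch of what to do with that output: the tracepoint can usually
be switched on by writing 1 to
/sys/kernel/debug/tracing/events/jbd2/jbd2_run_stats/enable (assuming
debugfs is mounted there), and each commit then shows up as a line of
"name value" pairs in the trace buffer.  The helper below parses such
lines generically rather than assuming a fixed field list, since the
fields differ between kernel versions.  The function names, the sample
line, and the 1000 threshold are illustrative only; the phase times are,
as far as I know, reported in milliseconds.

```python
import re

# Matches "name value" pairs; the value for "dev" looks like "8,1".
PAIR_RE = re.compile(r"(\w+) (\d+(?:,\d+)?)")
MARKER = "jbd2_run_stats:"


def parse_run_stats(line):
    """Parse one trace line into a dict, or return None if it is not
    a jbd2_run_stats event."""
    idx = line.find(MARKER)
    if idx < 0:
        return None
    stats = {}
    for key, value in PAIR_RE.findall(line[idx + len(MARKER):]):
        # Keep the device as a "major,minor" string; the rest are integers.
        stats[key] = value if key == "dev" else int(value)
    return stats


def phases_over(stats, threshold, phases=("wait", "locked", "flushing", "logging")):
    """Return {phase: value} for the chosen phases at or above threshold.

    The default phase list is an assumption; pass whichever fields your
    kernel actually reports.
    """
    return {p: stats[p] for p in phases if p in stats and stats[p] >= threshold}


if __name__ == "__main__":
    # Illustrative line only, not captured from a real system.
    sample = ("jbd2/sda1-8-345 [001] .... 1234.567890: jbd2_run_stats: "
              "dev 8,1 tid 12345 wait 0 running 5000 locked 0 flushing 10 "
              "logging 2300 handle_count 120 blocks 40 blocks_logged 41")
    stats = parse_run_stats(sample)
    print(stats["tid"], phases_over(stats, 1000))
```

Commits whose phases stand out can then be lined up by timestamp
against the stalls you are seeing.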

              				- Ted