Message-ID: <20120308180951.GB29510@shiny>
Date:	Thu, 8 Mar 2012 13:09:51 -0500
From:	Chris Mason <chris.mason@...cle.com>
To:	"Ted Ts'o" <tytso@....edu>
Cc:	Zach Brown <zab@...bo.net>, Eric Sandeen <sandeen@...hat.com>,
	linux-fsdevel@...r.kernel.org, linux-ext4@...r.kernel.org
Subject: Re: [PATCH, RFC] Don't do page stablization if
 !CONFIG_BLKDEV_INTEGRITY

On Thu, Mar 08, 2012 at 10:54:19AM -0500, Ted Ts'o wrote:
> On Wed, Mar 07, 2012 at 09:39:50PM -0500, Zach Brown wrote:
> > 
> > >Can you devise a non-secret testcase that demonstrates this?
> > 
> > Hmm.  I bet you could get fio to do it.  Giant file, random mmap()
> > writes, spin until the CPU overwhelms writeback?
> 
> Kick off a bunch of fio processes, each in separate I/O cgroups set up
> so that each of the processes get a "fair" amount of the I/O
> bandwidth.  (This is quite common in cloud deployments where you are
> packing a huge number of tasks onto a single box; whether the tasks
> are inside virtual machines or containers doesn't really matter for the
> purpose of this exercise.  We basically need to simulate a system
> where the disks are busy.)
> 
> Then in one of those cgroups, create a process which is constantly
> appending to a file using buffered I/O; this could be a log file, or
> an application-level journal file; and measure the latency of that
> write system call.  Every so often, writeback will push the dirty
> pages corresponding to the log/journal file to disk.  When that
> happens, and page stabilization is enabled, the latency of that write
> system call will spike.
> 
> And any time you have a distributed system where you are depending on
> a large number of RPC/SOAP/Service Oriented Architecture Enterprise
> Service Bus calls (I don't really care which buzzword you use, but IBM
> and Oracle really like the last one :-), long-tail latencies are what
> kill your responsiveness and predictability.  Especially when a thread
> goes away for a second or more...

But why are we waiting on writeback for a second or more?  Aren't there
other parts of this we would want to fix as well?

I'm not against only turning on stable pages when they are needed, but
code that isn't the default tends to be somewhat less used.  So it
does increase the testing burden when we do want stable pages, and it
tends to make for awkward bugs that are hard to reproduce because someone
neglects to mention that they had stable pages turned on.

IMHO it's much more important to nail down the 2 second writeback
latency. That's not good.
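
For anyone who wants numbers, the appender Ted describes could be sketched
roughly like this.  It is only an illustration, not a tested reproducer: the
function names, record size and defaults are made up here, and it does not
set up the cgroups or the competing fio load, which would have to be running
already for the spikes to show up.

```python
import os
import time


def measure_append_latency(path, n_writes=1000, record_size=4096, pause=0.0):
    """Append fixed-size records with buffered write() and time each call.

    Returns a list of per-call latencies in seconds.
    """
    record = b"x" * record_size
    latencies = []
    # O_APPEND without O_SYNC or O_DIRECT, so writes go through the page
    # cache and only stall when the kernel makes the caller wait.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        for _ in range(n_writes):
            start = time.perf_counter()
            written = os.write(fd, record)
            latencies.append(time.perf_counter() - start)
            if written != record_size:
                raise OSError("short write")
            if pause:
                # Optional gap between records, to mimic a log writer.
                time.sleep(pause)
    finally:
        os.close(fd)
    return latencies


def summarize(latencies):
    """Return (median, 99th percentile, max) of the latencies, in seconds."""
    ordered = sorted(latencies)
    n = len(ordered)
    p99 = ordered[min(n - 1, int(n * 0.99))]
    return ordered[n // 2], p99, ordered[-1]
```

Run it against a file on the filesystem under test while the disks are busy,
then compare the max from summarize() with the median.  The max is the
long-tail number being argued about here.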

-chris