linux-kernel - Re: [PATCH 0/2] pdflush fix and enhancement

Message-Id: <1230738056.3470.150.camel@hermosa.site>
Date:	Wed, 31 Dec 2008 08:40:56 -0700
From:	"Peter W. Morreale" <pmorreale@...ell.com>
To:	Dave Chinner <david@...morbit.com>
Cc:	Andi Kleen <andi@...stfloor.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/2] pdflush fix and enhancement

On Wed, 2008-12-31 at 18:08 +1100, Dave Chinner wrote:
> On Tue, Dec 30, 2008 at 09:11:04PM -0700, Peter W. Morreale wrote:
> > Actually, it seems to me that we need to look at a radically different
> > approach.  What about making background writes a property of the super
> > block? (which implies the file system)  Has that been discussed before?
> 
> Sure - there was a recent discussion in the context of how broken the
> sync(2) syscall is.
> 
> That is, some filesystems (e.g. XFS) have certain requirements
> to ensure sync actually works in all circumstances and the current
> methods that sync employs make it impossible to sync correctly.
> 
<snip>

Good point, but different. I was thinking merely in terms of the
forthcoming SSD devices and flushing, not syncing.  We are approaching
the point (from hardware...) where persistent storage is becoming
balanced (wrt speed) with RAM.  

This opens up a whole new world for cache considerations.  Consider that
if my persistent storage is as fast as memory, then I want the memory
cache for that device to be zero-sized; caching it buys nothing.

However, I have a number of different devices on my system, some disk,
some SSD, some optical, etc.  Each has different characteristics, yet we
treat them identically.   

(Well, almost identically: we run through the SB list, and
consequently the devices, in reverse all the time :-)

WIRWTD ("What I Really Want To Do") is to incorporate the
characteristics of the devices into the caching so I can optimize both
my use of cache as well as the particular device(s). 

At the moment, we have two triggers: memory pressure (the dirty_*
tunings) and time (kupdate).  Once either threshold is reached, we
indiscriminately (wrt devices) begin flushing until we are back under
the minimum threshold.  These are probably the right triggers from a
system perspective, but there are others we could consider as well.

For example, on a 'slow' device, I probably want to start flushing
sooner, rather than later.  On a fast device, perhaps we wait a bit
longer before starting flushing. 
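
One way to picture a per-device trigger is to scale a base dirty-page threshold by how fast the device is relative to some reference device. This is only a user-space illustration, not kernel code: the function name, the linear scaling, and the 4x clamp are all assumptions made for the example.

```python
def flush_start_threshold(base_dirty_pages, device_mbps, reference_mbps):
    """Hypothetical per-device threshold: slower devices start flushing sooner.

    base_dirty_pages: threshold that would apply to the reference device
    device_mbps:      assumed sustained write speed of this device
    reference_mbps:   assumed write speed of the reference device
    """
    if device_mbps <= 0 or reference_mbps <= 0:
        raise ValueError("throughputs must be positive")
    scaled = base_dirty_pages * (device_mbps / reference_mbps)
    # Clamp (arbitrary choice): at least 1 page, at most 4x the base threshold.
    return max(1, min(int(scaled), 4 * base_dirty_pages))
```

With a base of 1000 pages, a device half as fast as the reference would begin flushing at 500 dirty pages, and one twice as fast would wait until 2000.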

At the end of the day we are governed by Little's Law, so we have to
optimize the exit from the system. 
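
For reference, Little's Law states that in a steady state the average number of items in a system equals the arrival rate times the average time each item spends there (L = lambda * W). Read against the page cache, and assuming a steady dirtying rate, that gives the small calculation below. The function names and numbers are illustrative only.

```python
def mean_dirty_pages(dirty_rate_pps, mean_residency_s):
    """Little's Law, L = lambda * W: average dirty pages held in cache."""
    return dirty_rate_pps * mean_residency_s


def max_residency_s(dirty_limit_pages, dirty_rate_pps):
    """Rearranged, W = L / lambda: longest a page may stay dirty on average
    if the cache is to hold no more than dirty_limit_pages."""
    if dirty_rate_pps <= 0:
        raise ValueError("dirty rate must be positive")
    return dirty_limit_pages / dirty_rate_pps
```

If applications dirty 2000 pages per second and pages stay dirty for 5 seconds on average, about 10000 pages are dirty at any moment. The only lever on the flushing side is the residency time, which is why the exit rate matters.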

In general, we want flushing to reach the minimum dirty threshold as
fast as possible since we are taking cycles away from our applications.
(To me this is far more important than age...)  So, WIRWTD is to create
a heuristic that takes into account:

o device speed
o nr dirty pages 'owned' by the device
o nr system dirty pages (i.e. existing dirty stuff)
o age (or do we really care?)
o tunings

Now we can weight flushing towards 'fast' devices to reach our
thresholds sooner, as well as ignore devices that offer little relief
(e.g. have no dirty pages outstanding).
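
A rough user-space sketch of such a heuristic follows. It is not a proposal for the actual pdflush code. It assumes a measured per-device write speed is available, treats the tuning as a single system-wide background threshold, and leaves age out entirely (the open question above). `Device`, `plan_flush`, and all field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    pages_per_sec: float  # assumed measured write speed
    dirty_pages: int      # dirty pages 'owned' by this device


def plan_flush(devices, system_dirty, background_threshold):
    """Decide how many pages to flush from each device.

    Greedy policy: drain the fastest devices first until the system is
    back under the threshold, skipping devices with nothing dirty.
    """
    excess = system_dirty - background_threshold
    plan = {}
    for dev in sorted(devices, key=lambda d: d.pages_per_sec, reverse=True):
        if excess <= 0:
            break
        if dev.dirty_pages == 0:
            continue  # offers no relief
        take = min(dev.dirty_pages, excess)
        plan[dev.name] = take
        excess -= take
    return plan
```

The greedy ordering is one choice among several. A proportional split by speed would spread the work, but the greedy form reaches the threshold in the least wall-clock time when the devices can be driven independently.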

Perhaps the "cache maintenance responsibility" belongs to the device???

Best,
-PWM
