Message-ID: <BANLkTinPCcX9-gK7YnMngGUJKftRTmYXNw@mail.gmail.com>
Date: Fri, 27 May 2011 09:12:34 +0200
From: "D. Jansen" <d.g.jansen@...glemail.com>
To: "Ted Ts'o" <tytso@....edu>,
"D. Jansen" <d.g.jansen@...glemail.com>,
Oliver Neukum <oneukum@...e.de>, akpm@...ux-foundation.org,
linux-kernel@...r.kernel.org, Dave Chinner <david@...morbit.com>,
njs@...ox.com, bart@...wel.tk
Subject: Re: [rfc] Ignore Fsync Calls in Laptop_Mode
On Thu, May 26, 2011 at 6:21 PM, Ted Ts'o <tytso@....edu> wrote:
> On Thu, May 26, 2011 at 06:05:43PM +0200, D. Jansen wrote:
>> Problem: any fsync call by any application spins up the hard disk any
>> time even in laptop_mode
>
> What you call a problem, I call a feature.
Problem: any fsync call by any application spins up the hard disk at any
time, even in laptop_mode, and there's nothing the user can do about it
from user space without either risking that existing data gets corrupted
(if the kernel commits the queued writes in non-FIFO order) or
modifying every single application.
>> Because though there is no possibility to destroy data that is on disk
>> due to non FIFO flushing of application writes queued in the kernel,
>> which seems to be the main kernel level problem, yet new problems come
>> up.
>
> I'm not sure what you're talking about here. Buffered data can always
> be reordered in terms of when it is written to disk. This is
> considered good, and normal. If you want to guarantee that
> application writes are pushed out to disk, then either (a) use
> O_DIRECT, or (b) use fsync(). Those are your two options.
That reordering is exactly what I'm talking about. It wasn't my idea.
But if I understood it correctly, it's possible that the kernel
commits an application's writes, _to one and the same file_, in
non-FIFO order if the application does not fsync. And this _afaiu_
could result not only in the loss of new data, but in complete corruption
of previously existing data in laptop mode without fsync.
But you're the expert. Is that really the case? If so, could it be
avoided without the daemon and application patching?
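For reference, option (b) from the quoted reply looks roughly like this from user space. This is only a minimal Python sketch, not code from this thread: the helper name `durable_write` and the file name are made up for illustration, while `os.fsync` is the standard wrapper around fsync(2). The `os.fsync` call is the point where the disk has to spin up.

```python
import os
import tempfile

def durable_write(path, data):
    """Write data, then block until the kernel reports it reached stable storage."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Without this call the data may sit in the page cache for a while.
        os.fsync(fd)
    finally:
        os.close(fd)

# Hypothetical file name, used only for this illustration.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.db")
durable_write(demo_path, b"record-1\n")
```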
> If we didn't (for example) reorder writes to avoid the hard disk head
> from seeking all over the disk, that would actually cause more power
> to be consumed!
Yes, probably. But if that happens only once per commit window in
laptop mode, I doubt the effect would destroy the gains. Also, it
is not always necessary: only writes to the same file need to be committed
in order. They could even be merged into one write, if they aren't
already. It seems the ordering is only necessary when an fsync occurs.
1) DDD_ (write D at 0)
2) _HHH (write H at 1) (fsync)
3) DHHH (result/merged write, in order)
As long as we don't end up with:
3) DDDH (out of order write, corrupt)
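The two outcomes above can be reproduced with a toy model. This is a sketch that treats the file as a 4-byte buffer and each write as an (offset, data) pair; `apply_writes` is a hypothetical helper, not a kernel interface, and real writeback works on pages rather than on individual write calls.

```python
def apply_writes(size, writes):
    """Apply (offset, data) writes to an empty buffer in the order given."""
    buf = bytearray(b"_" * size)
    for offset, data in writes:
        buf[offset:offset + len(data)] = data
    return buf.decode()

write_d = (0, b"DDD")   # step 1: write D at 0
write_h = (1, b"HHH")   # step 2: write H at 1

in_order = apply_writes(4, [write_d, write_h])    # "DHHH"
reordered = apply_writes(4, [write_h, write_d])   # "DDDH"
```

The overlapping bytes at offsets 1 and 2 are what make the order matter; two writes to disjoint ranges would give the same result either way.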
>> Now there is
(in a special write queue and coordination daemon)
>> 1) special support needed on the application side.
>
> Yep, because this is fundamentally an application-level problem, and
> the kernel doesn't have enough semantic information to solve the
> database coherency problem.
Well, if we know that fsyncs mean the application needs the data to be
committed in order, couldn't we watch out for fsync calls and then (in
laptop mode, when this feature is specifically requested by the user) switch
that application to FIFO per-file writes? (Disregarding the write
performance in that case.) Or we let the userspace eatmydata library
detect the same fsync and use a kernel API to switch that write to
FIFO instead of fsyncing. A FIFO write call might actually be useful
to other applications and scenarios as well. (trojan horse!)
Or the last write before the fsync is committed last. If reordering is
otherwise possible, this should avoid corruption and decrease
performance less. (Though we're not talking about writing hundreds of
MBs in laptop mode in my average use case scenario of office
applications and maybe a browser running.)
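To make the proposal concrete, here is a small user-space simulation of it. Everything in it is hypothetical: `DeferredFile`, `commit_window` and the spin-up counter only model the idea and do not correspond to any existing kernel interface. In this sketch fsync does not flush; the queued writes are applied in FIFO order once per commit window.

```python
class DeferredFile:
    """Toy model: per-file FIFO write queue where fsync() no longer forces a flush."""

    def __init__(self, size):
        self.disk = bytearray(b"_" * size)  # stands in for the on-disk contents
        self.pending = []                   # queued (offset, data) writes, oldest first
        self.spinups = 0                    # how often the "disk" had to wake up

    def write(self, offset, data):
        self.pending.append((offset, data))

    def fsync(self):
        # Deferred: nothing is flushed here. The queue is FIFO, so the
        # ordering the application relies on is still preserved later.
        pass

    def commit_window(self):
        """Flush all queued writes in submission order, once per commit interval."""
        if not self.pending:
            return
        self.spinups += 1
        for offset, data in self.pending:
            self.disk[offset:offset + len(data)] = data
        self.pending.clear()

f = DeferredFile(4)
f.write(0, b"DDD")
f.fsync()
f.write(1, b"HHH")
f.fsync()
before_commit = bytes(f.disk)   # old contents, nothing written yet
f.commit_window()
after_commit = bytes(f.disk)
```

In this model a crash before `commit_window` leaves the old contents in place, and two fsync calls cost a single spin-up. The model does not cover a crash in the middle of the flush itself.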
>
>> 2) need for new out-of-kernel buffers.
>
> Yes. So?
Shouldn't we try to avoid replicating existing infrastructure when possible?
>
>> 3) need for inter-application write alignment nightmares. This sort of
>> structure could cause very uncomfortable bugs that prevent writes from
>> happening at all in cases that were not foreseen at all.
>
> Huh? I think you are talking about order that buffered writes happen,
> and there's no problem here. It's a feature that they can be
> reordered. See above.
No, what I meant is that if there is a bug at any step of the
coordination between the applications and the daemon (in the daemon,
the software, their communication connection, etc.), writes may not
occur and we may lose data needlessly.
>> 5) If the _application_, but not the kernel crashes, the data is safe.
>> In my experience this is the much more likely case than that the mail
>> server on my netbook optimized for battery time receives an email in
>> laptop mode, sends the other server "200" and then before the next
>> commit window my battery slips out and it's all gone.
>
> Huh? What's the problem that you're worried about here.
Your scenario sounds like this:
the daemon announces when to flush data;
until then, the application buffers data in its user space.
This means that if you save a file and the application crashes, e.g. segfaults
and is killed, the data is still in its queue and thus lost.
Without the daemon, the data would be in kernel space already and thus
safe from application crashes.
In my experience the kernel is very stable, applications are much less so.
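The difference can be shown with a small experiment. This is a sketch under stated assumptions: the crash is simulated with `os._exit`, which ends the process without flushing user-space buffers, and the child programs and the helper `surviving_bytes` are made up for illustration. It only demonstrates an application crash, not a kernel crash or power loss.

```python
import os
import subprocess
import sys
import tempfile

# Child 1: data is still in the process's own buffer when it "crashes".
USER_BUFFERED = """
import os, sys
f = open(sys.argv[1], "wb")
f.write(b"saved")
os._exit(1)
"""

# Child 2: data was already handed to the kernel before the same "crash".
KERNEL_BUFFERED = """
import os, sys
fd = os.open(sys.argv[1], os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
os.write(fd, b"saved")
os._exit(1)
"""

def surviving_bytes(child_source):
    """Run child_source in its own process; return what the file holds afterwards."""
    path = os.path.join(tempfile.mkdtemp(), "doc.txt")
    subprocess.run([sys.executable, "-c", child_source, path])
    with open(path, "rb") as fh:
        return fh.read()
```

Calling `surviving_bytes(USER_BUFFERED)` shows what is left when the data never left the application, and `surviving_bytes(KERNEL_BUFFERED)` shows what is left when the kernel already held it, even though no fsync was issued in either case.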
And I really don't see this entering many applications. Their developers
would probably say this is the task of the kernel itself or of some other
layer in between, but not the task of every single app developer to
reinvent write caching, coordination with the laptop writes daemon,
etc. In the end we might have one or two special "write in laptop
mode" apps, and as soon as I start a browser or any sqlite-based app,
the problem is back.
>> I think the alternative of ensuring the application writes are
>> committed in order would make more sense:
>> e.g. a _user space library_ disables fsync etc. in laptop_mode if the
>> user chooses to do so and kernel support for forced FIFO ordering or
>> writes.
>> This would fix 1) 2) 3) 4) 5) 6).
>
> And if you do this to a mysql daemon, or to a firefox or chrome
> process which uses sqlite, and you crash at a wrong time, the entire
> database could be scrambled.
Define crash at the wrong time. Because there is always a wrong time,
whether with laptop mode or without, with fsync or without.
> You can't fix this with your solution, because you want to make fsync()
> lie to the database code. And so all
> of the extra work (and power) consumed by the database code to try to
> make its database writes be safe, will be compromised by making
> fsync() unreliable.
Yes, I would like the liberty to trade some safety of new data
for the ability to create more new data (due to longer run time)
when in laptop mode.
I still want and use that safety, just not when I'm in laptop mode.
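A minimal sketch of that opt-in, similar in spirit to the eatmydata library mentioned above but gated on laptop mode. It assumes the Linux sysctl file /proc/sys/vm/laptop_mode, where a nonzero value means laptop mode is on; the function names and the `user_opted_in` flag are hypothetical. Note that it only skips the fsync. It does nothing about write ordering, which is the objection raised in the quoted reply.

```python
import os
import tempfile

LAPTOP_MODE_PATH = "/proc/sys/vm/laptop_mode"  # nonzero means enabled

def laptop_mode_enabled(path=LAPTOP_MODE_PATH):
    """Return True if the laptop_mode sysctl reads as a nonzero integer."""
    try:
        with open(path) as fh:
            return int(fh.read().strip()) != 0
    except (OSError, ValueError):
        return False  # unreadable or not Linux: keep fsync fully functional

def fsync_unless_laptop_mode(fd, user_opted_in, laptop_mode=None):
    """Call fsync(2) unless the user opted in AND laptop mode is active.

    Returns True if the data was actually synced."""
    if laptop_mode is None:
        laptop_mode = laptop_mode_enabled()
    if user_opted_in and laptop_mode:
        return False  # deferred: durability of new data traded for battery life
    os.fsync(fd)
    return True

with tempfile.TemporaryFile() as tmp:
    tmp.write(b"draft")
    tmp.flush()
    # laptop_mode is passed explicitly here so the example does not depend on the host.
    deferred = fsync_unless_laptop_mode(tmp.fileno(), user_opted_in=True, laptop_mode=True)
    forced = fsync_unless_laptop_mode(tmp.fileno(), user_opted_in=False, laptop_mode=True)
```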
>
>> So you've re-thought this "All that is necessary is a kernel patch to
>> allow laptop_mode to disable fsync() calls(...)"
>> (http://tytso.livejournal.com/2009/03/15/). That post had inspired my
>> patch.
>
> I was thinking about things only from a file system perspective. The
> problem is that more and more people are running databases or other
> binary files which are updated in place on their laptops, and from a
> more holistic perspective, we have to worry about making sure that
> application-level databases are coherent in the face of a system
> crash. (For example, you drop your mobile phone, or your tablet, or
> your laptop, and the battery slips out.)
Exactly. Great example! Again, I very much agree. ("Even") I don't want
to end up with corrupt data. But I accept old data. Is there really no
way to get there without rewriting each and every application's fsync code?
Thanks for your insights!