Message-ID: <BANLkTimtY0ggu28i-9VzWOO1=puZ0JvcAQ@mail.gmail.com>
Date:	Mon, 30 May 2011 20:28:24 +0200
From:	"D. Jansen" <d.g.jansen@...glemail.com>
To:	david@...g.hm
Cc:	Theodore Tso <tytso@....edu>, Oliver Neukum <oneukum@...e.de>,
	akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
	Dave Chinner <david@...morbit.com>, njs@...ox.com,
	bart@...wel.tk
Subject: Re: [rfc] Ignore Fsync Calls in Laptop_Mode

On Mon, May 30, 2011 at 8:02 PM,  <david@...g.hm> wrote:
> On Mon, 30 May 2011, D. Jansen wrote:
>> On Mon, May 30, 2011 at 3:53 AM,  <david@...g.hm> wrote:
>>> On Sun, 29 May 2011, D. Jansen wrote:
>>>> On Fri, May 27, 2011 at 4:17 PM, Theodore Tso <tytso@....edu> wrote:
>>>>> On May 27, 2011, at 3:12 AM, D. Jansen wrote:
>>>>>> That reordering is exactly what I'm talking about. It wasn't my idea.
>>>>>> But if I understood it correctly, it's possible that the kernel
>>>>>> commits writes of an application, _to one and the same file_, in a
>>>>>> non-FIFO order, if the application does not fsync. And this _afaiu_
>>>>>> could result in the loss not only of new data, but complete corruption
>>>>>> of previously existing data in laptop mode without fsync.
>>>>>
>>>>> No, you're not understanding the problem.  All layers of the storage
>>>>> stack -- including the hard drive -- are allowed to reorder writes.  So
>>>>> even if the kernel sends data to the disk in the exact same order that
>>>>> the application wrote it, it could still get written in a different
>>>>> order, because the hard drive itself can reorder writes.  This is
>>>>> necessary for performance; if you didn't have this, the storage stack
>>>>> would be dog slow, and would consume even more power.
>>>>>
>>>>> So at each level, the only thing you can count upon is that if you
>>>>> want to make sure everything is flushed to stable store, you need to
>>>>> send an fsync() command at the application to file system level, or a
>>>>> barrier or flush command at the OS to hard drive level.
>>>>
>>>> (...)
>>>>>
>>>>> Ordering doesn't matter, because nothing, including the hard drive,
>>>>> guarantees ordering.  What does matter is that the fsync() commands
>>>>> act like barriers; writes before the fsync() command are guaranteed
>>>>> to be written to the disk, and survive a reboot, before any writes
>>>>> after the fsync() are processed.  See?
>>>>
>>>> Ok, thanks a lot! I understand a lot better now!
>>>> So we can't live without the fsyncs.
>>>>
>>>> So what if we would queue the fsyncs along with the writes - we would
>>>> just fsync later instead of immediately, in between the writes as they
>>>> came in. Then by design previous data could not be corrupted, right?
>>>> We would do exactly the same thing, just later.
>>>> It'd be kind of a disk write time distortion field.
>>>
>>> the problem is that the spec for fsync says that your program stops until
>>> fsync finishes. If you don't do that then you will corrupt and lose
>>> data.
>>>
>>> so if you delay fsync you will have your application (or desktop manager)
>>> freeze until the fsync completes.
>>
>> So that would not be an option. Freezing until the end of the write
>> window is not what we want.
>> Neither is ignoring the fsync because that could corrupt data, esp. in
>> databases like sqlite.
>>>
>>> if what you are wanting is the ability to say 'these things must be
>>> written before these other things to keep them from being corrupted,
>>> but I don't care when they get written (or if they get lost in a
>>> crash)' then what you want isn't fsync, it's a barrier.
>>
>> That sounds great!
>> So an fsync call in laptop mode could be interpreted as a barrier
>> and we would be reasonably safe from corrupting old existing data?
>
> no, you cannot just change a fsync to a barrier. in some cases the data
> absolutely needs to be saved, not just ordered (remember the example of a
> mail server telling the other system that the data can be deleted after a
> fsync returns)

I'm not really sure why I shouldn't have that choice as a user. Just
because someone else could be running a mail server on his system and
configure it in a way that it doesn't behave as it should?
If he really wants to do that, there's really nothing we can do to stop
him. I'm sure there are other ways existing kernel options can be used
to make software behave differently than it should. Are we going to
remove them all now?

The big problem is that so far only fsync has existed, and lots of
software seemingly abuses it as an expensive write barrier. It would
really be lovely to have the choice to stop that on an opt-in basis in
laptop mode.
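
To make "fsync as an expensive write barrier" concrete, here is a minimal
sketch of the usual replace-a-file pattern, written in Python for brevity.
The function name and the `durable` flag are hypothetical and only for
illustration; they are not an existing kernel or library interface. The
application mostly wants the new contents on disk before the rename, so
that a crash leaves either the old or the new file, but the only tool it
has is a full, blocking fsync.

```python
import os
import tempfile


def replace_file_atomically(path: str, data: bytes, durable: bool = True) -> None:
    """Replace `path` with `data` via a temporary file and rename.

    `durable` is a hypothetical switch for this sketch: when False, the
    fsync is skipped, which is roughly what ignoring fsync would amount to.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename stays on
    # one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()  # push Python's userspace buffer to the kernel
            if durable:
                # Blocks until the data is on stable storage. Here it is
                # used only to order "data" before "rename".
                os.fsync(tmp.fileno())
        # Atomically swap the new file into place.
        os.replace(tmp_path, path)
    except BaseException:
        # Do not leave a stray temp file behind on failure.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Note: a fully durable rename would also fsync the directory;
    # that step is omitted to keep the sketch short.
```

With `durable=False` nothing in this code guarantees that the data reaches
the disk before the rename does. That ordering gap is what a barrier
primitive would close without forcing the disk to spin up immediately.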