Date:	Tue, 18 Oct 2011 17:03:27 -0700 (PDT)
From:	david@...g.hm
To:	David Rientjes <rientjes@...gle.com>
cc:	Jan Kara <jack@...e.cz>, Mark Mielke <mark@...k.mielke.cc>,
	Michael Kerrisk <mtk.manpages@...il.com>,
	linux-kernel@...r.kernel.org
Subject: Re: Appropriate use of sync() from user space?

On Tue, 18 Oct 2011, David Rientjes wrote:

> On Tue, 18 Oct 2011, Jan Kara wrote:
>
>>> Quick summary: We have a vendor who is claiming that it is required
>>> for their userspace program to execute sync(), and I am looking for
>>> some sort of authoritative document or person to refer them to that
>>> will state that this belief is incorrect and/or that this
>>> architecture is not acceptable in a Unix environment.
>>>
>>> I checked Google and the archives and didn't find anything
>>> appropriate. Unfortunately, the word "sync" is very popular. :-)
>>>
>>> We have users who have been experiencing 3 to 5 minutes "freezes"
>>> for a particular command which often times out and fails. I traced
>>> this down from the commercial userspace program (IBM Rational
>>> ClearCase / "cleartool mkview") that they are executing to a backend
>>> "view_server" process (also IBM Rational ClearCase) that is running
>>> sync() as a means of synchronizing their database to disk before
>>> proceeding, and VMware using a "large" memory mapped file to back
>>> its virtual "RAM". The sync() for my computer normally completes in
>>> 7 to 8 seconds. The sync() for some of our users is taking 5 minutes
>>> or longer. This can be demonstrated simply by typing "time sync"
>>> from the command line at intervals. The time itself is relevant
>>> because if it finishes before a timeout elapses - the operation
>>> works (albeit slowly). If the timeout elapses, the operation fails.
>>>
>>> The vendor stated that sync() is integral to their synchronization
>>> process to ensure all files reach disk before they are accessed, and
>>> that this is not a defect in their product. We have a workaround -
>>> run "sync" before calling their command, and this generally avoids
>>> the failures.
>>>
>>> I think the use of sync() in this regard is a hack. According to
>>> POSIX.1 and the Linux man pages, it seems clear to me that sync()
>>> does not guarantee data integrity (bytes guaranteed to have reached
>>> disk) - and it also seems clear that forcing all system data to
>>> flush out in response to a minor command is overkill. Like cutting
>>> down the forest to harvest fruit from a single tree.
>>   Actually the manpage is wrong. Linux waits for all data to be safely on
>> disk before sync returns. So calling sync is a correct way (although
>> inefficient at times) to achieve data integrity. What kernel version are
>> you using? Different kernel versions are differently efficient when doing
>> sync(2) and quite some effort went to make sync less prone to livelocks in
>> recent kernels...
>>
>
> Let's make sure to keep Michael Kerrisk cc'd if anything needs to be
> clarified in the manpages.

Also, you may want to check whether they are really doing a 'sync' 
(flushing every mounted filesystem) or just an 'fsync' (flushing a single 
file). Depending on the technical depth of the people you are talking to, 
they may say sync when what is actually happening is an fsync.
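
For illustration, a minimal Python sketch of the per-file alternative. 
The helper name durable_write is hypothetical and nothing here is taken 
from ClearCase; it only shows that a program which cares about one file 
can flush that file (and its directory entry) with fsync rather than 
asking the kernel to flush everything with sync.

```python
import os
import tempfile


def durable_write(path, data):
    """Write data to path, then flush only that file and its directory."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested; loop until done.
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # fsync(2): flush this one file
    finally:
        os.close(fd)

    # Flush the containing directory so the new name itself is durable.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        durable_write(os.path.join(tmp, "db.dat"), b"record\n")
    # By contrast, os.sync() maps to sync(2) and flushes all filesystems:
    # os.sync()
```

Running the vendor's process under strace would show which of the two 
system calls it really makes.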

There is little dispute that fsync is correct, but it is not a complete 
answer to the issue. Take a look at the LWN article on the subject at 
http://lwn.net/Articles/457667

Ext3 has a pathological condition where an fsync of one file can force a 
complete journal flush. That isn't as bad as a sync of the entire 
filesystem, but it can still take a long time if there is other ongoing 
write activity on the system (I know I've read about fsyncs taking longer 
than 30 seconds, and I think I've heard of them taking minutes). As far as 
I know, Ext3 is the only filesystem to suffer from this problem, but 
unfortunately it's the default filesystem on most Linux distros.
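
One rough way to see whether a given machine is affected is to time an 
fsync of a small scratch file while the usual write load is running. The 
sketch below is only an illustration under that assumption; the function 
name time_fsync and the 1 MiB default size are arbitrary choices, not 
something from this thread.

```python
import os
import tempfile
import time


def time_fsync(directory, size=1 << 20):
    """Return the seconds spent in fsync() on a scratch file in directory."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"\0" * size)
            f.flush()  # push Python's buffer into the kernel page cache
            start = time.monotonic()
            os.fsync(f.fileno())  # wait for this file to reach the disk
            return time.monotonic() - start
    finally:
        os.unlink(path)  # remove the scratch file


if __name__ == "__main__":
    print("fsync took %.3f s" % time_fsync("."))
```

Run it against a directory on the filesystem in question; the number 
reported depends on the filesystem, the kernel version, and whatever else 
is writing at the time.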

David Lang
