linux-ext4 - [PATCH] ext4: check missed return value ext4_sync


Message-ID: <87y6hy9bqg.fsf_-_@openvz.org>
Date:	Fri, 12 Mar 2010 11:37:43 +0300
From:	Dmitry Monakhov <dmonakhov@...nvz.org>
To:	Jan Kara <jack@...e.cz>
Cc:	linux-ext4@...r.kernel.org, "Theodore Ts'o" <tytso@....edu>
Subject: [PATCH] ext4: check missed return value ext4_sync_file

Jan Kara <jack@...e.cz> writes:

>> We have to submit barrier before we start journal commit process.
>> otherwise transaction may be committed before data flushed to disk.
>> There is no difference from performance of view, but definitely
>> fsync becomes more correct.
Unfortunately, this change does affect performance: latency increases
because we have to wait for the barrier before we start the
journal commit.
>> 
>> If jbd2_log_start_commit return 0 then it means that transaction
>> was already committed. So we don't have to issue barrier for
>> ordered mode, because it was already done during commit.
>   Umm, we have to - when a file has just been rewritten (i.e. no block
> allocation), then i_datasync_tid is not updated and thus we won't commit
> any transaction as a part of fdatasync (and that is correct because there
> are no metadata that need to be written for that fdatasync). But we still
> have to flush disk caches with data submitted by filemap_fdatawrite_and_wait.
Yep, I missed that. I thought that the transaction ID was updated
even in that case.
The most unpleasant part of the ext4_sync_file implementation is that
a barrier is issued on each fsync() call. So a badly behaved user may run:
while(1) fsync(fd);
which results in bad system performance. And since the barrier request is
empty, it is hard to detect the cause of the trouble.
Of course, we could solve this by introducing some sort of dirty flag
which is set in write_page and cleared in fsync. But that looks like
an ugly workaround.
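
To make the dirty-flag idea above concrete, here is a minimal user-space
model of it. This is only a sketch and not kernel code: struct model_file,
model_write() and model_fsync() are hypothetical names, and the counter
stands in for the real disk cache flush.

```c
#include <stdbool.h>

/* Hypothetical per-file state; this is not an actual ext4 structure. */
struct model_file {
    bool dirty;          /* set when data is written, cleared by sync */
    unsigned barriers;   /* number of (modelled) cache flushes issued */
};

/* Stands in for the page write path: mark the file as needing a flush. */
static void model_write(struct model_file *f)
{
    f->dirty = true;
}

/* Stands in for fsync(): issue a barrier only if something was written. */
static void model_fsync(struct model_file *f)
{
    if (!f->dirty)
        return;          /* nothing new to flush, skip the barrier */
    f->dirty = false;
    f->barriers++;       /* real code would flush the disk cache here */
}
```

With this model, a tight loop of model_fsync() calls after a single
model_write() issues one barrier rather than one per call.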
>
>> By unknown reason we ignored ret val from jbd2_log_wait_commit()
>> so even in case of EIO fsync will succeed.
>   I just forgot jbd2_log_wait_commit can return a failure...
In light of the previous comments, the patch is reduced to a simple
fix for the missed error check.
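
The kind of bug being fixed can be illustrated with a small user-space
model. This is a sketch under assumptions, not the attached patch:
wait_commit_fn, wait_ok(), wait_eio() and the two model_sync_*() functions
are hypothetical stand-ins for the wait on the journal commit and its caller.

```c
#include <errno.h>

/* Hypothetical stand-in for the journal wait: 0 or a negative errno. */
typedef int (*wait_commit_fn)(void);

static int wait_ok(void)  { return 0; }
static int wait_eio(void) { return -EIO; }

/* Before: the result of the wait is dropped, so sync always succeeds. */
static int model_sync_unchecked(wait_commit_fn wait_commit)
{
    wait_commit();
    return 0;
}

/* After: the result of the wait is returned to the caller. */
static int model_sync_checked(wait_commit_fn wait_commit)
{
    return wait_commit();
}
```

With wait_eio() as the wait function, the unchecked variant still reports
success, while the checked variant returns -EIO to the caller.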
BTW: While investigating similar code in ext3, I found that
fsync is broken in the case of an external journal. JBD itself does not
send a barrier to j_fs_dev. So if fsync goes via the
log_start_commit/log_wait_commit path, data loss is still possible.
I'm able to reproduce this with a simple write test:
while (1) {
 write(fd, buf, 1024*1024);
 fsync(fd);
}
and then rebooting in the middle of the operation.
A later check of the file content spotted a data inconsistency.
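
For anyone who wants to try this, the write loop above can be fleshed out
roughly as follows. This is a sketch, not the exact test that was run:
write_fsync_loop(), the bounded iteration count and the per-chunk fill
pattern are assumptions added so that a later content check has something
to compare against.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CHUNK (1024 * 1024)

/* Write `iterations` 1 MiB chunks to `path`, calling fsync() after each.
 * Returns 0 on success, -1 on the first error. */
static int write_fsync_loop(const char *path, int iterations)
{
    char *buf = malloc(CHUNK);
    int fd, i, ret = 0;

    if (!buf)
        return -1;
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        free(buf);
        return -1;
    }
    for (i = 0; i < iterations && ret == 0; i++) {
        ssize_t done = 0;

        /* Assumed pattern: one letter per chunk, cycling through a..z. */
        memset(buf, 'a' + i % 26, CHUNK);
        while (done < CHUNK) {          /* handle short writes */
            ssize_t n = write(fd, buf + done, CHUNK - done);
            if (n < 0) {
                ret = -1;
                break;
            }
            done += n;
        }
        /* Each chunk must be reported as durable before the next one. */
        if (ret == 0 && fsync(fd) != 0)
            ret = -1;
    }
    if (close(fd) != 0)
        ret = -1;
    free(buf);
    return ret;
}
```

After a reboot, every chunk for which fsync() returned success should
still hold its full pattern; any other content points to lost data.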
Will send a fix ASAP.


View attachment "0001-ext4-check-missed-return-value-ext4_sync_file.patch" of type "text/plain" (934 bytes)