Message-ID: <4D64E2BB.7010000@draigBrady.com>
Date: Wed, 23 Feb 2011 10:34:35 +0000
From: Pádraig Brady <P@...igBrady.com>
To: Linda Walsh <lkml@...nx.org>
CC: LKML <linux-kernel@...r.kernel.org>
Subject: Re: write 'O_DIRECT' file w/odd amount of data: desirable result?
On 23/02/11 04:30, Linda Walsh wrote:
>
> I understand, somewhat, what is happening.
> I have two different utils, 'dd' and 'mbuffer', both of
> which have a 'direct' option to write to disk;
> mbuffer was from my distro, with a direct option added.
>
> I'm not sure whether it's truncating the write down to a
> multiple of the sector size or of the file-allocation-unit size,
> but from a dump piped into each of {cat, dd, mbuffer}, the
> output sizes are:
>
> file            size        delta
> -------------   ----------  ---------
> dumptest.cat    5776419696
> dumptest.dd     5776343040      76656
> dumptest.mbuff  5368709120  407710576
>
> params:
>
> dd of=dumptest.dd bs=512M oflag=direct
> mbuffer -b 5 -s 512m --direct -f -o dumptest.mbuff
>
> original file size MOD 512M = 407710576 (mbuffer's shortfall).
>
> The disk it is being written to is a RAID with a span
> size of 640k (64k io * 10 data disks) and formatted with 'xfs'
> to indicate that (stripe-unit=64k, stripe-width=10).
>
> This gives a 'coincidental' (??) interpretation for
> the output from 'dd', where the original file size MOD
> 640K = 76656 (the amount 'dd' is short).
>
> Was that a coincidence, or is it significant?
> Why didn't 'mbuffer' have the same shortfall -- its
> shortfall was related only to its 512m buffer size.
>
> In any event, shouldn't the kernel yield the correct result
> in either case? That would be consistent with the processor
> it was natively developed on, the x86, where a misaligned memory
> access doesn't cause a fault at the user level but is handled
> correctly, with a slight speed penalty for the unaligned
> data parts.
>
> Shouldn't the linux kernel behave similarly?
> Note that the mbuffer program did indicate an error
> (which didn't help the 'dump' program, which had already exited
> with what it thought was a 'success'), though a somewhat
> cryptic one:
> buffer: error: outputThread: error writing to dumptest.mbuff at offset
> 0x140000000: Invalid argument
>
> summary: 5509 MByte in 8.4 sec - average of 658 MB/s
> mbuffer: warning: error during output to dumptest.mbuff: Invalid argument
>
> dd indicated no warning or error.
>
> ----
> I'm not sure what either program did internally, but no doubt
> neither expected an error on the final write, and neither handled
> the result properly.
>
> However, wouldn't it be a good thing for linux to do 'the right thing'
> and successfully complete the last partial write (whichever case it
> is!), even if the data has to be internally buffered and slightly
> slowed? It seems correctness of the operation should be given
> preference over adherence to some limitation where possible.
> Software should be forgiving and tolerant and err on the side of
> least harm -- which I'd argue is getting the data to the disk, NOT
> generating some 'abnormal end' (ABEND) condition that the software
> can't handle.
> I'd think of it like a page fault on a record not in memory: the
> remainder of the sector is zero-filled on disk, while the file
> size is set to the number of bytes actually written. ??
>
> Vanilla kernel 2.6.35-7 x86_64 (SMP PREEMPT)
Note dd will turn off O_DIRECT for the last write
if it's less than the block size.
http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=5929322c
Note also you mentioned that you piped from dump to dd.
For dd reading from a pipe I strongly suggest you specify iflag=fullblock
If there is still an issue, it seems from the above that the kernel is throwing
away data and not indicating this through the last, non-O_DIRECT, write().
cheers,
Pádraig.