linux-kernel - Re: [PATCH 1/3] nbd: support FLUSH requests


Message-Id: <3EDEF735-9A67-439E-BA65-089C6AAFD1BF@alex.org.uk>
Date:	Wed, 13 Feb 2013 15:55:01 +0000
From:	Alex Bligh <alex@...x.org.uk>
To:	Paolo Bonzini <pbonzini@...hat.com>
Cc:	Alex Bligh <alex@...x.org.uk>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, nbd-general@...ts.sf.net,
	Paul Clements <Paul.Clements@...eleye.com>
Subject: Re: [PATCH 1/3] nbd: support FLUSH requests

Paolo,

On 13 Feb 2013, at 13:00, Paolo Bonzini wrote:

> But as far as I can test with free servers, the FUA bits have no
> advantage over flush.  Also, I wasn't sure if SEND_FUA without
> SEND_FLUSH is valid, and if so how to handle this combination (treat it
> as writethrough and add FUA to all requests? warn and do nothing?).

On the main open source nbd server, the following applies:

What REQ_FUA does is an fdatasync() after the write. Code extract and
comments below from Christoph Hellwig.

What REQ_FLUSH does is to do an fsync().

The way I read Christoph's comment, provided the Linux block layer always
issues a REQ_FLUSH before a REQ_FUA, there is no performance problem.

However, a REQ_FUA is going to do an f(data)?sync AFTER the write, whereas
the preceding REQ_FLUSH is going to do an fsync() BEFORE the write. It seems
to me that either the FUA and FLUSH semantics are therefore different
(and we need FUA), or that Christoph's comment is wrong and that you
are guaranteed a REQ_FLUSH *after* the write with REQ_FUA.
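
To make the ordering concrete, here is a minimal sketch of the two
behaviours described above. It is not the nbd-server code; the function
names handle_flush() and handle_write() are hypothetical and chosen only
for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: a FLUSH request is served with fsync(), which only
 * covers data written to fd before this call. */
static int handle_flush(int fd)
{
    return fsync(fd);
}

/* Hypothetical helper: write len bytes at offset off; if fua is non-zero,
 * call fdatasync() AFTER the write so the new data is on stable storage
 * before the request is acknowledged. Returns 0 on success, -1 on error. */
static int handle_write(int fd, const void *buf, size_t len, off_t off, int fua)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n < 0)
            return -1;
        p += n;
        off += n;
        len -= (size_t)n;
    }
    if (fua && fdatasync(fd) < 0)
        return -1;
    return 0;
}
```

With these helpers, a FLUSH followed by a FUA write turns into fsync(),
pwrite(), fdatasync(): the fsync() covers only earlier writes, and the
fdatasync() is what covers the FUA write itself.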

-- 
Alex Bligh




        } else if (fua) {

          /* This is where we would do the following
           *   #ifdef USE_SYNC_FILE_RANGE
           * However, we don't, for the reasons set out below
           * by Christoph Hellwig <hch@...radead.org>
           *
           * [BEGINS] 
           * fdatasync is equivalent to fsync except that it does not flush
           * non-essential metadata (basically just timestamps in practice), but it
           * does flush metadata required to find the data again, e.g. allocation
           * information and extent maps.  sync_file_range does nothing but flush
           * out pagecache content - it means you basically won't get your data
           * back in case of a crash if you either:
           * 
           *  a) have a volatile write cache in your disk (e.g. any normal SATA disk)
           *  b) are using a sparse file on a filesystem
           *  c) are using a fallocate-preallocated file on a filesystem
           *  d) use any file on a COW filesystem like btrfs
           * 
           * e.g. it only does anything useful for you if you do not have a volatile
           * write cache, and either use a raw block device node, or just overwrite
           * an already fully allocated (and not preallocated) file on a non-COW
           * filesystem.
           * [ENDS]
           *
           * What we should do is open a second FD with O_DSYNC set, then write to
           * that when appropriate. However, with a Linux client, every REQ_FUA
           * immediately follows a REQ_FLUSH, so fdatasync does not cause performance
           * problems.
           *
           */
#if 0
                sync_file_range(fhandle, foffset, len,
                                SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE |
                                SYNC_FILE_RANGE_WAIT_AFTER);
#else
                fdatasync(fhandle);
#endif
        }
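
The comment above suggests opening a second FD with O_DSYNC and writing
FUA requests through it. A minimal sketch of that idea follows, assuming
the export is a regular file or block device reachable by path. The
struct export_fds and the export_open(), export_write() and export_close()
helpers are hypothetical names, not part of nbd-server.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical pair of descriptors for one export. */
struct export_fds {
    int fd;        /* ordinary descriptor, used for non-FUA writes */
    int fd_dsync;  /* same file opened with O_DSYNC, used for FUA writes */
};

/* Open the export twice. Returns 0 on success, -1 on error. */
static int export_open(struct export_fds *e, const char *path)
{
    e->fd = open(path, O_RDWR);
    if (e->fd < 0)
        return -1;
    e->fd_dsync = open(path, O_RDWR | O_DSYNC);
    if (e->fd_dsync < 0) {
        close(e->fd);
        e->fd = -1;
        return -1;
    }
    return 0;
}

/* A write through fd_dsync returns only once the data, and the metadata
 * needed to read it back, has been written; no separate fdatasync() of
 * the whole file is issued. */
static ssize_t export_write(const struct export_fds *e, const void *buf,
                            size_t len, off_t off, int fua)
{
    return pwrite(fua ? e->fd_dsync : e->fd, buf, len, off);
}

static void export_close(struct export_fds *e)
{
    close(e->fd_dsync);
    close(e->fd);
}
```

The intended difference from fdatasync(fhandle) is that O_DSYNC applies
to the single write rather than to everything dirty on the file, so the
cost of a FUA write would not depend on a preceding flush.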

