Message-ID: <20080227141646.GA22850@shareable.org>
Date:	Wed, 27 Feb 2008 14:16:46 +0000
From:	Jamie Lokier <jamie@...reable.org>
To:	Jeff Garzik <jeff@...zik.org>
Cc:	Nick Piggin <nickpiggin@...oo.com.au>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
	Chris Wedgwood <cw@...f.org>
Subject: Re: Proposal for "proper" durable fsync() and fdatasync()

Jeff Garzik wrote:
> >It's not optimal even then.
> >
> >  Devices: On a software RAID, you ideally don't want to issue flushes
> >  to all drives if your database did a 1 block commit entry.  (But they
> >  probably use O_DIRECT anyway, changing the rules again).  But all that
> >  can be optimised in generic VFS code eventually.  It doesn't need
> >  filesystem assistance in most cases.
> 
> My own idea is that we create a FLUSH command for blkdev request queues, 
> to exist alongside READ, WRITE, and the current barrier implementation. 
>  Then FLUSH could be passed down through MD or DM.

I like your thought, and it has the benefit of being simple.
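The simple version can be sketched in C as follows. This is a minimal illustration under stated assumptions: the request types, `struct stacked_dev` and `stacked_submit_flush()` are hypothetical names, not the kernel's block-layer API. Without any sector-range information, a stacked MD/DM-style device has to forward a FLUSH to every member disk to keep the durability guarantee.

```c
#include <stddef.h>

/* Hypothetical request types: FLUSH sits alongside READ and WRITE. */
enum req_op { OP_READ, OP_WRITE, OP_FLUSH };

struct request {
    enum req_op op;
    unsigned long long sector;   /* start sector (unused for FLUSH) */
    unsigned int nr_sectors;
};

/* A member disk of a hypothetical MD/DM-style stacked device. */
struct member {
    unsigned int flushes_seen;   /* counts FLUSH requests delivered here */
};

struct stacked_dev {
    struct member *members;
    size_t nr_members;
};

/* Deliver a request to one member; this model only records FLUSHes. */
static void member_submit(struct member *m, const struct request *rq)
{
    if (rq->op == OP_FLUSH)
        m->flushes_seen++;
}

/* Pass a FLUSH down: with no range information, every member
 * must receive it. Returns the number of members flushed. */
static size_t stacked_submit_flush(struct stacked_dev *dev)
{
    struct request flush = { .op = OP_FLUSH, .sector = 0, .nr_sectors = 0 };

    for (size_t i = 0; i < dev->nr_members; i++)
        member_submit(&dev->members[i], &flush);
    return dev->nr_members;
}
```

This fan-out is what makes the plain FLUSH expensive on RAID when a commit touched only one disk.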

My thought is very similar, but with (hopefully not premature...)
optimisations:

  - I would merge FLUSH with a preceding write in some cases,
    converting to an FUA-write command.  Probably the generic request
    queue is the best place to detect and merge.  This is so that
    userspace filesystems (including guest VMs) and databases can do
    journal commits with the same I/O sequence as in kernel
    filesystems.

  - I would create BARRIER too, so that a userspace API can ask for
    this weaker form of fsync, which may improve throughput of
    userspace journalling.  

  - I would include a sector range in FLUSH and BARRIER, for MD and DM
    to flush _only_ relevant sub-devices.  This may improve performance
    for journalling both kernel and userspace filesystems, as journal
    commits are often very small and hit one or two sub-devices in RAID.

  - I would ask the nice MD and DM people to take tag-barriers rather
    than flush-barriers on the input queue, converting to
    tag-barriers, flush-barriers and independent FLUSH on the
    sub-device queues according to sector ranges and subsequent
    writes.  It's not obvious, but my barrier proposal, which started
    this thread, is designed to support an efficient inter-sub-device
    flush-barrier when necessary, and a single-sub-device tag-barrier
    when possible.
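The FLUSH-into-FUA merge can be illustrated with a hypothetical queue-rewriting pass in C. It assumes a FUA write makes only that one write durable, so a WRITE immediately followed by a FLUSH is folded into a single FUA-write only when that WRITE is the sole volatile write since the last flush; otherwise the FLUSH is kept. `merge_flush_into_fua()` and the request layout are illustrative names, not real kernel interfaces.

```c
#include <stdbool.h>
#include <stddef.h>

enum req_op { OP_READ, OP_WRITE, OP_FLUSH };

struct request {
    enum req_op op;
    bool fua;                   /* write must reach stable media before completing */
    unsigned long long sector;
    unsigned int nr_sectors;
};

/* Rewrite q[0..n) in place, folding a FLUSH into the WRITE directly
 * before it (as a FUA-write) when that write is the only volatile
 * write since the last flush. Returns the new queue length. */
static size_t merge_flush_into_fua(struct request *q, size_t n)
{
    size_t out = 0;
    unsigned int volatile_writes = 0;  /* non-FUA writes since last flush */

    for (size_t i = 0; i < n; i++) {
        struct request rq = q[i];      /* copy first: out <= i, so in-place is safe */

        if (rq.op == OP_WRITE && !rq.fua) {
            volatile_writes++;
        } else if (rq.op == OP_FLUSH) {
            if (out > 0 && q[out - 1].op == OP_WRITE && !q[out - 1].fua &&
                volatile_writes == 1) {
                q[out - 1].fua = true; /* FUA-write replaces WRITE + FLUSH */
                volatile_writes = 0;
                continue;              /* drop the FLUSH */
            }
            volatile_writes = 0;       /* a real FLUSH covers all earlier writes */
        }
        q[out++] = rq;
    }
    return out;
}
```

For example, the sequence WRITE, FLUSH, WRITE, WRITE, FLUSH becomes FUA-WRITE, WRITE, WRITE, FLUSH: the first commit is a single-write case and merges, the second covers two writes and keeps its FLUSH.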
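For the sector-range idea, here is a sketch of how a RAID0-style layer might choose which members need a FLUSH. It assumes plain striping (chunk k lives on member k % nr_members) and at most 32 members; `raid0_members_for_range()` is a hypothetical helper, not MD's actual code.

```c
#include <stdint.h>

/* Return a bitmask of RAID0 members holding any sector in
 * [start, start + nr_sectors). Assumes simple striping where chunk k
 * lives on member k % nr_members, and nr_members <= 32. */
static uint32_t raid0_members_for_range(uint64_t start, uint64_t nr_sectors,
                                        uint64_t chunk_sectors,
                                        unsigned int nr_members)
{
    uint32_t mask = 0;

    if (nr_sectors == 0)
        return 0;

    uint64_t first_chunk = start / chunk_sectors;
    uint64_t last_chunk = (start + nr_sectors - 1) / chunk_sectors;

    /* Once the range spans nr_members chunks, every member is touched. */
    if (last_chunk - first_chunk + 1 >= nr_members)
        return nr_members == 32 ? UINT32_MAX
                                : (UINT32_C(1) << nr_members) - 1;

    for (uint64_t c = first_chunk; c <= last_chunk; c++)
        mask |= UINT32_C(1) << (c % nr_members);
    return mask;
}
```

With 128-sector chunks and 4 members, an 8-sector commit at sector 0 maps to member 0 only (mask 0x1), so only that disk would need flushing instead of all four.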

-- Jamie