linux-kernel - Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20070531133649.GY32105@kernel.dk>
Date:	Thu, 31 May 2007 15:36:50 +0200
From:	Jens Axboe <jens.axboe@...cle.com>
To:	Bill Davidsen <davidsen@....com>
Cc:	David Chinner <dgc@....com>, david@...g.hm,
	Phillip Susi <psusi@....rr.com>, Neil Brown <neilb@...e.de>,
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
	dm-devel@...hat.com, linux-raid@...r.kernel.org,
	Stefan Bader <Stefan.Bader@...ibm.com>,
	Andreas Dilger <adilger@...sterfs.com>,
	Tejun Heo <htejun@...il.com>
Subject: Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems,  and dm/md.

On Thu, May 31 2007, Bill Davidsen wrote:
> Jens Axboe wrote:
> >On Thu, May 31 2007, David Chinner wrote:
> >  
> >>On Thu, May 31, 2007 at 08:26:45AM +0200, Jens Axboe wrote:
> >>    
> >>>On Thu, May 31 2007, David Chinner wrote:
> >>>      
> >>>>IOWs, there are two parts to the problem:
> >>>>
> >>>>	1 - guaranteeing I/O ordering
> >>>>	2 - guaranteeing blocks are on persistent storage.
> >>>>
> >>>>Right now, a single barrier I/O is used to provide both of these
> >>>>guarantees. In most cases, all we really need to provide is 1); the
> >>>>need for 2) is a much rarer condition but still needs to be
> >>>>provided.
> >>>>
> >>>>        
> >>>>>if I am understanding it correctly, the big win for barriers is that 
> >>>>>you do NOT have to stop and wait until the data is on persistant media 
> >>>>>before you can continue.
> >>>>>          
> >>>>Yes, if we define a barrier to only guarantee 1), then yes this
> >>>>would be a big win (esp. for XFS). But that requires all filesystems
> >>>>to handle sync writes differently, and sync_blockdev() needs to
> >>>>call blkdev_issue_flush() as well....
> >>>>
> >>>>So, what do we do here? Do we define a barrier I/O to only provide
> >>>>ordering, or do we define it to also provide persistent storage
> >>>>writeback? Whatever we decide, it needs to be documented....
> >>>>        
> >>>The block layer already has a notion of the two types of barriers, with
> >>>a very small amount of tweaking we could expose that. There's absolutely
> >>>zero reason we can't easily support both types of barriers.
> >>>      
> >>That sounds like a good idea - we can leave the existing
> >>WRITE_BARRIER behaviour unchanged and introduce a new WRITE_ORDERED
> >>behaviour that only guarantees ordering. The filesystem can then
> >>choose which to use where appropriate....
> >>    
> >
> >Precisely. The current definition of barriers are what Chris and I came
> >up with many years ago, when solving the problem for reiserfs
> >originally. It is by no means the only feasible approach.
> >
> >I'll add a WRITE_ORDERED command to the #barrier branch, it already
> >contains the empty-bio barrier support I posted yesterday (well a
> >slightly modified and cleaned up version).
> >
> >  
> Wait. Do filesystems expect (depend on) anything but ordering now? Does 
> md? Having users of barriers as they currently behave suddenly getting 
> SYNC behavior where they expect ORDERED is likely to have a negative 
> effect on performance. Or do I misread what is actually guaranteed by 
> WRITE_BARRIER now, and a flush is currently happening in all cases?

See the above stuff you quote, it's answered there. It's not a change,
this is how the Linux barrier write has always worked since I first
implemented it. What David and I are talking about is adding a more
relaxed version as well, that just implies ordering.

> And will this also be available to user space f/s, since I just proposed 
> a project which uses one? :-(

I see several uses for that, so I'd hope so.

> I think the goal is good, more choice is almost always better choice, I 
> just want to be sure there won't be big disk performance regressions.

We can't get more heavy weight than the current barrier, it's about as
conservative as you can get.

-- 
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/