lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 25 Aug 2009 17:49:08 -0700
From:	Mingming <cmm@...ibm.com>
To:	Jan Kara <jack@...e.cz>
Cc:	linux-ext4@...r.kernel.org, Eric Sandeen <sandeen@...hat.com>,
	Theodore Tso <tytso@....edu>
Subject: Re: [PATCH 1/2 V2] Direct IO for holes and fallocate: add end_io
 callback

On Tue, 2009-08-25 at 12:41 -0700, Mingming wrote:
> On Tue, 2009-08-25 at 16:01 +0200, Jan Kara wrote:
> > On Mon 24-08-09 14:38:13, Mingming wrote:
> > > On Thu, 2009-08-20 at 13:52 +0200, Jan Kara wrote:
> > > > On Wed 19-08-09 14:26:16, Mingming wrote:
> > > > > On Wed, 2009-08-19 at 16:15 +0200, Jan Kara wrote:
> > > > > > > For direct IO write to the end of file, we now also could get rid of
> > > > > > > using orphan list to protect expose stale data out after crash, when
> > > > > > > direct write to end of file isn't complete. We now fallocate blocks for
> > > > > > > the direct IO write to the end of file as well, and convert those
> > > > > > > fallocated blocks at the end of file to written when IO is complete. If
> > > > > > > fs crashed before the IO complete, it will only seen the file tail has
> > > > > > > been fallocated but won't get the stale data out.
> > > > > >   But you still probably need orphan list to truncate blocks allocated
> > > > > > beyond file end during extending direct write. So I'd just remove this
> > > > > > paragraph...
> > > > > > 
> > > > > 
> > > > > Do we still need to truncate blocks allocated at the end of file? Those
> > > > > blocks are marked as uninitialized extents (as any block allocation from
> > > > > DIO are flagged as uninitialized extents, before IO is complete), so
> > > > > that after recover from crash, the stale data won't get exposed. 
> > > > > 
> > > > > I guess I missed the cases you concerned that we need the orphan list to
> > > > > protect, could you plain a little more?
> > > >   Well, you are right it's not a security problem since no data gets
> > > > exposed. It's just that after a crash, there will be blocks allocated to
> > > > a file and it's not trivial to it find out unless you specifically stat the
> > > > file. So it's just *not nice* kind of thing. If it brings us some
> > > > advantage to not put the inode on the orphan list, then sure it might be
> > > > a tradeoff we are willing to make. But otherwise I see no point.
> > > > 
> > > 
> > > Hmm... I see what you concerned now...user probably don't like allocated
> > > blocks at the end...
> >   Yes, it's kind of counterintuitive that blocks can get allocated but
> > don't really "show up" in the file (because they are unwritten and beyond
> > i_size).
> > 
> > > I am trying to think of the ext3 case. In ext3 case, inode will be
> > > removed from orphan list once the IO is complete, but that is before the
> > > i_disksize get updated and journal handle being closed. What if the
> > > system crash after inode removed from orphan list but before the
> > > i_disksize get updated?
> >   As you write below, until we stop the handle, the transaction cannot be
> > committed and so no change actually gets written to disk.
> > 
> 
> Then in case this, if no change actually gets written to disk, then
> i_disksize hasn't get updated on disk, why we need the orphan list?
> 

Never mind, I missed the fact that i_size was updated in ext3_direct_IO
before submit the IO, so orphan list is there to truncate i_size to
i_disksize in case of crash.

> > > But since the journal handled has not marked as stopped, even without
> > > putting the inode on the orpah list, wouldn't the open journal handle
> > > and delayed i_disksize update until IO complete time, prevent we see
> > > half DIO data on disk? Though blocks are allocated to the file, but
> > > those block mapping wouldn't show up on disk, no? Did I miss anything?
> > > 
> > > In the ext4 +patch case (non AIO case), we also close the handle when
> > > end_io completed. If the system crashed we would see the blocks are
> > > allocated but marked as unwritten.
> >   Exactly, you end up with allocated and unwritten blocks beyond i_size and
> > that's what I'd like to avoid if user did not explicitely fallocate() these
> > blocks.
> > 
> 
> Yes but same as ext3, nothing should write to disk until we stop the
> handle, so the unwritten blocks flag should also been seen as the
> i_disksize has not been updated on disk yet.
> 

Please ignore this, I see what you are saying..

I think it over today, I understand the race is with AIO DIO io has not
completed, while there are truncate or other non AIO DIO which happened
completed before the pending AIO DIO completed. What we do with
i_disksize?

Maybe I just split the patches, only deals with holes and preallocation,
with doesn't need to update i_disksize... and worry about fixing DIO
write to the end of file for AIO later.

Mingming

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ