lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Fri, 4 Jan 2013 12:26:01 +0800
From:	Zheng Liu <gnehzuil.liu@...il.com>
To:	Jan Kara <jack@...e.cz>
Cc:	linux-ext4@...r.kernel.org,
	"Darrick J. Wong" <darrick.wong@...cle.com>,
	Christoph Hellwig <hch@...radead.org>,
	Zheng Liu <wenqing.lz@...bao.com>
Subject: Re: [RFC][PATCH 8/9 v1] ext4: refine unwritten extent conversion

On Thu, Jan 03, 2013 at 11:56:13AM +0100, Jan Kara wrote:
> On Tue 01-01-13 13:24:45, Zheng Liu wrote:
> > On Mon, Dec 31, 2012 at 10:58:15PM +0100, Jan Kara wrote:
> > > On Mon 24-12-12 15:55:41, Zheng Liu wrote:
> > > > From: Zheng Liu <wenqing.lz@...bao.com>
> > > > 
> > > > Currently all unwritten extent conversion work is pushed into a workqueue to be
> > > > done because DIO end_io is in a irq context and this conversion needs to take
> > > > i_data_sem locking.  But we couldn't take a semaphore in a irq context.  After
> > > > tracking all extent status, we can first convert this unwritten extent in extent
> > > > status tree, and call aio_complete() and inode_dio_done() to notify upper level
> > > > that this dio has done.  We don't need to be worried about exposing a stale data
> > > > because we first try to lookup in extent status tree.  So it makes us to see the
> > > > latest extent status.  Meanwhile we queue a work to convert this unwritten
> > > > extent in extent tree.  After this improvement, reader also needn't wait this
> > > > conversion to be done when dioread_nolock is enabled.
> > > > 
> > > > CC: Jan Kara <jack@...e.cz>
> > > > CC: "Darrick J. Wong" <darrick.wong@...cle.com>
> > > > CC: Christoph Hellwig <hch@...radead.org>
> > > > Signed-off-by: Zheng Liu <wenqing.lz@...bao.com>
> > >   OK, so after reading other patches this should work fine. Just I think we
> > > should somehow mark in the extent status tree that the extent tree is
> > > inconsistent with what's on disk - something like extent dirty bit. It will
> > > be set for UNWRITTEN extents where conversion is pending logically it would
> > > also make sence to have it set for DELAYED extents. Then if we need to
> > > reclaim some extents due to memory pressure we know we have to keep dirty
> > > extents because those cache irreplacible information. What do you think?
> > 
> > Dirty bit is a good idea for UNWRITTEN extent because we can feel free
> > to reclaim all WRITTEN extents and all UNWRITTEN extents that are
> > without dirty bit.  But we can not reclaim DEALYED extents no matter
> > whether they are dirty or not because they are used to lookup an delayed
> > extent in fiemap, seek_data/hole, and bigalloc.  So at least DEALYED
> > extent must be kept in status tree.  That is why in step 1 status tree
> > only tracks all DELAYED extents in the tree.
>   So I was thinking about this some more and also testing some code and I
> realized that using extent status tree won't be enough (sadly). In case of
> AIO DIO with O_SYNC set, we have to perform extent conversion on *disk*
> before we can complete the AIO. So extent status tree won't help us in any
> way.

Ah, I see.  Conversion must be done in disk before aio_complete() is
called if AIO DIO with O_SYNC set.  So now we must call aio_complete()
after ext4_convert_unwritten_extents() in ext4_end_io().

> 
> Furthermore O_SYNC writes end up calling ->fsync() after IO is finished
> which currently waits for all unwritten extents to convert and that
> effectively deadlocks the conversion thread if there are more DIO
> conversions pending. To fix this we would have to hack around
> ext4_file_fsync() to avoid waiting in case of O_SYNC AIO writes and that
> gets nasty quickly. So I'm currently back to my original plan of completing
> IO only after extent conversion happens... I'll see how that works out.
> 
> Eventually we could *optimize* that by doing the extent conversion only in
> the extent status tree if possible but I wouldn't bother with it right now.
> For one thing, I also realized we would probably have to somehow throttle
> writers so that there are not too many outstanding conversions (when we
> complete AIO only after the conversion is finished, writer is naturally
> limited by the amount of AIOs allowed).

Sorry, currently no any idea is in my mind.  I will think about it. :-(

Thanks,
                                                - Zheng
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ