Message-ID: <20090604015904.GA13228@localhost>
Date:	Thu, 4 Jun 2009 09:59:04 +0800
From:	Wu Fengguang <fengguang.wu@...el.com>
To:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Randy Dunlap <randy.dunlap@...cle.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"hifumi.hisashi@....ntt.co.jp" <hifumi.hisashi@....ntt.co.jp>,
	Jens Axboe <jens.axboe@...cle.com>
Subject: Re: mmotm 2009-06-02-16-11 uploaded (readahead)

On Thu, Jun 04, 2009 at 09:25:37AM +0800, KOSAKI Motohiro wrote:
> > On Tue, 02 Jun 2009 20:54:39 -0700
> > Randy Dunlap <randy.dunlap@...cle.com> wrote:
> > 
> > > akpm@...ux-foundation.org wrote:
> > > > The mm-of-the-moment snapshot 2009-06-02-16-11 has been uploaded to
> > > > 
> > > >    http://userweb.kernel.org/~akpm/mmotm/
> > > > 
> > > > and will soon be available at
> > > > 
> > > >    git://git.zen-sources.org/zen/mmotm.git
> > > 
> > > 
> > > readahead-add-blk_run_backing_dev.patch:
> > > 
> > > mm/readahead.c: In function 'page_cache_async_readahead':
> > > mm/readahead.c:559: error: implicit declaration of function 'blk_run_backing_dev'
> > 
> > hm, yeah, CONFIG_BLOCK=n.
> > 
> > Doing a block-specific call from inside page_cache_async_readahead() is
> > a bit of a layering violation - this may not be a block-backed
> > filesystem at all.
> > 
> > otoh, perhaps blk_run_backing_dev() is wrongly named and defined in the
> > wrong place.  Perhaps non-block-backed backing_devs want to implement
> > an unplug-style function too?  In which case the whole thing should be
> > renamed and moved outside blkdev.h.
> > 
> > If we don't want to do that, shouldn't backing_dev_info.unplug* be
> > wrapped in #ifdef CONFIG_BLOCK?  And wasn't it a layering violation to
> > put block-specific things into the backing_dev_info?
> > 
> > Jens, talk to me!
> > 
> > From the readahead POV: does it make sense to call the backing-dev's
> > "unplug" function even if that isn't a block-based device?  Or was this
> > just a weird block-device-only performance problem?  Hard to say.
> 
> More problematic.
> 
> The patch comment says:
> 
> +	/*
> +	* Normally the current page is !uptodate and lock_page() will be
> +	* immediately called to implicitly unplug the device. However this
> +	* is not always true for RAID configurations, where data arrives
> +	* not strictly in their submission order. In this case we need to
> +	* explicitly kick off the IO.
> 
> 
> However, hifumi-san's test log shows no IO reordering.
> At the very least the comment is wrong, and we still don't know why
> nobody else can reproduce his issue.

Right. Although I believe the comment documents a legitimate case,
it does not actually explain hifumi's case.

Hifumi, could you retest with a larger readahead size?

Your readahead size (128K) is smaller than your max_sectors_kb (256K),
so two readahead IO requests get merged into one real IO. That means
half of the readahead requests are delayed.

The IO completion size goes down from 512 to 256 sectors (256K to 128K):

before patch:
  8,0    3   177955    50.050313976     0  C   R 8724991 + 512 [0]
  8,0    3   177966    50.053380250     0  C   R 8725503 + 512 [0]
  8,0    3   177977    50.056970395     0  C   R 8726015 + 512 [0]
  8,0    3   177988    50.060326743     0  C   R 8726527 + 512 [0]
  8,0    3   177999    50.063922341     0  C   R 8727039 + 512 [0]

after patch:
  8,0    3   257297    50.000760847     0  C   R 9480703 + 256 [0]
  8,0    3   257306    50.003034240     0  C   R 9480959 + 256 [0]
  8,0    3   257307    50.003076338     0  C   R 9481215 + 256 [0]
  8,0    3   257323    50.004774693     0  C   R 9481471 + 256 [0]
  8,0    3   257332    50.006865854     0  C   R 9481727 + 256 [0]

Thanks,
Fengguang