linux-kernel - Re: [PATCH] bdi_sync_writeback should WB_SYNC

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20090929161339.GB11573@duck.suse.cz>
Date:	Tue, 29 Sep 2009 18:13:39 +0200
From:	Jan Kara <jack@...e.cz>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Jens Axboe <jens.axboe@...cle.com>,
	Chris Mason <chris.mason@...cle.com>,
	linux-kernel@...r.kernel.org, jack@...e.cz, npiggin@...e.de
Subject: Re: [PATCH] bdi_sync_writeback should WB_SYNC_NONE first

On Sun 27-09-09 10:10:35, Andrew Morton wrote:
> On Sun, 27 Sep 2009 18:55:14 +0200 Jens Axboe <jens.axboe@...cle.com> wrote:
> 
> > > I wasn't referring to this patch actually.  The code as it stands in
> > > Linus's tree right now attempts to write back up to 2^63 pages...  
> > 
> > I agree, it could make the fs sync take a looong time. This is not a new
> > issue, though.
> 
> It _should_ be a new issue.  The old code would estimate the number of
> dirty pages up-front and would then add a +50% fudge factor, so if we
> started the sync with 1GB dirty memory, we write back a max of 1.5GB.
> 
> However that might have got broken.
> 
> void sync_inodes_sb(struct super_block *sb, int wait)
> {
> 	struct writeback_control wbc = {
> 		.sync_mode	= wait ? WB_SYNC_ALL : WB_SYNC_NONE,
> 		.range_start	= 0,
> 		.range_end	= LLONG_MAX,
> 	};
> 
> 	if (!wait) {
> 		unsigned long nr_dirty = global_page_state(NR_FILE_DIRTY);
> 		unsigned long nr_unstable = global_page_state(NR_UNSTABLE_NFS);
> 
> 		wbc.nr_to_write = nr_dirty + nr_unstable +
> 			(inodes_stat.nr_inodes - inodes_stat.nr_unused);
> 	} else
> 		wbc.nr_to_write = LONG_MAX; /* doesn't actually matter */
> 
> 	sync_sb_inodes(sb, &wbc);
> }
> 
> a) the +50% isn't there in 2.6.31
> 
> b) the wait=true case appears to be vulnerable to livelock in 2.6.31.
> 
> whodidthat
> 
> 38f21977663126fef53f5585e7f1653d8ebe55c4 did that back in January.
  Yes, I even remember the (long) discussion:
http://lkml.org/lkml/2008/9/22/361
  The problem with using .nr_to_write in WB_SYNC_ALL mode is that
write_cache_pages() can return before it actually wrote the pages we
wanted to write (because someone dirtied other pages in the mean time).
So Nick rather removed the .nr_to_write thing to guarantee proper data
integrity semantics. I'm not sure what is the status of the fixes of
livelock he proposed (added him to CC).
  Maybe we could at least fix some cases of possible livelock by setting
.range_end to current i_size when calling writepages(). That would solve a
common case when somebody is extending the file. For the case when someone
writes to a huge hole, we would at least provide a finite upper bound,
although practically it might be just the same as without .range_end.

								Honza
-- 
Jan Kara <jack@...e.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/