Message-ID: <BANLkTi=5Jgrb_wQ_AKn3LJOYERTcTix01BCGPYnjGi4iJZhxgA@mail.gmail.com>
Date: Wed, 29 Jun 2011 14:30:17 -0700
From: Curt Wohlgemuth <curtw@...gle.com>
To: Christoph Hellwig <hch@...radead.org>
Cc: Al Viro <viro@...iv.linux.org.uk>, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, fengguang.wu@...el.com
Subject: Re: [PATCH] writeback: Don't wait for completion in writeback_inodes_sb_nr
On Wed, Jun 29, 2011 at 11:00 AM, Christoph Hellwig <hch@...radead.org> wrote:
> On Wed, Jun 29, 2011 at 10:26:33AM -0700, Curt Wohlgemuth wrote:
>> The semantics did change between 2.6.34 and 2.6.35 though. When the
>> work item queue was introduced in 2.6.32, the semantics changed from
>> what you describe above to what's present through 2.6.34:
>> writeback_inodes_sb() would enqueue a work item, and return. Your
>> commit 83ba7b07 ("writeback: simplify the write back thread queue")
>> added the wait_for_completion() call, putting the semantics back to
>> where they were pre-2.6.32.
>
> Yes. The kernels inbetween had that nasty writeback vs umount races
> that we could trigger quite often.
My apologies for not being aware of these races (I tried searching the
mailing lists) -- how did they manifest? I don't see anything about
them in the commit message of 83ba7b07. I do see that we no longer
take s_umount for sys_sync (only in sync_filesystem(), e.g., for
sys_syncfs)...
>> Though one additional change between the old way (pre-2.6.32) and
>> today: with the old kernel, the pdflush thread would operate
>> concurrently with the first (and second?) sync path through writeback.
>> Today of course, they're *all* serialized. So really a call to
>> sys_sync() will enqueue 3 work items -- one from
>> wakeup_flusher_threads(), one from writeback_inodes_sb(), and one from
>> sync_inodes_sb().
>
> Yes. And having WB_SYNC_NONE items from both wakeup_flusher_threads vs
> writeback_inodes_sb is plain stupid. The initial conversion to the
> new sync_filesystem scheme had removed the wakeup_flusher_threads
> call, but that caused a huge regression in some benchmark.
>
> As mentioned before Wu was working on some code to introduce tagging
> so that the WB_SYNC_ALL call won't start writing pages dirtied after
> the sync call, which should help with your issue. Although to
> completely solve it we really need to get down to just two passes.
I can definitely see how tagging pages at the start of sync would be
helpful, and going from three passes to two seems like a no-brainer.
But what's the real point of doing even two passes -- one SYNC_NONE
followed by one SYNC_ALL? Is it to try to get through as many inodes as
possible before we potentially block waiting on an inode on an
unavailable device?
Thanks,
Curt