Message-ID: <BANLkTi=5Jgrb_wQ_AKn3LJOYERTcTix01BCGPYnjGi4iJZhxgA@mail.gmail.com>
Date: Wed, 29 Jun 2011 14:30:17 -0700
From: Curt Wohlgemuth <curtw@...gle.com>
To: Christoph Hellwig <hch@...radead.org>
Cc: Al Viro <viro@...iv.linux.org.uk>, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, fengguang.wu@...el.com
Subject: Re: [PATCH] writeback: Don't wait for completion in writeback_inodes_sb_nr
On Wed, Jun 29, 2011 at 11:00 AM, Christoph Hellwig <hch@...radead.org> wrote:
> On Wed, Jun 29, 2011 at 10:26:33AM -0700, Curt Wohlgemuth wrote:
>> The semantics did change between 2.6.34 and 2.6.35 though. When the
>> work item queue was introduced in 2.6.32, the semantics changed from
>> what you describe above to what's present through 2.6.34:
>> writeback_inodes_sb() would enqueue a work item, and return. Your
>> commit 83ba7b07 ("writeback: simplify the write back thread queue")
>> added the wait_for_completion() call, putting the semantics back to
>> where they were pre-2.6.32.
>
> Yes. The kernels inbetween had that nasty writeback vs umount races
> that we could trigger quite often.
My apologies for not being aware of these races (I tried searching the
mailing lists) -- how did they manifest? I don't see anything about
them in the commit message of 83ba7b07. I do see that we no longer
take s_umount for sys_sync (only in sync_filesystem(), e.g., for
sys_syncfs)...
>> Though one additional change between the old way (pre-2.6.32) and
>> today: with the old kernel, the pdflush thread would operate
>> concurrently with the first (and second?) sync path through writeback.
>> Today of course, they're *all* serialized. So really a call to
>> sys_sync() will enqueue 3 work items -- one from
>> wakeup_flusher_threads(), one from writeback_inodes_sb(), and one from
>> sync_inodes_sb().
>
> Yes. And having WB_SYNC_NONE items from both wakeup_flusher_threads vs
> writeback_inodes_sb is plain stupid. The initial conversion to the
> new sync_filesystem scheme had removed the wakeup_flusher_threads
> call, but that caused a huge regression in some benchmark.
>
> As mentioned before Wu was working on some code to introduce tagging
> so that the WB_SYNC_ALL call won't start writing pages dirtied after
> the sync call, which should help with your issue. Although to
> completely solve it we really need to get down to just two passes.
I can definitely see how tagging pages at the start of sync would be
helpful, and going from three passes to two seems like a no-brainer.
But what's the real point of doing even two passes -- one SYNC_NONE
followed by one SYNC_ALL? Is it to try to get through as many inodes as
possible before we potentially block waiting on an inode on an
unavailable device?
Thanks,
Curt