linux-kernel - Re: [patch 1/1] Writeback fix for concurrent large and small file writes.

Message-ID: <532480950712121503r64dbd51oc4778e96cbd37e3c@mail.gmail.com>
Date:	Wed, 12 Dec 2007 15:03:04 -0800
From:	"Michael Rubin" <mrubin@...gle.com>
To:	"Peter Zijlstra" <a.p.zijlstra@...llo.nl>
Cc:	akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org, wfg@...l.ustc.edu.cn
Subject: Re: [patch 1/1] Writeback fix for concurrent large and small file writes.

On Dec 12, 2007 12:55 PM, Peter Zijlstra <a.p.zijlstra@...llo.nl> wrote:
>
> On Mon, 2007-12-10 at 18:02 -0800, Michael Rubin wrote:
> > From: Michael Rubin <mrubin@...gle.com>
> The part I miss here is the rationale on _how_ you solve the problem.
>
> The patch itself is simple enough, but I've been staring at this code
> for a while now, and I'm just not getting it.

Apologies for the lack of rationale. I have been staring at this code
for a while as well, and it makes my head hurt. I have a patch coming
(hopefully today) that proposes using one data structure with a more
consistent priority scheme for 2.6.25. To me it's simpler, but I am
biased.

The problem we encounter when appending to a large file at a fast rate
while also writing to smaller files is that the wb_kupdate thread does
not keep up with disk traffic. In this workload the inodes often end
up at fs/fs-writeback.c:287 after do_writepages, because do_writepages
did not write all the pages. This can be due to congestion, but I
believe there are other causes as well, based on what I have observed.

The first issue is that the inode is put on the s_more_io queue. This
ensures that more_io is set at the end of sync_sb_inodes. As a result,
the wb_kupdate routine sleeps at mm/page-writeback.c:642. This slows
us down enough that wb_kupdate cannot keep up with traffic.
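
To illustrate why a sleep on every pass can let the backlog grow, here
is a toy model in Python. It is not kernel code: the function name, the
"tick" time unit, and all the numbers are assumptions chosen only to
show the arithmetic. Each pass writes at most a fixed number of pages,
and an optional sleep after the pass stands in for the wait taken when
more_io is set.

```python
def simulate_backlog(dirty_per_tick, pages_per_pass, sleep_ticks, total_ticks):
    """Return the dirty-page backlog after total_ticks of a toy flusher loop.

    All parameters are hypothetical model inputs, not kernel tunables.
    """
    backlog = 0
    tick = 0
    while tick < total_ticks:
        # One pass costs 1 tick of writing plus any sleep taken afterwards.
        pass_ticks = 1 + sleep_ticks
        # Pages keep getting dirtied for the whole duration of the pass.
        backlog += dirty_per_tick * pass_ticks
        # The pass can clean at most pages_per_pass pages.
        backlog -= min(backlog, pages_per_pass)
        tick += pass_ticks
    return backlog


# Without the sleep the flusher keeps up; with it, the backlog grows.
no_sleep = simulate_backlog(100, 150, 0, 100)
with_sleep = simulate_backlog(100, 150, 1, 100)
```

In this model the flusher has spare capacity (150 pages per pass
against 100 dirtied per tick), yet a one-tick sleep per pass is enough
to make the backlog grow without bound.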

The other issue is that an inode placed on the s_more_io queue cannot
be processed by sync_sb_inodes until the entire s_io list is empty.
With lots of small files that are not being dirtied as quickly as the
one large inode on the s_more_io queue, the inode with the most pages
being dirtied gets no attention, and again wb_kupdate cannot keep up.
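
The ordering effect can be sketched with two queues. Again, this is a
toy Python model under stated assumptions, not the kernel's list
handling: the function names and inode labels are hypothetical, and the
only rule modeled is the one described above, that s_more_io is not
looked at until s_io has drained.

```python
from collections import deque


def service_order(s_io, s_more_io):
    """Return inode names in the order a toy sync pass would service them."""
    s_io = deque(s_io)
    s_more_io = deque(s_more_io)
    order = []
    while s_io or s_more_io:
        if not s_io:
            # Only once s_io is empty are the s_more_io entries moved over.
            s_io.extend(s_more_io)
            s_more_io.clear()
        order.append(s_io.popleft())
    return order


def inodes_serviced_before(order, name):
    """Count how many inodes are serviced before the named one."""
    return order.index(name)


small_files = ["small0", "small1", "small2"]
order = service_order(small_files, ["big"])
waited = inodes_serviced_before(order, "big")
```

Here the large inode waits behind every small inode on s_io, so its
wait scales with the number of small files even though it is the one
dirtying pages fastest.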

mrubin