Date:	Mon, 14 Sep 2009 13:17:21 +0200
From:	Jan Kara <jack@...e.cz>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Jan Kara <jack@...e.cz>, Chris Mason <chris.mason@...cle.com>,
	Artem Bityutskiy <dedekind1@...il.com>,
	Jens Axboe <jens.axboe@...cle.com>,
	linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
	david@...morbit.com, hch@...radead.org, akpm@...ux-foundation.org,
	Theodore Ts'o <tytso@....edu>,
	Wu Fengguang <fengguang.wu@...el.com>
Subject: Re: [PATCH 8/8] vm: Add a tuning knob for vm.max_writeback_mb

On Thu 10-09-09 17:49:10, Peter Zijlstra wrote:
> On Wed, 2009-09-09 at 16:23 +0200, Jan Kara wrote:
> >   Well, what I imagined we could do is:
> > Have a per-bdi variable 'pages_written' - that would reflect the number of
> > pages written to the bdi since boot (OK, we'd have to handle overflows, but
> > that's doable).
> > 
> > There will be a per-bdi variable 'pages_waited'. When a thread should sleep
> > in balance_dirty_pages() because we are over the limits, it kicks the
> > writeback thread and does:
> >   to_wait = max(pages_waited, pages_written) + sync_dirty_pages() (or
> > whatever number we decide)
> >   pages_waited = to_wait
> >   sleep until pages_written reaches to_wait or we drop below dirty limits.
> > 
> > That will make sure each thread sleeps until the writeback threads have
> > done their duty on its behalf.
> > 
> > If we make sure sleeping threads are properly ordered on the wait queue,
> > we could always wake up just the first one and thus avoid the herding
> > effect. When we drop below the dirty limits, we would just wake up the
> > whole waitqueue.
> > 
> > Does this sound reasonable?
> 
> That seems to go wrong when there are multiple tasks waiting on the same
> bdi: you'd count each page at 1/n of its weight.
> 
> Suppose pages_written = 1024, and 4 tasks block and each computes its
> to_wait as pages_written + 256 = 1280. Then we'd release all 4 of them
> after 256 pages are written, instead of after 4*256, which would be
> pages_written = 2048.
  Well, there's some locking needed, of course. The intent is to stack
demands as they come. So in the case pages_written = 1024, pages_waited = 1024,
we would do:
THREAD 1:

spin_lock
to_wait = 1024 + 256	/* pages_waited + own demand */
pages_waited = 1280
spin_unlock

THREAD 2:

spin_lock
to_wait = 1280 + 256	/* sees the pages_waited THREAD 1 stored */
pages_waited = 1536
spin_unlock

  So the weight of each page will be kept. The fact that the second thread
effectively waits until the first thread has its demand satisfied looks
strange at first sight, but we don't do better currently and I think it's
fine: if there were two writer threads, the thread released first would soon
queue behind the thread still waiting, so the long-term behavior should be
fair.
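
  To make the stacking and the ordered wakeup concrete, something like the
following minimal sketch. The bdi_throttle structure and the bdi_* helper
names are made up for illustration (they are not existing kernel interfaces);
completions are used to avoid lost wakeups, and initialization of the lock
and list is omitted:

#include <linux/kernel.h>
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/completion.h>

/* Illustrative only: imagine these fields live in struct backing_dev_info. */
struct bdi_throttle {
	spinlock_t		wait_lock;
	unsigned long		pages_written;	/* pages written since boot;
						 * wraparound handling omitted */
	unsigned long		pages_waited;	/* last handed-out target */
	struct list_head	wait_list;	/* FIFO of sleeping throttlers */
};

struct bdi_waiter {
	struct list_head	list;
	unsigned long		to_wait;	/* wake when pages_written
						 * reaches this */
	struct completion	done;
};

/* Called from balance_dirty_pages() when we are over the dirty limits. */
static void bdi_wait_for_writeback(struct bdi_throttle *bt, unsigned long demand)
{
	struct bdi_waiter w;

	init_completion(&w.done);
	spin_lock(&bt->wait_lock);
	/* Stack our demand on top of everybody else's. */
	bt->pages_waited = max(bt->pages_waited, bt->pages_written) + demand;
	w.to_wait = bt->pages_waited;
	/* FIFO order: only the head of the list ever needs a wakeup. */
	list_add_tail(&w.list, &bt->wait_list);
	spin_unlock(&bt->wait_lock);

	wait_for_completion(&w.done);
}

/* Called by the writeback thread as it finishes writing @nr pages. */
static void bdi_account_written(struct bdi_throttle *bt, unsigned long nr)
{
	struct bdi_waiter *w;

	spin_lock(&bt->wait_lock);
	bt->pages_written += nr;
	/* Release waiters in order; usually only the first is eligible. */
	while (!list_empty(&bt->wait_list)) {
		w = list_first_entry(&bt->wait_list, struct bdi_waiter, list);
		if (bt->pages_written < w->to_wait)
			break;
		list_del(&w->list);
		complete(&w->done);
	}
	spin_unlock(&bt->wait_lock);
}

/* Called when we drop below the dirty limits: wake the whole queue. */
static void bdi_release_all_waiters(struct bdi_throttle *bt)
{
	struct bdi_waiter *w;

	spin_lock(&bt->wait_lock);
	while (!list_empty(&bt->wait_list)) {
		w = list_first_entry(&bt->wait_list, struct bdi_waiter, list);
		list_del(&w->list);
		complete(&w->done);
	}
	spin_unlock(&bt->wait_lock);
}

  In your example above, the four tasks would stack their targets to 1280,
1536, 1792 and 2048, so the last one is released only once pages_written
reaches 2048, keeping the full weight of each page.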

								Honza
-- 
Jan Kara <jack@...e.cz>
SUSE Labs, CR
