Date:	Thu, 17 Mar 2011 08:42:28 -0700
From:	Curt Wohlgemuth <curtw@...gle.com>
To:	Jan Kara <jack@...e.cz>
Cc:	Johannes Weiner <hannes@...xchg.org>,
	Greg Thelen <gthelen@...gle.com>,
	Vivek Goyal <vgoyal@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	containers@...ts.osdl.org, linux-fsdevel@...r.kernel.org,
	Andrea Righi <arighi@...eler.com>,
	Balbir Singh <balbir@...ux.vnet.ibm.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	Daisuke Nishimura <nishimura@....nes.nec.co.jp>,
	Minchan Kim <minchan.kim@...il.com>,
	Ciju Rajan K <ciju@...ux.vnet.ibm.com>,
	David Rientjes <rientjes@...gle.com>,
	Wu Fengguang <fengguang.wu@...el.com>,
	Chad Talbott <ctalbott@...gle.com>,
	Justin TerAvest <teravest@...gle.com>
Subject: Re: [PATCH v6 0/9] memcg: per cgroup dirty page accounting

On Thu, Mar 17, 2011 at 7:53 AM, Jan Kara <jack@...e.cz> wrote:
> On Thu 17-03-11 13:43:50, Johannes Weiner wrote:
>> > - mem_cgroup_balance_dirty_pages(): if memcg dirty memory usage is above
>> >   background limit, then add memcg to global memcg_over_bg_limit list and use
>> >   memcg's set of memcg_bdi to wake up each(?) corresponding bdi flusher.  If over
>> >   fg limit, then use IO-less style foreground throttling with per-memcg per-bdi
>> >   (aka memcg_bdi) accounting structure.
>>
>> I wonder if we could just schedule a for_background work manually in
>> the memcg case that writes back the corresponding memcg_bdi set (and
>> e.g. having it continue until either the memcg is below bg thresh OR
>> the global bg thresh is exceeded OR there is other work scheduled)?
>> Then we would get away without the extra list, and it doesn't sound
>> overly complex to implement.
>  But then when you stop background writeback because of other work, you
> have to know you should restart it after that other work is done. For this
> you basically need the list. With this approach of one-work-per-memcg
> you also get into problems that one cgroup can livelock the flusher thread
> and thus other memcgs won't get writeback. So you have to switch between
> memcgs once in a while.
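
A rough user-space model of the scheme discussed above may make the trade-off concrete. It assumes hypothetical structures (memcg_model, flusher_model) and helpers (memcg_bg_should_stop(), memcg_bg_writeback_rr()) that are not kernel APIs. Writeback for a memcg stops once it is no longer above its background threshold, and the whole pass stops once global background writeback should take over or other work is queued. Memcgs are served round-robin with a bounded chunk per turn, so one cgroup that keeps dirtying pages cannot livelock the flusher.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one memory cgroup's dirty state. */
struct memcg_model {
    long dirty_pages; /* pages currently dirty in this memcg */
    long bg_thresh;   /* memcg background dirty threshold, in pages */
};

/* Hypothetical global state as seen by the flusher thread. */
struct flusher_model {
    long global_dirty;      /* dirty pages system-wide */
    long global_bg_thresh;  /* global background threshold, in pages */
    bool other_work_queued; /* e.g. sync or explicit writeback work */
};

/* Stop writing this memcg when it is no longer above its background
 * threshold, when global background writeback should take over, or when
 * other work is waiting. */
static bool memcg_bg_should_stop(const struct memcg_model *m,
                                 const struct flusher_model *f)
{
    return m->dirty_pages <= m->bg_thresh ||
           f->global_dirty > f->global_bg_thresh ||
           f->other_work_queued;
}

/* Serve the memcgs round-robin, writing at most `chunk` pages from one
 * memcg per turn so a single cgroup cannot monopolize the flusher.
 * Returns the total number of pages "written" (modeled as cleaned). */
static long memcg_bg_writeback_rr(struct memcg_model *memcgs, size_t n,
                                  struct flusher_model *f, long chunk)
{
    long total = 0;
    bool progress = true;

    assert(memcgs != NULL || n == 0);
    if (chunk <= 0)
        return 0; /* a non-positive chunk could never make progress */

    while (progress) {
        progress = false;
        for (size_t i = 0; i < n; i++) {
            struct memcg_model *m = &memcgs[i];
            long todo;

            if (memcg_bg_should_stop(m, f))
                continue;
            /* Write only the excess over the threshold, capped at chunk. */
            todo = m->dirty_pages - m->bg_thresh;
            if (todo > chunk)
                todo = chunk;
            m->dirty_pages -= todo;
            f->global_dirty -= todo;
            total += todo;
            progress = true;
        }
    }
    return total;
}
```

Because all state stays in the array, calling memcg_bg_writeback_rr() again after the other work is done resumes where the previous pass left off. That is the role the memcg_over_bg_limit list plays in the original proposal, and why Jan argues the list is needed.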

In pre-2.6.38 kernels (when background writeback enqueued work items,
and we didn't break the loop in wb_writeback() with for_background for
other work items), we experimented with this issue.  One solution we
came up with was enqueuing a background work item for a given memory
cgroup, but limiting nr_pages to something like 2048 instead of
LONG_MAX, to avoid livelock.  Writeback would only operate on inodes
with dirty pages from this memory cgroup.
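
A minimal sketch of the bounded, per-memcg work item described above. It assumes a hypothetical inode model in which each inode records a single memcg that dirtied it; real inodes can hold pages dirtied by several cgroups. The 2048-page budget is the figure from this message. All type and function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Per-work-item page budget from the message, used instead of LONG_MAX. */
#define MEMCG_BG_NR_PAGES 2048L

/* Hypothetical background work item targeted at one memcg. */
struct memcg_wb_work {
    int memcg_id;  /* only inodes dirtied by this memcg are written */
    long nr_pages; /* remaining page budget for this work item */
};

/* Hypothetical inode model: one owning memcg and a dirty page count. */
struct inode_model {
    int dirtied_by_memcg;
    long dirty_pages;
};

static struct memcg_wb_work make_memcg_bg_work(int memcg_id)
{
    struct memcg_wb_work work = { memcg_id, MEMCG_BG_NR_PAGES };
    return work;
}

/* Write back pages from inodes dirtied by work->memcg_id until the budget
 * is used up or no matching dirty inodes remain. Returns pages written. */
static long writeback_memcg_work(struct inode_model *inodes, size_t n,
                                 struct memcg_wb_work *work)
{
    long written = 0;

    assert(work != NULL);
    for (size_t i = 0; i < n && work->nr_pages > 0; i++) {
        struct inode_model *inode = &inodes[i];
        long todo;

        /* Skip inodes that hold no dirty pages from this memcg. */
        if (inode->dirtied_by_memcg != work->memcg_id ||
            inode->dirty_pages == 0)
            continue;
        todo = inode->dirty_pages < work->nr_pages ? inode->dirty_pages
                                                   : work->nr_pages;
        inode->dirty_pages -= todo;
        work->nr_pages -= todo;
        written += todo;
    }
    return written;
}
```

Because the budget is finite, each work item completes after at most 2048 pages. Queued work for other memcgs then gets its turn instead of being starved by one cgroup that keeps dirtying pages.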

If BG writeback takes place for all memcgs that are over their BG
limits, it seems that simply asking whether each inode is "related" somehow
to the set of dirty memcgs is the simplest way to go.  Assuming, of
course, that efficient data structures are built to answer this
question.
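
One possible shape for such a data structure, offered purely as an assumption, is two bitmasks. Each inode carries a bitmask of the memcgs that own dirty pages in it, and the flusher keeps a bitmask of the memcgs currently over their background limits. The "related" test then reduces to a single AND. The fixed 64-memcg limit and all names below are hypothetical simplifications. A real implementation would need to scale to arbitrary cgroup counts and keep the per-inode set up to date as pages are cleaned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simplification: memcg ids are small integers below 64. */
#define MEMCG_SET_MAX 64u
typedef uint64_t memcg_set_t;

static memcg_set_t memcg_set_add(memcg_set_t set, unsigned id)
{
    assert(id < MEMCG_SET_MAX); /* shifting by >= 64 is undefined */
    return set | ((memcg_set_t)1 << id);
}

static memcg_set_t memcg_set_del(memcg_set_t set, unsigned id)
{
    assert(id < MEMCG_SET_MAX);
    return set & ~((memcg_set_t)1 << id);
}

/* Hypothetical inode model: the set of memcgs owning dirty pages in it. */
struct inode_model {
    memcg_set_t dirtied_by;
};

/* The "related" test: does this inode hold dirty pages from any memcg
 * that is currently over its background limit? One AND, O(1). */
static bool inode_related_to_dirty_memcgs(const struct inode_model *inode,
                                          memcg_set_t over_bg_limit)
{
    return (inode->dirtied_by & over_bg_limit) != 0;
}
```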

Thanks,
Curt

> We've tried several approaches with global background writeback before we
> arrived at what we have now and what seems to work at least reasonably...
>
>                                                                Honza
> --
> Jan Kara <jack@...e.cz>
> SUSE Labs, CR
>