Message-ID: <20110811032143.GB11404@localhost>
Date:	Thu, 11 Aug 2011 11:21:43 +0800
From:	Wu Fengguang <fengguang.wu@...el.com>
To:	Vivek Goyal <vgoyal@...hat.com>
Cc:	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Jan Kara <jack@...e.cz>, Christoph Hellwig <hch@....de>,
	Dave Chinner <david@...morbit.com>,
	Greg Thelen <gthelen@...gle.com>,
	Minchan Kim <minchan.kim@...il.com>,
	Andrea Righi <arighi@...eler.com>,
	linux-mm <linux-mm@...ck.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 0/5] IO-less dirty throttling v8

> [...] it only deals with controlling buffered write IO and nothing
> else. So on the same block device, other direct writes might be
> going on from same group and in this scheme a user will not have any
> control.

The IO-less balance_dirty_pages() will be able to throttle DIRECT
writes. There is nothing fundamental in the way.

The basic approach will be to add a balance_dirty_pages_ratelimited_nr()
call in the DIRECT write path, and to call into balance_dirty_pages()
regardless of the various dirty thresholds.

Then the IO-less balance_dirty_pages() has all the facilities to
throttle a task at any auto-estimated or user-specified ratelimit.
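
To illustrate how a ratelimit turns into throttling, here is a minimal
userspace sketch (not kernel code). It assumes the task is simply put to
sleep for pages_dirtied / ratelimit, capped at some maximum pause. The
function name, the millisecond units and the cap parameter are hypothetical
and only for illustration:

```c
#include <assert.h>

/*
 * Userspace model, not kernel code.
 * Assumption: a task that dirtied `pages_dirtied` pages and is limited to
 * `ratelimit` pages per second sleeps long enough to average that rate,
 * but never longer than `max_pause_ms` (hypothetical cap).
 */
unsigned long throttle_pause_ms(unsigned long pages_dirtied,
				unsigned long ratelimit,
				unsigned long max_pause_ms)
{
	unsigned long pause_ms;

	assert(ratelimit > 0);

	/* time needed to "pay" for the dirtied pages at the given rate */
	pause_ms = pages_dirtied * 1000 / ratelimit;

	return pause_ms < max_pause_ms ? pause_ms : max_pause_ms;
}
```

The same computation applies whether the pages were dirtied through the page
cache or accounted from a DIRECT write, which is why only the call site needs
to be added.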

> Another disadvantage is that throttling at page cache level does not
> take care of IO spikes at device level.

Yes, this is a problem, but it is one best fixed in the IO scheduler.
(I cannot go into details at this time, but it does _sound_ possible
to me.)

> How do you implement proportional control here? From overall bdi bandwidth
> vary per cgroup bandwidth regularly based on cgroup weight? Again the
> issue here is that it controls only buffered WRITES and nothing else and
> in this case co-ordinating with CFQ will probably be hard. So I guess
> usage of proportional IO just for buffered WRITES will have limited
> usage.

"priority" may be a more suitable phrase. It will be implemented like
this (without the user interface):

@@ -1007,6 +1001,13 @@ static void balance_dirty_pages(struct a
                max_pause = bdi_max_pause(bdi, bdi_dirty);
               
                base_rate = bdi->dirty_ratelimit;
+               /*
+                * Double the bandwidth for PF_LESS_THROTTLE (ie. nfsd) and
+                * real-time tasks.
+                */
+               if (current->flags & PF_LESS_THROTTLE || rt_task(current))
+                       base_rate *= 2;
+              
                pos_ratio = bdi_position_ratio(bdi, dirty_thresh,
                                               background_thresh, nr_dirty,
                                               bdi_thresh, bdi_dirty);

That is, if we start 2 dd tasks A and B with priority_B=2, the
resulting rate_B will be equal to 2*rate_A. The ->dirty_ratelimit will
auto-adapt to rate_A, or equivalently (write_bw/3).
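
The arithmetic can be sketched in userspace C (this is a model, not the
kernel implementation). The struct, the function name and the idea of an
integer per-task "priority" weight are assumptions for illustration; the
only thing taken from the above is that each task gets priority * base rate
and that the rates add up to write_bw:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace model, not kernel code: split a bdi's write bandwidth among
 * dirtier tasks in proportion to a hypothetical per-task priority weight.
 */
struct dirtier {
	unsigned long priority;	/* 1 = normal task, 2 = twice the rate */
	unsigned long rate;	/* resulting throttle rate, filled in below */
};

/* Returns the balanced base rate, i.e. the rate of a priority-1 task. */
unsigned long split_write_bw(unsigned long write_bw,
			     struct dirtier *tasks, size_t n)
{
	unsigned long total = 0;
	unsigned long base_rate;
	size_t i;

	for (i = 0; i < n; i++)
		total += tasks[i].priority;
	assert(total > 0);

	/* all rates together must add up to the available bandwidth */
	base_rate = write_bw / total;

	for (i = 0; i < n; i++)
		tasks[i].rate = tasks[i].priority * base_rate;

	return base_rate;
}
```

With write_bw = 90 (say, MB/s) and priorities {1, 2}, the base rate comes
out as 30, so rate_A = 30 and rate_B = 60, matching write_bw/3 and
2*write_bw/3.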

The same can be applied to cgroup. One may specify the whole cgroup's
dirty rate be throttled at N times that of a normal dd in the root cgroup,
or be throttled at some absolute 10MB/s rate. The corresponding
cgroup->dirty_ratelimit will be set to (N * bdi->dirty_ratelimit) for
the former and 10MB/s for the latter.

The user can specify any combination of "priority" and "absolute
ratelimit" for any task and/or cgroup, tasks inside a cgroup, and so on.
We have a very powerful (bdi or cgroup)->dirty_ratelimit adaptation
mechanism to support the combinations :)

The "priority" can even be applied to DIRECT dirtiers, _as long as_
there are other buffered dirtiers to generate enough dirty pages. It's
not as easy to apply priorities when there are only DIRECT dirtiers.
In contrast, the absolute ratelimit is always applicable to all kinds
of tasks and cgroups.

Thanks,
Fengguang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
