Message-ID: <48ECB215.4040409@linux.vnet.ibm.com>
Date:	Wed, 08 Oct 2008 18:43:57 +0530
From:	Balbir Singh <balbir@...ux.vnet.ibm.com>
To:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
CC:	righi.andrea@...il.com, Michael Rubin <mrubin@...gle.com>,
	Andrew Morton <akpm@...ux-foundation.org>, menage@...gle.com,
	dave@...ux.vnet.ibm.com, chlunde@...g.uio.no, dpshah@...gle.com,
	eric.rannaud@...il.com, fernando@....ntt.co.jp, agk@...rceware.org,
	m.innocenti@...eca.it, s-uchida@...jp.nec.com, ryov@...inux.co.jp,
	matt@...ehost.com, dradford@...ehost.com,
	containers@...ts.linux-foundation.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC] [PATCH -mm 0/2] memcg: per cgroup dirty_ratio

KAMEZAWA Hiroyuki wrote:
> On Tue, 07 Oct 2008 17:49:49 +0200
> Andrea Righi <righi.andrea@...il.com> wrote:
> 
>> Balbir Singh wrote:
>>> Michael Rubin wrote:
>>>> On Fri, Sep 12, 2008 at 1:18 PM, Andrew Morton
>>>> <akpm@...ux-foundation.org> wrote:
>>>>> One thing to think about please: Michael Rubin is hitting problems with
>>>>> the existing /proc/sys/vm/dirty-ratio.  Its present granularity of 1%
>>>>> is just too coarse for really large machines, and as
>>>>> memory-size/disk-speed ratios continue to increase, this will just get
>>>>> worse.
>>>> Re-sending since I top-posted before. Never again. Also adding more
>>>> thoughts on a byte based interface.
>>>>
>>>> Currently the problem we are hitting is that we cannot specify pdflush
>>>> to have background limits less than 1% of memory. I am currently
>>>> finishing up a patch right now that adds a dirty_ratio_millis
>>>> interface.  I hope to submit the patch to LKML by the end of the week.
>>>>
>>>> The idea is that we don't want to break backwards compatibility and we
>>>> also don't want to have two conflicting knobs in the sysctl or
>>>> /proc/sys/vm/ space. I thought adding a new knob for those who want to
>>>> specify finer grained functionality was a compromise. So the patch has
>>>> a vm_dirty_ratio and a vm_dirty_ratio_millis interface. The first to
>>>> specify 0-100% and the second to specify .0 to .999%.
>>>>
>>>> So to represent 0.125% of RAM we set
>>>> vm_dirty_ratio = 0
>>>> vm_dirty_ratio_millis = 125
>>>>
>>>> The same for the background_ratio.
>>>>
>>>> I would also prefer using a bytes interface but I am not sure how to
>>>> offer that without  either removing the legacy interface of the ratios
>>>> or by offering a concurrent interface that might be confusing such as
>>>> when users are looking at the old one and not aware of a new one.
>>>>
>>> Just provide a vm_dirty_ratio_in_bytes interface and keep it in sync with
>>> vm_dirty_ratio (they are just two representations of the same internal value)
>>> and for higher resolution propose that users use the bytes interface.
>> Hi Balbir,
>>
>> now that I read carefully the documentation, the description in
>> Documentation/filesystems/proc.txt seems to be a bit misleading. In
>> proc.txt we say that dirty_ratio and dirty_background_ratio are "a
>> percentage of total system memory", but in mm/page-writeback.c we apply
>> the percentages to the dirtyable memory: free pages + reclaimable pages.
>> So, first of all I think we should clarify this in the documentation...
>>
>> Saying that, keeping in sync the vm_dirty_amount_in_bytes according to
>> dirty_ratio_in_percentage is not a trivial task. One is a static value,
>> the other depends on the dirtyable memory in the system. If we want to
>> preserve the same behaviour we should do the following:
>>
>> dirty_ratio = x => dirty_amount_in_bytes = x * dirtyable_memory / 100
>>
>> dirty_amount_in_bytes = y => dirty_ratio = y / dirtyable_memory * 100
>>
>> But anytime the dirtyable memory (or the total memory in the system)
>> changes we should update both values accordingly to preserve the
>> coherency between them (ouch!).
>>

I see what you mean.
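
To make the coherency problem concrete, here is a minimal sketch in Python of
the two conversions quoted above. The function names and the memory sizes are
made up for illustration and are not taken from mm/page-writeback.c; the point
is only that a byte value derived from a ratio stops matching that ratio as
soon as the dirtyable memory changes.

```python
GIB = 1 << 30  # bytes in one GiB


def ratio_to_bytes(ratio_percent, dirtyable_bytes):
    """dirty_ratio = x  =>  dirty_amount_in_bytes = x * dirtyable_memory / 100"""
    return ratio_percent * dirtyable_bytes // 100


def bytes_to_ratio(dirty_bytes, dirtyable_bytes):
    """dirty_amount_in_bytes = y  =>  dirty_ratio = y / dirtyable_memory * 100"""
    return dirty_bytes * 100 / dirtyable_bytes


# Hypothetical starting point: 8 GiB dirtyable, dirty_ratio = 10.
dirtyable_before = 8 * GIB
limit_bytes = ratio_to_bytes(10, dirtyable_before)

# Later only 4 GiB is dirtyable. The stored byte value is unchanged,
# so the ratio it implies has roughly doubled.
dirtyable_after = 4 * GIB
implied_ratio = bytes_to_ratio(limit_bytes, dirtyable_after)

print(limit_bytes, implied_ratio)
```

Keeping both knobs truthful would mean recomputing one of them on every such
change, which is the "ouch" in the quoted text.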

>> Possible solutions:
>>
>> 1) introduce fine-grained dirty_ratio handling decimals by an opportune
>>    parser (disadvantage: this would break the compatibility with all the
>>    userspace apps that expect to read an int from vm_dirty_ratio)
>>
>> 2) introduce dirty_ratio + dirty_ratio_millis (disadvantage: can
>>    generate unexpected behaviours when something is written to
>>    dirty_ratio ignoring the existence of dirty_ratio_millis)
>>
>> 3) introduce dirty_ratio + dirty_amount_in_bytes mutually exclusive,
>>    writing to one automatically "disable" the other (disadvantage:
>>    writing to dirty_ratio ignoring dirty_amount_in_bytes can cause
>>    unexpected behaviours)
>>
>> 4) introduce dirty_ratio + dirty_amount_in_bytes and change the
>>    old behaviour: when something is written to dirty_ratio,
>>    dirty_amount_in_bytes is evaluated in function of totalram_pages (or
>>    the memcg limit) and then we always use this static value, instead of
>>    something that depends on the dirtyable memory - we can easily update
>>    dirty_amount_in_bytes also when totalram_pages or the memcg limit
>>    changes (disadvantage: change an old - working - behaviour).
>>
>> 5) handle fine-grained dirty_ratio decimals by an opportune parser when
>>    writing something to dirty_ratio; export the percentage units via
>>    dirty_ratio, and the decimals via dirty_ratio_decimals; writing to
>>    dirty_ratio_decimals is not allowed.
>>
>> I tend to choose 5. The same for dirty_background_ratio.
>>
> 
> Hmm... I agree to "5"... like this ?
> ==
> provides
>   - vm.dirty_ratio (1/100)
>   - vm.dirty_ratio_percentmille(1/100,000, pcm)
> 
> and allow
> #echo 0.05 > vm/dirty_ratio
> #cat vm/dirty_ratio 
> 0
> #cat vm/dirty_ratio_percentmille
> 50
> ==

I guess this would be the easiest way forward, I'll let you select the
granularity of the interface and its meaning.
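
As a rough illustration of what the write-side parser in option 5 might do,
here is a userspace sketch in Python. Everything in it is an assumption: the
name parse_dirty_ratio is hypothetical, the granularity is taken to be
1/1000 of a percent, and the second value is assumed to hold only the
fractional part. None of this is settled in the thread and a kernel
implementation would look different.

```python
from decimal import Decimal

PCM_PER_PERCENT = 1000  # assumed granularity: 1 pcm = 1/1000 of a percent


def parse_dirty_ratio(text):
    """Split a decimal percentage string into (whole percent, pcm remainder).

    Hypothetical helper: the first element is what a read of dirty_ratio
    would return, the second is what a read-only fractional file would return.
    """
    value = Decimal(text.strip())
    if not (0 <= value <= 100):
        raise ValueError("ratio must be between 0 and 100")
    total_pcm = value * PCM_PER_PERCENT
    if total_pcm != total_pcm.to_integral_value():
        raise ValueError("finer than the assumed 1 pcm granularity")
    return divmod(int(total_pcm), PCM_PER_PERCENT)


print(parse_dirty_ratio("0.125"))  # Michael's 0.125% of RAM example
print(parse_dirty_ratio("10"))     # plain integers keep their old meaning
```

Old userspace that writes and reads a plain integer sees no change, which is
the main attraction of this option.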


-- 
	Balbir