lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <87mu7uxz57.fsf@notabene.neil.brown.name>
Date:   Thu, 02 Apr 2020 15:57:56 +1100
From:   NeilBrown <neilb@...e.de>
To:     Hillf Danton <hdanton@...a.com>
Cc:     Trond Myklebust <trondmy@...merspace.com>,
        Anna Schumaker <Anna.Schumaker@...app.com>,
        Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        linux-nfs@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/2] MM: replace PF_LESS_THROTTLE with PF_LOCAL_THROTTLE

On Thu, Apr 02 2020, Hillf Danton wrote:

> On Thu, 02 Apr 2020 10:53:20 +1100 NeilBrown wrote:
>> 
>> PF_LESS_THROTTLE exists for loop-back nfsd, and a similar need in the
>> loop block driver, where a daemon needs to write to one bdi in
>> order to free up writes queued to another bdi.
>> 
>> The daemon sets PF_LESS_THROTTLE and gets a larger allowance of dirty
>> pages, so that it can still dirty pages after other processses have been
>> throttled.
>> 
>> This approach was designed when all threads were blocked equally,
>> independently on which device they were writing to, or how fast it was.
>> Since that time the writeback algorithm has changed substantially with
>> different threads getting different allowances based on non-trivial
>> heuristics.  This means the simple "add 25%" heuristic is no longer
>> reliable.
>> 
>> This patch changes the heuristic to ignore the global limits and
>> consider only the limit relevant to the bdi being written to.  This
>> approach is already available for BDI_CAP_STRICTLIMIT users (fuse) and
>> should not introduce surprises.  This has the desired result of
>> protecting the task from the consequences of large amounts of dirty data
>> queued for other devices.
>> 
>> This approach of "only consider the target bdi" is consistent with the
>> other use of PF_LESS_THROTTLE in current_may_throttle(), were it causes
>> attention to be focussed only on the target bdi.
>> 
>> So this patch
>>  - renames PF_LESS_THROTTLE to PF_LOCAL_THROTTLE,
>>  - remove the 25% bonus that that flag gives, and
>>  - imposes 'strictlimit' handling for any process with PF_LOCAL_THROTTLE
>>    set.
>
> 	/*
> 	 * The strictlimit feature is a tool preventing mistrusted filesystems
> 	 * from growing a large number of dirty pages before throttling. For
>
> Based on the comment snippet, I suspect it is applicable to IO flushers
> unless they are likely generating tons of dirty pages. If they are,
> however, cutting their bonuses seem questionable.

The purpose of the strictlimit feature was to isolate one filesystem
(bdi) from all others, so that the one cannot create dirty pages which
unfairly disadvantage the others - this is what that comment says.
But the implementation appears to focus on the isolation, not the
specific purpose, and isolation works both ways.  It protects the others
from the one, and the one from the others.

fuse needs to be isolated so it doesn't harm others.
nfsd and loop need to be isolate so they aren't harmed by others.
I'm less familiar with IO flushers but I suspect that have exactly the
same need as nfsd and loop - they need to be isolated from dirty pages
other than on the device they are writing to.
The 25% bonus was never about giving them a bonus because they need it.
It was about protecting them from excess usage elsewhere.  I strongly
suspect that my change will provide a conceptually better service for IO
flushers. (whether it is better in a practical measurable sense I cannot
say, but I'd be surprised if it was worse).

One possible problem with strictlimit isolation is suggested by the
comment

	 *
	 * In strictlimit case make decision based on the wb counters
	 * and limits. Small writeouts when the wb limits are ramping
	 * up are the price we consciously pay for strictlimit-ing.
	 *

This suggests that starting transients may be worse in some cases.
I haven't noticed any problems in my (limited) testing so while I
suspect there is a genuine difference here, I don't expect it to be problematic.

>
>> 
>> Note that previously realtime threads were treated the same as
>> PF_LESS_THROTTLE threads.  This patch does *not* change the behvaiour for
>> real-time threads, so it is now different from the behaviour of nfsd and
>> loop tasks.  I don't know what is wanted for realtime.
>> 
>> Signed-off-by: NeilBrown <neilb@...e.de>
>> =2D--
>
> Hrm corrupted delivery?

No, that's just normal email quotation for a multi-part message (needed
for crypto-signing which I do by default)
'git am', for example, is quite capable of coping with it.

Thanks,
NeilBrown

>
>>  drivers/block/loop.c  |  2 +-
>>  fs/nfsd/vfs.c         |  9 +++++----
>>  include/linux/sched.h |  2 +-
>>  kernel/sys.c          |  2 +-
>>  mm/page-writeback.c   | 10 ++++++----
>>  mm/vmscan.c           |  4 ++--
>>  6 files changed, 16 insertions(+), 13 deletions(-)

Download attachment "signature.asc" of type "application/pgp-signature" (833 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ