linux-kernel - Re: dm bufio: Reduce dm_bufio

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180625090957.GF28965@dhcp22.suse.cz>
Date:   Mon, 25 Jun 2018 11:09:57 +0200
From:   Michal Hocko <mhocko@...nel.org>
To:     Mikulas Patocka <mpatocka@...hat.com>
Cc:     jing xia <jing.xia.mail@...il.com>,
        Mike Snitzer <snitzer@...hat.com>, agk@...hat.com,
        dm-devel@...hat.com, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org
Subject: Re: dm bufio: Reduce dm_bufio_lock contention

On Fri 22-06-18 14:57:10, Mikulas Patocka wrote:
> 
> 
> On Fri, 22 Jun 2018, Michal Hocko wrote:
> 
> > On Fri 22-06-18 08:52:09, Mikulas Patocka wrote:
> > > 
> > > 
> > > On Fri, 22 Jun 2018, Michal Hocko wrote:
> > > 
> > > > On Fri 22-06-18 11:01:51, Michal Hocko wrote:
> > > > > On Thu 21-06-18 21:17:24, Mikulas Patocka wrote:
> > > > [...]
> > > > > > What about this patch? If __GFP_NORETRY and __GFP_FS is not set (i.e. the 
> > > > > > request comes from a block device driver or a filesystem), we should not 
> > > > > > sleep.
> > > > > 
> > > > > Why? How are you going to audit all the callers that the behavior makes
> > > > > sense and moreover how are you going to ensure that future usage will
> > > > > still make sense. The more subtle side effects gfp flags have the harder
> > > > > they are to maintain.
> > > > 
> > > > So just as an excercise. Try to explain the above semantic to users. We
> > > > currently have the following.
> > > > 
> > > >  * __GFP_NORETRY: The VM implementation will try only very lightweight
> > > >  *   memory direct reclaim to get some memory under memory pressure (thus
> > > >  *   it can sleep). It will avoid disruptive actions like OOM killer. The
> > > >  *   caller must handle the failure which is quite likely to happen under
> > > >  *   heavy memory pressure. The flag is suitable when failure can easily be
> > > >  *   handled at small cost, such as reduced throughput
> > > > 
> > > >  * __GFP_FS can call down to the low-level FS. Clearing the flag avoids the
> > > >  *   allocator recursing into the filesystem which might already be holding
> > > >  *   locks.
> > > > 
> > > > So how are you going to explain gfp & (__GFP_NORETRY | ~__GFP_FS)? What
> > > > is the actual semantic without explaining the whole reclaim or force
> > > > users to look into the code to understand that? What about GFP_NOIO |
> > > > __GFP_NORETRY? What does it mean to that "should not sleep". Do all
> > > > shrinkers have to follow that as well?
> > > 
> > > My reasoning was that there is broken code that uses __GFP_NORETRY and 
> > > assumes that it can't fail - so conditioning the change on !__GFP_FS would 
> > > minimize the diruption to the broken code.
> > > 
> > > Anyway - if you want to test only on __GFP_NORETRY (and fix those 16 
> > > broken cases that assume that __GFP_NORETRY can't fail), I'm OK with that.
> > 
> > As I've already said, this is a subtle change which is really hard to
> > reason about. Throttling on congestion has its meaning and reason. Look
> > at why we are doing that in the first place. You cannot simply say this
> 
> So - explain why is throttling needed. You support throttling, I don't, so 
> you have to explain it :)
> 
> > is ok based on your specific usecase. We do have means to achieve that.
> > It is explicit and thus it will be applied only where it makes sense.
> > You keep repeating that implicit behavior change for everybody is
> > better.
> 
> I don't want to change it for everybody. I want to change it for block 
> device drivers. I don't care what you do with non-block drivers.

Well, it is usually onus of the patch submitter to justify any change.
But let me be nice on you, for once. This throttling is triggered only
if we all the pages we have encountered during the reclaim attempt are
dirty and that means that we are rushing through the LRU list quicker
than flushers are able to clean. If we didn't throttle we could hit
stronger reclaim priorities (aka scan more to reclaim memory) and
reclaim more pages as a result.

> > I guess we will not agree on that part. I consider it a hack
> > rather than a systematic solution. I can easily imagine that we just
> > find out other call sites that would cause over reclaim or similar
> 
> If a __GFP_NORETRY allocation does overreclaim - it could be fixed by 
> returning NULL instead of doing overreclaim. The specification says that 
> callers must handle failure of __GFP_NORETRY allocations.
> 
> So yes - if you think that just skipping throttling on __GFP_NORETRY could 
> cause excessive CPU consumption trying to reclaim unreclaimable pages or 
> something like that - then you can add more points where the __GFP_NORETRY 
> is failed with NULL to avoid the excessive CPU consumption.

Which is exactly something I do not want to do. Spread __GFP_NORETRY all
over the reclaim code. Especially for something we already have means
for...
-- 
Michal Hocko
SUSE Labs