lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 3 Dec 2018 14:27:18 -0800
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Andrea Arcangeli <aarcange@...hat.com>
Cc:     mhocko@...nel.org, ying.huang@...el.com, s.priebe@...fihost.ag,
        mgorman@...hsingularity.net,
        Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
        alex.williamson@...hat.com, lkp@...org,
        David Rientjes <rientjes@...gle.com>, kirill@...temov.name,
        Andrew Morton <akpm@...ux-foundation.org>,
        zi.yan@...rutgers.edu, Vlastimil Babka <vbabka@...e.cz>
Subject: Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

On Mon, Dec 3, 2018 at 2:04 PM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> so I think all of David's patch is somewhat sensible, even if that
> specific "order == pageblock_order" test really looks like it might
> want to be clarified.

Side note: I think maybe people should just look at that whole
compaction logic for that block, because it doesn't make much sense to
me:

                /*
                 * Checks for costly allocations with __GFP_NORETRY, which
                 * includes THP page fault allocations
                 */
                if (costly_order && (gfp_mask & __GFP_NORETRY)) {
                        /*
                         * If compaction is deferred for high-order allocations,
                         * it is because sync compaction recently failed. If
                         * this is the case and the caller requested a THP
                         * allocation, we do not want to heavily disrupt the
                         * system, so we fail the allocation instead of entering
                         * direct reclaim.
                         */
                        if (compact_result == COMPACT_DEFERRED)
                                goto nopage;

                        /*
                         * Looks like reclaim/compaction is worth trying, but
                         * sync compaction could be very expensive, so keep
                         * using async compaction.
                         */
                        compact_priority = INIT_COMPACT_PRIORITY;
                }

this is where David wants to add *his* odd test, and I think everybody
looks at that added case

+                       if (order == pageblock_order &&
+                                       !(current->flags & PF_KTHREAD))
+                               goto nopage;

and just goes "Eww".

But I think the real problem is that it's the "goto nopage" thing that
makes _sense_, and the current cases for "let's try compaction" that
are the odd ones, and then David adds one new special case for the
sensible behavior.

For example, why would COMPACT_DEFERRED mean "don't bother", but not
all the other reasons it didn't really make sense?

So does it really make sense to fall through AT ALL to that "retry"
case, when we explicitly already had (gfp_mask & __GFP_NORETRY)?

Maybe the real fix is to instead of adding yet another special case
for "goto nopage", it should just be unconditional: simply don't try
to compact large-pages if __GFP_NORETRY was set.

Hmm? I dunno. Right now - for 4.20, I'd obviously want to keep changes
smallish, so a hacky added special case might be the right thing to
do. But the code does look odd, doesn't it?

I think part of it comes from the fact that we *used* to do the
compaction first, and then we did the reclaim, and then it was
re-orghanized to do reclaim first, but it tried to keep semantic
changes minimal and some of the above comes from that re-org.

I think.

                Linus

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ