linux-kernel - Re: long sleep_on_page delays writing to slow storage


Date:	Thu, 10 Nov 2011 02:54:38 +0100
From:	Andrea Arcangeli <aarcange@...hat.com>
To:	Mel Gorman <mgorman@...e.de>
Cc:	Jan Kara <jack@...e.cz>, Andy Isaacson <adi@...apodia.org>,
	linux-kernel@...r.kernel.org, linux-mm@...r.kernel.org,
	Johannes Weiner <jweiner@...hat.com>
Subject: Re: long sleep_on_page delays writing to slow storage

On Thu, Nov 10, 2011 at 12:53:07AM +0000, Mel Gorman wrote:
> compaction. Are you ok with that? The number of THPs in use was reduced
> but it also was during a somewhat unrealistic stress test so it might
> not matter.

I think having more THP collapsed during the unrealistic load is not
so important; the unrealistic load is likely dominated not by TLB
misses but by kernel load, so even if it materializes it shouldn't make
a difference. And khugepaged will just retry at the next pass anyway,
so it doesn't matter if it's delayed a bit, I think. And retrying on
the same address with __GFP_OTHER_NODE doesn't sound like a good idea.

> It's not really needed to avoid stalls - just !(gfp_mask &
> __GFP_NO_KSWAPD) is enough for that. It's only needed if we want

I would go with this first. You can keep the second patch in the queue,
but considering it's altering fast paths that affect no-THP
configs too, we should at least benchmark it to be sure the cost is not
measurable. I guess it's not, but hey, if it's not needed then we
shouldn't care.

And it was already ok; we thought it didn't matter so we reversed it
in c6a140bf164829769499b5e50d380893da39b29e, but it clearly matters for
USB sticks, so I would simply reapply it.

One reason we reversed it was also that it wasn't so clean to make
that decision based on __GFP_NO_KSWAPD.  I think it's probably
cleaner to check whether __GFP_NORETRY is set instead of whether
__GFP_NO_KSWAPD is set.

That flag should indicate we don't really care too much whether the
allocation fails, and that we shouldn't go too hard on it. Notably those
are the allocations that are totally ok to fail without having to trigger
OOM, so again it's not worth going the extra mile to make them succeed.
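
For comparison, the two candidate checks can be sketched as below. This
is only an illustration: the SKETCH_* bit values and the helper names are
made up for the example (the real flags are defined in
include/linux/gfp.h), and it assumes the expression gates whether the
allocator is allowed to stall in synchronous compaction.

```c
#include <stdbool.h>

/* Illustrative bit values only, NOT the kernel's definitions. */
#define SKETCH_GFP_NORETRY   0x1u
#define SKETCH_GFP_NO_KSWAPD 0x2u

/* Variant quoted above: may stall unless the caller set NO_KSWAPD. */
static bool may_stall_no_kswapd(unsigned int gfp_mask)
{
	return !(gfp_mask & SKETCH_GFP_NO_KSWAPD);
}

/* Variant suggested here: may stall unless the caller set NORETRY,
 * i.e. unless the caller already said failure is acceptable. */
static bool may_stall_noretry(unsigned int gfp_mask)
{
	return !(gfp_mask & SKETCH_GFP_NORETRY);
}
```

The two variants only differ for callers that set one flag but not the
other; a caller setting both gets the no-stall behavior either way.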

Alternatively we could check __GFP_NOFAIL, but that's mostly obsolete.
Yet another alternative is to check order > PAGE_ALLOC_COSTLY_ORDER, but
you know any additional PAGE_ALLOC_COSTLY_ORDER check tends to make me
unhappy, as the behavior changes enormously from
PAGE_ALLOC_COSTLY_ORDER to PAGE_ALLOC_COSTLY_ORDER+1 and that's an
arbitrary number that doesn't justify a big change in behavior. So the
fewer PAGE_ALLOC_COSTLY_ORDER checks the better; ideally there shall be
none :)
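
To make the cliff concrete, here is a minimal sketch of such an order
check. PAGE_ALLOC_COSTLY_ORDER is 3 in include/linux/mmzone.h; the two
helper names are hypothetical and exist only for this example.

```c
#include <stdbool.h>

/* Same value as the kernel's definition in include/linux/mmzone.h. */
#define PAGE_ALLOC_COSTLY_ORDER 3

/* Hypothetical helper: the kind of threshold test discussed above. */
static bool order_is_costly(unsigned int order)
{
	return order > PAGE_ALLOC_COSTLY_ORDER;
}

/* Number of contiguous base pages in an allocation of this order. */
static unsigned long pages_in_order(unsigned int order)
{
	return 1UL << order;
}
```

An order-3 request (8 pages) is not "costly" while an order-4 request
(16 pages) is, so a mere doubling in size flips the behavior. A THP
allocation (order 9 with 4k pages and 2M huge pages) is far above the
threshold either way.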

So I would suggest resubmitting the 1/2 patch changed to use
__GFP_NORETRY, or just a plain revert with __GFP_NO_KSWAPD if you don't
like __GFP_NORETRY.

And queue up the change to alloc_pages_vma for later; it's not
a bad idea at all, but it only pays off for khugepaged, and 99% of
userland allocations aren't happening there.