linux-kernel - Re: [PATCH v6 0/3] mm,thp,shm: limit shmem THP alloc gfp

Message-ID: <alpine.LSU.2.11.2012141226350.1925@eggly.anvils>
Date:   Mon, 14 Dec 2020 13:16:39 -0800 (PST)
From:   Hugh Dickins <hughd@...gle.com>
To:     Andrew Morton <akpm@...ux-foundation.org>
cc:     Rik van Riel <riel@...riel.com>, hughd@...gle.com,
        xuyu@...ux.alibaba.com, mgorman@...e.de, aarcange@...hat.com,
        willy@...radead.org, linux-kernel@...r.kernel.org,
        kernel-team@...com, linux-mm@...ck.org, vbabka@...e.cz,
        mhocko@...e.com
Subject: Re: [PATCH v6 0/3] mm,thp,shm: limit shmem THP alloc gfp_mask

On Tue, 24 Nov 2020, Rik van Riel wrote:

> The allocation flags of anonymous transparent huge pages can be controlled
> through the files in /sys/kernel/mm/transparent_hugepage/defrag, which can
> help keep the system from getting bogged down in the page reclaim and compaction
> code when many THPs are getting allocated simultaneously.
> 
> However, the gfp_mask for shmem THP allocations was not limited by those
> configuration settings, and some workloads ended up with all CPUs stuck
> on the LRU lock in the page reclaim code, trying to allocate dozens of
> THPs simultaneously.
> 
> This patch applies the same configured limitation of THPs to shmem
> hugepage allocations, to prevent that from happening.
> 
> This way a THP defrag setting of "never" or "defer+madvise" will result
> in quick allocation failures without direct reclaim when no 2MB free
> pages are available.
> 
> With this patch applied, THP allocations for tmpfs will be a little
> more aggressive than today for files mmapped with MADV_HUGEPAGE,
> and a little less aggressive for files that are not mmapped or
> mapped without that flag.
> 
> v6: make khugepaged actually obey tmpfs mount flags
> v5: reduce gfp mask further if needed, to accommodate i915 (Matthew Wilcox)
> v4: rename alloc_hugepage_direct_gfpmask to vma_thp_gfp_mask (Matthew Wilcox)
> v3: fix NULL vma issue spotted by Hugh Dickins & tested
> v2: move gfp calculation to shmem_getpage_gfp as suggested by Yu Xu
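The cover letter's "more aggressive / less aggressive" claims can be made concrete with a rough userspace model of the reclaim-related bits. It assumes the post-series choice follows the kernel's vma_thp_gfp_mask() per defrag mode, and that the 5.10 shmem huge path simply added __GFP_NORETRY to the mapping's gfp (which normally allows both direct and kswapd reclaim) regardless of the defrag setting. The flag names, their bit values, PRE_SERIES_SHMEM_BITS and thp_reclaim_bits() are hypothetical stand-ins, not kernel definitions; zone bits and the v5 mask limiting are not modeled.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-ins for the reclaim-related gfp bits; the kernel's
 * real __GFP_* values differ. Zone, __GFP_COMP and __GFP_NOWARN bits are
 * deliberately left out of this model.
 */
typedef unsigned int gfp_t;
#define GFP_DIRECT_RECLAIM 0x1u /* may reclaim/compact synchronously */
#define GFP_KSWAPD_RECLAIM 0x2u /* may wake kswapd/kcompactd */
#define GFP_NORETRY        0x4u /* give up rather than retry hard */

/* Assumed pre-series shmem THP mask: both reclaim bits plus NORETRY,
 * whatever the defrag setting. */
#define PRE_SERIES_SHMEM_BITS \
	(GFP_DIRECT_RECLAIM | GFP_KSWAPD_RECLAIM | GFP_NORETRY)

/* Values of /sys/kernel/mm/transparent_hugepage/defrag. */
enum thp_defrag {
	DEFRAG_ALWAYS,
	DEFRAG_DEFER,
	DEFRAG_DEFER_MADVISE,
	DEFRAG_MADVISE,
	DEFRAG_NEVER,
};

/*
 * Hypothetical model of the reclaim bits chosen after the series.
 * 'madvised' means the VMA has MADV_HUGEPAGE; a tmpfs read()/write()
 * has no VMA, so it always passes false.
 */
gfp_t thp_reclaim_bits(enum thp_defrag mode, bool madvised)
{
	switch (mode) {
	case DEFRAG_ALWAYS:		/* always direct reclaim/compaction */
		return GFP_DIRECT_RECLAIM | (madvised ? 0 : GFP_NORETRY);
	case DEFRAG_DEFER:		/* only wake kcompactd, fail quickly */
		return GFP_KSWAPD_RECLAIM;
	case DEFRAG_DEFER_MADVISE:	/* direct if madvised, else wake kcompactd */
		return madvised ? GFP_DIRECT_RECLAIM : GFP_KSWAPD_RECLAIM;
	case DEFRAG_MADVISE:		/* direct only if madvised */
		return madvised ? GFP_DIRECT_RECLAIM : 0;
	case DEFRAG_NEVER:
	default:			/* no reclaim at all */
		return 0;
	}
}
```

In this model, under the default defrag=madvise a MADV_HUGEPAGE mapping gets direct reclaim without NORETRY (more persistent than the old always-NORETRY mask), while a non-madvised mapping or a plain tmpfs read/write gets no reclaim at all.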

Andrew, please don't rush

mmthpshmem-limit-shmem-thp-alloc-gfp_mask.patch
mmthpshm-limit-gfp-mask-to-no-more-than-specified.patch
mmthpshmem-make-khugepaged-obey-tmpfs-mount-flags.patch

to Linus in your first wave of mmotm->5.11 sendings.
Or, alternatively, go ahead and send them to Linus, but
be aware that I'm fairly likely to want adjustments later.

Sorry for limping along so far behind, but I still have more
re-reading of the threads to do, and I'm still investigating
why tmpfs huge=always becomes so ineffective in my testing with
these changes, even if I ramp up from default defrag=madvise to
defrag=always:
                       5.10     mmotm
thp_file_alloc      4641788    216027
thp_file_fallback    275339   8895647
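These are /proc/vmstat counters: thp_file_alloc counts file huge pages successfully allocated, and thp_file_fallback counts attempts that failed and fell back to small pages. So alloc / (alloc + fallback) is the fraction of attempts that succeeded: about 94.4% for 5.10 versus about 2.4% for mmotm in the figures above. A minimal sketch of that calculation over text in /proc/vmstat format; thp_file_success_ratio() is a hypothetical helper name.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan text in /proc/vmstat format ("name value"
 * per line) for thp_file_alloc and thp_file_fallback, and return the
 * fraction of file THP allocation attempts that succeeded, or -1.0 if
 * either counter is missing or no attempt was recorded.
 */
double thp_file_success_ratio(const char *vmstat)
{
	unsigned long long alloc = 0, fallback = 0;
	int have_alloc = 0, have_fallback = 0;
	const char *p = vmstat;

	while (p && *p) {
		char name[64];
		unsigned long long val;

		/* Exact name match, so thp_file_fallback_charge is ignored. */
		if (sscanf(p, "%63s %llu", name, &val) == 2) {
			if (strcmp(name, "thp_file_alloc") == 0) {
				alloc = val;
				have_alloc = 1;
			} else if (strcmp(name, "thp_file_fallback") == 0) {
				fallback = val;
				have_fallback = 1;
			}
		}
		p = strchr(p, '\n');	/* advance to the next line */
		if (p)
			p++;
	}
	if (!have_alloc || !have_fallback || alloc + fallback == 0)
		return -1.0;
	return (double)alloc / (double)(alloc + fallback);
}
```

The counters are cumulative, so comparing runs this way assumes each figure was taken over a comparable workload.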

I've been looking into it off and on for weeks (gfp_mask wrangling is
not my favourite task! so tend to find higher priorities to divert me);
hoped to arrive at a conclusion before merge window, but still have
nothing constructive to say yet, hence my silence so far.

Above's "a little less aggressive" appears to be an understatement at present.
I respect what Rik is trying to achieve here, and I may end up
concluding that there's nothing better to be done than what he has.
My kind of hugepage-thrashing-in-low-memory may be so remote from
normal usage, and could be skirting the latency horrors we all want
to avoid: but I haven't reached that conclusion yet - the disparity
in effectiveness still deserves more investigation.

(There's also a specific issue with the gfp_mask limiting: I have
not yet reviewed the allowing and denying in detail, but it looks
like it does not respect the caller's GFP_ZONEMASK - the gfp in
shmem_getpage_gfp() and shmem_read_mapping_page_gfp() is there to
satisfy the gma500, which wanted to use shmem but could only manage
DMA32.  I doubt it wants THPs, but shmem_enabled=force forces them.)
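The zone concern can be illustrated with a small userspace model of the allow/deny combination. It assumes the series' limit_gfp_mask() keeps the THP mask's bits outside the allow set (zone bits included), takes the union of the caller's deny flags (__GFP_NOWARN, __GFP_NORETRY), and the intersection of the allow flags (__GFP_IO, __GFP_FS, __GFP_RECLAIM). All flag names and bit values below, and both function names, are hypothetical; the second function is just one possible way of honoring the caller's GFP_ZONEMASK, not a posted fix.

```c
/* Hypothetical bit values standing in for the kernel's __GFP_* flags. */
typedef unsigned int gfp_t;
#define GFP_DMA32_ZONE    0x01u
#define GFP_HIGHMEM_ZONE  0x02u
#define GFP_MOVABLE_ZONE  0x04u
#define GFP_ZONEMASK_BITS (GFP_DMA32_ZONE | GFP_HIGHMEM_ZONE | GFP_MOVABLE_ZONE)
#define GFP_IO_BIT        0x10u
#define GFP_FS_BIT        0x20u
#define GFP_RECLAIM_BITS  0x40u	/* direct + kswapd folded into one bit */
#define GFP_NOWARN_BIT    0x100u
#define GFP_NORETRY_BIT   0x200u

/* Model of the combination as assumed above: zone bits come from huge_gfp. */
gfp_t limit_mask_as_posted(gfp_t huge_gfp, gfp_t limit_gfp)
{
	gfp_t allowflags = GFP_IO_BIT | GFP_FS_BIT | GFP_RECLAIM_BITS;
	gfp_t denyflags = GFP_NOWARN_BIT | GFP_NORETRY_BIT;
	gfp_t result = huge_gfp & ~allowflags;

	result |= limit_gfp & denyflags;		/* union of deny flags */
	result |= (huge_gfp & limit_gfp) & allowflags;	/* intersection of allow flags */
	return result;
}

/* Hypothetical variant: same as above, but zones come from the caller. */
gfp_t limit_mask_keep_zone(gfp_t huge_gfp, gfp_t limit_gfp)
{
	gfp_t result = limit_mask_as_posted(huge_gfp, limit_gfp);

	return (result & ~GFP_ZONEMASK_BITS) | (limit_gfp & GFP_ZONEMASK_BITS);
}
```

With a caller mask restricted to DMA32, as for gma500, the first function still returns the THP mask's HIGHMEM and MOVABLE zone bits, while the second returns only DMA32.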

Thanks,
Hugh