linux-kernel - Re: [PATCH 2/2] mm, thp: consolidate THP gfp handling into alloc_hugepage_direct

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <EA62D612-B537-435A-AF5B-96E49E878E0F@cs.rutgers.edu>
Date:   Thu, 04 Oct 2018 17:49:47 -0400
From:   "Zi Yan" <zi.yan@...rutgers.edu>
To:     "David Rientjes" <rientjes@...gle.com>
Cc:     "Kirill A. Shutemov" <kirill@...temov.name>,
        "Michal Hocko" <mhocko@...nel.org>,
        "Andrew Morton" <akpm@...ux-foundation.org>,
        "Mel Gorman" <mgorman@...e.de>, "Vlastimil Babka" <vbabka@...e.cz>,
        "Andrea Argangeli" <andrea@...nel.org>,
        "Stefan Priebe - Profihost AG" <s.priebe@...fihost.ag>,
        linux-mm@...ck.org, LKML <linux-kernel@...r.kernel.org>,
        "Michal Hocko" <mhocko@...e.com>
Subject: Re: [PATCH 2/2] mm, thp: consolidate THP gfp handling into
 alloc_hugepage_direct_gfpmask

On 4 Oct 2018, at 16:17, David Rientjes wrote:

> On Wed, 26 Sep 2018, Kirill A. Shutemov wrote:
>
>> On Tue, Sep 25, 2018 at 02:03:26PM +0200, Michal Hocko wrote:
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index c3bc7e9c9a2a..c0bcede31930 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -629,21 +629,40 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
>>>   *	    available
>>>   * never: never stall for any thp allocation
>>>   */
>>> -static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)
>>> +static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma, unsigned long addr)
>>>  {
>>>  	const bool vma_madvised = !!(vma->vm_flags & VM_HUGEPAGE);
>>> +	gfp_t this_node = 0;
>>> +
>>> +#ifdef CONFIG_NUMA
>>> +	struct mempolicy *pol;
>>> +	/*
>>> +	 * __GFP_THISNODE is used only when __GFP_DIRECT_RECLAIM is not
>>> +	 * specified, to express a general desire to stay on the current
>>> +	 * node for optimistic allocation attempts. If the defrag mode
>>> +	 * and/or madvise hint requires the direct reclaim then we prefer
>>> +	 * to fallback to other node rather than node reclaim because that
>>> +	 * can lead to excessive reclaim even though there is free memory
>>> +	 * on other nodes. We expect that NUMA preferences are specified
>>> +	 * by memory policies.
>>> +	 */
>>> +	pol = get_vma_policy(vma, addr);
>>> +	if (pol->mode != MPOL_BIND)
>>> +		this_node = __GFP_THISNODE;
>>> +	mpol_cond_put(pol);
>>> +#endif
>>
>> I'm not very good with NUMA policies. Could you explain in more details how
>> the code above is equivalent to the code below?
>>
>
> It breaks mbind() because new_page() is now using numa_node_id() to
> allocate migration targets for instead of using the mempolicy.  I'm not
> sure that this patch was tested for mbind().

I do not see mbind() is broken. With both patches applied, I ran
"numactl -N 0 memhog -r1 4096m membind 1" and saw all pages are allocated
in Node 1 not Node 0, which is returned by numa_node_id().

From the source code, in alloc_pages_vma(), the nodemask is generated
from the memory policy (i.e. mbind in the case above), which only has
the nodes specified by mbind(). Then, __alloc_pages_nodemask() only uses
the zones from the nodemask. The numa_node_id() return value will be
ignored in the actual page allocation process if mbind policy is applied.

Let me know if I miss anything.


--
Best Regards
Yan Zi

Download attachment "signature.asc" of type "application/pgp-signature" (558 bytes)