Message-ID: <EA62D612-B537-435A-AF5B-96E49E878E0F@cs.rutgers.edu>
Date: Thu, 04 Oct 2018 17:49:47 -0400
From: "Zi Yan" <zi.yan@...rutgers.edu>
To: "David Rientjes" <rientjes@...gle.com>
Cc: "Kirill A. Shutemov" <kirill@...temov.name>,
"Michal Hocko" <mhocko@...nel.org>,
"Andrew Morton" <akpm@...ux-foundation.org>,
"Mel Gorman" <mgorman@...e.de>, "Vlastimil Babka" <vbabka@...e.cz>,
"Andrea Argangeli" <andrea@...nel.org>,
"Stefan Priebe - Profihost AG" <s.priebe@...fihost.ag>,
linux-mm@...ck.org, LKML <linux-kernel@...r.kernel.org>,
"Michal Hocko" <mhocko@...e.com>
Subject: Re: [PATCH 2/2] mm, thp: consolidate THP gfp handling into
alloc_hugepage_direct_gfpmask
On 4 Oct 2018, at 16:17, David Rientjes wrote:
> On Wed, 26 Sep 2018, Kirill A. Shutemov wrote:
>
>> On Tue, Sep 25, 2018 at 02:03:26PM +0200, Michal Hocko wrote:
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index c3bc7e9c9a2a..c0bcede31930 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -629,21 +629,40 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
>>> * available
>>> * never: never stall for any thp allocation
>>> */
>>> -static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)
>>> +static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma, unsigned long addr)
>>> {
>>> const bool vma_madvised = !!(vma->vm_flags & VM_HUGEPAGE);
>>> + gfp_t this_node = 0;
>>> +
>>> +#ifdef CONFIG_NUMA
>>> + struct mempolicy *pol;
>>> + /*
>>> + * __GFP_THISNODE is used only when __GFP_DIRECT_RECLAIM is not
>>> + * specified, to express a general desire to stay on the current
>>> + * node for optimistic allocation attempts. If the defrag mode
>>> + * and/or madvise hint requires the direct reclaim then we prefer
>>> + * to fallback to other node rather than node reclaim because that
>>> + * can lead to excessive reclaim even though there is free memory
>>> + * on other nodes. We expect that NUMA preferences are specified
>>> + * by memory policies.
>>> + */
>>> + pol = get_vma_policy(vma, addr);
>>> + if (pol->mode != MPOL_BIND)
>>> + this_node = __GFP_THISNODE;
>>> + mpol_cond_put(pol);
>>> +#endif
>>
>> I'm not very good with NUMA policies. Could you explain in more detail how
>> the code above is equivalent to the code below?
>>
>
> It breaks mbind() because new_page() is now using numa_node_id() to
> allocate migration targets instead of using the mempolicy. I'm not
> sure that this patch was tested for mbind().
I do not see mbind() being broken. With both patches applied, I ran
"numactl -N 0 memhog -r1 4096m membind 1" and saw that all pages were
allocated on Node 1, not Node 0 (which is what numa_node_id() returns).
From the source code: in alloc_pages_vma(), the nodemask is generated
from the memory policy (i.e. the mbind policy in the case above), and it
contains only the nodes specified by mbind(). Then, __alloc_pages_nodemask()
uses only the zones from that nodemask, so the numa_node_id() return value
is ignored in the actual page allocation whenever an mbind policy is applied.
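To make that concrete, here is a rough sketch of the path I am describing
(simplified from mm/mempolicy.c of this era; the interleave/hugepage special
cases and error handling are omitted, so do not read it as verbatim kernel
code):

/* Simplified sketch of alloc_pages_vma(), not the exact kernel source. */
struct page *alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
			     unsigned long addr, int node)
{
	struct mempolicy *pol = get_vma_policy(vma, addr);
	nodemask_t *nmask;
	int preferred_nid;
	struct page *page;

	/*
	 * For an MPOL_BIND policy, policy_nodemask() returns the nodemask
	 * set up by mbind(); for other policies it returns NULL.
	 */
	nmask = policy_nodemask(gfp, pol);
	preferred_nid = policy_node(gfp, pol, node);
	mpol_cond_put(pol);

	/*
	 * With a non-NULL nmask, only zones on the mbind() nodes are
	 * eligible, so the numa_node_id()-based hint passed in "node"
	 * cannot escape the mbind() restriction.
	 */
	page = __alloc_pages_nodemask(gfp, order, preferred_nid, nmask);
	return page;
}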
Let me know if I missed anything.
--
Best Regards
Yan Zi