Message-ID: <20141014114828.GA6524@node.dhcp.inet.fi>
Date:	Tue, 14 Oct 2014 14:48:28 +0300
From:	"Kirill A. Shutemov" <kirill@...temov.name>
To:	Alex Thorlton <athorlton@....com>
Cc:	linux-kernel@...r.kernel.org,
	Andrew Morton <akpm@...ux-foundation.org>,
	Mel Gorman <mgorman@...e.de>, Rik van Riel <riel@...hat.com>,
	Ingo Molnar <mingo@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
	Hugh Dickins <hughd@...gle.com>, Bob Liu <lliubbo@...il.com>,
	Johannes Weiner <hannes@...xchg.org>, linux-mm@...ck.org
Subject: Re: [BUG] mm, thp: khugepaged can't allocate on requested node when
 confined to a cpuset

On Wed, Oct 08, 2014 at 02:10:50PM -0500, Alex Thorlton wrote:
> Hey everyone,
> 
> I've run into some frustrating behavior from the khugepaged thread,
> that I'm hoping to get sorted out.  It appears that if you pin
> khugepaged to a cpuset (i.e. node 0),

Why would you want to pin khugepaged? Is there a valid use case?
It looks like userspace is shooting itself in the foot.

> and it begins scanning/collapsing pages for a process on a cpuset that
> doesn't have any memory nodes in common with khugepaged (i.e. node 1),
> then the collapsed pages will all be allocated on khugepaged's node (in
> this case node 0), clearly breaking the cpuset boundary set up for the
> process in question.
> 
> I'm aware that there are some known issues with khugepaged performing
> off-node allocations in certain situations, but I believe this is a bit
> of a special circumstance since, in this situation, there's no way for
> khugepaged to perform an allocation on the desired node.
> 
> The problem really stems from the way that we determine the allowed
> memory nodes in get_page_from_freelist.  When we call down to
> cpuset_zone_allowed_softwall, we check current->mems_allowed to
> determine what nodes we're allowed on.  In the case of khugepaged, we'll
> be making allocations for the mm of the process we're collapsing for,
> but we'll be checking the mems_allowed of khugepaged, which can
> obviously cause some problems.

Is there a reason why we should respect cpuset limitations for kernel
threads?

Should we bypass cpuset for PF_KTHREAD completely?

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 736d8e1b6381..03a74878ad46 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1960,6 +1960,9 @@ get_page_from_freelist(gfp_t gfp_mask, nodemask_t *nodemask, unsigned int order,
 zonelist_scan:
        zonelist_rescan = false;
 
+       /* Bypass cpuset limitations when allocating from kernel thread context */
+       if (current->flags & PF_KTHREAD)
+               alloc_flags &= ~ALLOC_CPUSET;
        /*
         * Scan zonelist, looking for a zone with enough free.
         * See also __cpuset_node_allowed_softwall() comment in kernel/cpuset.c.
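
To make the intended effect of the hunk above easier to reason about, here
is a rough userspace model of it. This is only a sketch: the model_* names,
the MODEL_* flag values and the bitmask representation of mems_allowed are
made up for illustration and are not the kernel's definitions. The real
check lives in get_page_from_freelist() and the cpuset code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only; NOT the kernel's actual constants. */
#define MODEL_PF_KTHREAD   0x1u	/* task is a kernel thread */
#define MODEL_ALLOC_CPUSET 0x1u	/* allocator should enforce cpuset limits */

/* Hypothetical stand-in for the bits of task_struct that matter here. */
struct model_task {
	unsigned int flags;
	uint64_t mems_allowed;	/* bit n set => memory node n is allowed */
};

/* Mirrors the proposed hunk: kernel threads drop the cpuset check. */
static unsigned int model_adjust_alloc_flags(const struct model_task *cur,
					     unsigned int alloc_flags)
{
	if (cur->flags & MODEL_PF_KTHREAD)
		alloc_flags &= ~MODEL_ALLOC_CPUSET;
	return alloc_flags;
}

/*
 * Simplified node check: with the cpuset flag set, only nodes in the
 * *current* task's mems_allowed pass; with it cleared, any node passes.
 */
static bool model_node_allowed(const struct model_task *cur, int node,
			       unsigned int alloc_flags)
{
	if (!(alloc_flags & MODEL_ALLOC_CPUSET))
		return true;
	if (node < 0 || node >= 64)
		return false;
	return ((cur->mems_allowed >> node) & 1u) != 0;
}
```

With a model khugepaged that has MODEL_PF_KTHREAD set and only node 0 in
mems_allowed, model_node_allowed() rejects node 1 while MODEL_ALLOC_CPUSET
is set and accepts it once the flags have gone through
model_adjust_alloc_flags(). A task without MODEL_PF_KTHREAD is still
confined to its own nodes.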
-- 
 Kirill A. Shutemov