Message-ID: <53574755.3080809@linux.intel.com>
Date: Wed, 23 Apr 2014 12:53:41 +0800
From: Jiang Liu <jiang.liu@...ux.intel.com>
To: David Rientjes <rientjes@...gle.com>,
Peter Zijlstra <peterz@...radead.org>
CC: Andrew Morton <akpm@...ux-foundation.org>,
Ingo Molnar <mingo@...nel.org>, Ingo Molnar <mingo@...hat.com>,
"Rafael J . Wysocki" <rafael.j.wysocki@...el.com>,
Tony Luck <tony.luck@...el.com>, linux-kernel@...r.kernel.org
Subject: Re: [Bugfix] sched: fix possible invalid memory access caused by
CPU hot-addition
On 2014/4/23 9:59, David Rientjes wrote:
> On Tue, 22 Apr 2014, Peter Zijlstra wrote:
>
>> On Tue, Apr 22, 2014 at 01:01:51PM -0700, Andrew Morton wrote:
>>> On Tue, 22 Apr 2014 10:15:15 +0200 Peter Zijlstra <peterz@...radead.org> wrote:
>>>
>>>> On Tue, Apr 22, 2014 at 01:27:15PM +0800, Jiang Liu wrote:
>>>>> When calling kzalloc_node(size, flags, node), we should first check
>>>>> whether node is onlined, otherwise it may cause invalid memory access
>>>>> as below.
>>>>
>>>> But this is only for memory less node crap, right?
>>>
>>> um, why are memoryless nodes crap?
>>
>> Why wouldn't they be? Having CPUs with no local memory seems decidedly
>> suboptimal.
>
> The quick fix for memoryless node issues is usually to just use cpu_to_mem()
> rather than cpu_to_node() in the caller. This assumes that the arch is
> set up correctly to handle memoryless nodes with
> CONFIG_HAVE_MEMORYLESS_NODES (and we've had problems recently with
> memoryless nodes not being configured correctly on powerpc).
>
> That type of a fix would probably be better handled in the slab allocator,
> though, since kmalloc_node(nid) shouldn't crash just because nid is
> memoryless, we should be doing local_memory_node(node) when allocating the
> slab pages.
>
> However, I don't think memoryless nodes are the problem here since Jiang
> is testing for !node_online(nid) in his patch, so it's a problem with
> cpu_to_node() pointing to an offline node. It makes sense for the page
> allocator to crash in such a case, the node id is erroneous.
>
> So either the cpu-to-node mapping is invalid or alloc_fair_sched_group()
> is allocating memory for a cpu on an offline node. The
> for_each_possible_cpu() looks suspicious. There's no guarantee that
> local_memory_node(node) for an offline node will return anything with
> affinity, so falling back to NUMA_NO_NODE looks appropriate in Jiang's
> patch.
Hi David,
That's the case: alloc_fair_sched_group() is trying to allocate
memory for a CPU on an offline node, which then accesses the nonexistent
NODE_DATA for that node.
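
To illustrate the check under discussion, here is a minimal userspace
sketch, not the actual patch. It assumes a toy model of the kernel state:
the arrays and the names model_node_online, model_cpu_to_node,
model_is_node_online(), model_setup() and alloc_node_for_cpu() are all
hypothetical stand-ins for node_online(), cpu_to_node() and the caller of
kzalloc_node(). Only NUMA_NO_NODE (-1) matches the kernel definition.

```c
#include <assert.h>
#include <stdbool.h>

#define NUMA_NO_NODE (-1)  /* kernel value meaning "no node preference" */
#define MAX_NODES 4        /* hypothetical sizes for this toy model */
#define MAX_CPUS  8

/* Hypothetical stand-ins for the kernel's node and CPU topology state. */
static bool model_node_online[MAX_NODES];
static int  model_cpu_to_node[MAX_CPUS];

/* Models node_online(nid): out-of-range ids count as offline. */
static bool model_is_node_online(int nid)
{
    return nid >= 0 && nid < MAX_NODES && model_node_online[nid];
}

/*
 * Node id to hand to a kzalloc_node()-style allocator for this CPU.
 * If the CPU's node is offline, fall back to NUMA_NO_NODE so the
 * allocator never looks up per-node data that does not exist.
 */
static int alloc_node_for_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    int nid = model_cpu_to_node[cpu];
    return model_is_node_online(nid) ? nid : NUMA_NO_NODE;
}

/* Example topology: node 0 online, node 1 possible but not yet online. */
static void model_setup(void)
{
    model_node_online[0] = true;
    model_node_online[1] = false;
    for (int cpu = 0; cpu < MAX_CPUS; cpu++)
        model_cpu_to_node[cpu] = (cpu < 4) ? 0 : 1;
}
```

In this model, iterating over all possible CPUs (as
for_each_possible_cpu() does) reaches CPUs 4-7 whose node is still
offline; the helper returns NUMA_NO_NODE for them instead of node 1.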
Thanks!
Gerry