[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <54F6ADD2.3080403@huawei.com>
Date: Wed, 4 Mar 2015 15:01:38 +0800
From: Xishi Qiu <qiuxishi@...wei.com>
To: Gu Zheng <guz.fnst@...fujitsu.com>
CC: Yasuaki Ishimatsu <isimatu.yasuaki@...fujitsu.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Tang Chen <tangchen@...fujitsu.com>,
Yinghai Lu <yinghai@...nel.org>, Linux MM <linux-mm@...ck.org>,
LKML <linux-kernel@...r.kernel.org>,
Toshi Kani <toshi.kani@...com>, Mel Gorman <mgorman@...e.de>,
Tejun Heo <tj@...nel.org>, Xiexiuqi <xiexiuqi@...wei.com>,
Hanjun Guo <guohanjun@...wei.com>
Subject: Re: node-hotplug: is memset 0 safe in try_offline_node()?
On 2015/3/4 11:53, Gu Zheng wrote:
> Hi Xishi,
>
> On 03/04/2015 10:22 AM, Xishi Qiu wrote:
>
>> On 2015/3/3 18:20, Gu Zheng wrote:
>>
>>> Hi Xishi,
>>> On 03/03/2015 11:30 AM, Xishi Qiu wrote:
>>>
>>>> When hot-remove a numa node, we will clear pgdat,
>>>> but is memset 0 safe in try_offline_node()?
>>>
>>> It is not safe here. In fact, this is a temporary solution here.
>>> As you know, pgdat is accessed lock-less now, so protection
>>> mechanism (RCU?) is needed to make it completely safe here,
>>> but it seems a bit over-kill.
>>>
>>>>
>>>> process A: offline node XX:
>>>> for_each_populated_zone()
>>>> find online node XX
>>>> cond_resched()
>>>> offline cpu and memory, then try_offline_node()
>>>> node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat))
>>>> access node XX's pgdat
>>>> NULL pointer access error
>>>
>>> It's possible, but I did not meet this condition, did you?
>>>
>>
>> Yes, we test hot-add/hot-remove node with stress, and meet the following
>> call trace several times.
>
> Thanks.
>
>>
>> next_online_pgdat()
>> int nid = next_online_node(pgdat->node_id); // it's here, pgdat is NULL
>
> memset(pgdat, 0, sizeof(*pgdat));
> This memset just sets the context of pgdat to 0, but it will not free pgdat, so the *pgdat is
> NULL* is strange here.
> But anyway, the bug is real, we must fix it.
next_zone()
pg_data_t *pgdat = zone->zone_pgdat; // I think this pgdat is NULL, and NODE_DATA() is not NULL.
...
pgdat = next_online_pgdat(pgdat);
int nid = next_online_node(pgdat->node_id); // so here is the null pointer access
Thanks for your new patch, I'll test it.
Thanks,
Xishi Qiu
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists