linux-kernel - Re: 2.6.28-rc4 mem_cgroup_charge

Message-Id: <1226957408.17897.1.camel@badari-desktop>
Date:	Mon, 17 Nov 2008 13:30:08 -0800
From:	Badari Pulavarty <pbadari@...ibm.com>
To:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	linux-mm <linux-mm@...ck.org>,
	linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: 2.6.28-rc4 mem_cgroup_charge_common panic

On Fri, 2008-11-14 at 13:10 +0900, KAMEZAWA Hiroyuki wrote:
> On Thu, 13 Nov 2008 10:53:24 -0800
> Badari Pulavarty <pbadari@...ibm.com> wrote:
> > I tried mmtom + startpfn fix + this fix + notifier fix. Didn't help.
> > I am not using SLUB (using SLAB). Yes. I am testing "real" memory
> > remove (not just offline/online), since it executes more code of
> > freeing memmap etc.
> > 
> > Code that is panicking is list_add() in mem_cgroup_add_list().
> > I will debug it further.
> > 
> 
> Considering the difference between "real" memory hotplug and the logical one,
> I found this. I hope it fixes the bug.
> But I can't test this myself.
> 
> Thanks,
> -Kame
> 

Kame,

With this patch I am able to run tests without any issues.

Sorry for the delayed response; I wanted to make sure the tests ran fine
over the weekend.

Tested-by: Badari Pulavarty <pbadari@...ibm.com>

Thanks,
Badari


> ==
> Fixes for memcg/memory hotplug.
> 
> 
> While memory hotplug allocates/frees memmap, page_cgroup is not freed
> at OFFLINE when it was allocated via bootmem.
> (Because freeing bootmem requires special care.)
> 
> Then, if page_cgroup is allocated by bootmem and memmap is freed/allocated
> by memory hotplug, page_cgroup->page == page is no longer true and
> we have to update that.
> 
> But the current MEM_ONLINE handler doesn't check this, and doesn't update
> page_cgroup->page when page_cgroup doesn't need to be allocated.
> 
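To illustrate the stale back-pointer described above, here is a minimal userspace sketch, not kernel code. The struct layouts, the toy PAGES_PER_SECTION value, and the helpers init_table(), table_is_current() and online_section() are all hypothetical stand-ins; only the comparison of the first entry's ->page against the current memmap address is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define PAGES_PER_SECTION 4	/* toy value; real sections are far larger */

/* Hypothetical, stripped-down stand-ins for the kernel structures. */
struct page { int dummy; };
struct page_cgroup { struct page *page; };

/* Point every page_cgroup entry at its struct page in the given memmap. */
static void init_table(struct page_cgroup *pc, struct page *memmap)
{
	for (size_t i = 0; i < PAGES_PER_SECTION; i++)
		pc[i].page = &memmap[i];
}

/*
 * Same idea as the patch's check: if the first entry still points at the
 * first struct page of the current memmap, the table is up to date.
 */
static int table_is_current(const struct page_cgroup *pc,
			    const struct page *memmap)
{
	assert(pc && memmap);
	return pc[0].page == &memmap[0];
}

/*
 * Model of onlining a section whose page_cgroup table survived OFFLINE
 * (the bootmem case). Returns 1 if the table had to be re-initialized.
 */
int online_section(struct page_cgroup *pc, struct page *memmap)
{
	if (table_is_current(pc, memmap))
		return 0;		/* memmap did not move */
	init_table(pc, memmap);		/* memmap moved: refresh ->page */
	return 1;
}
```

If the table is first set up against one memmap array and online_section() is then called with a second array (standing in for the memmap reallocated by hotplug), the call returns 1 and every ->page points into the new array. Calling it again with the same array returns 0.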
> And I noticed that MEM_ONLINE can be called against "part of section".
> So, freeing page_cgroup at CANCEL_ONLINE will cause trouble.
> (freeing used page_cgroup)
> Don't rollback at CANCEL. 
> 
> One more: the current memory hotplug notifier chain is stopped by slub
> because it sets NOTIFY_STOP_MASK in its return value. So page_cgroup's
> callback is never called. (It has lower priority than slub now.)
> 
> I think this slub behavior is not intentional (a bug), so this patch fixes it.
> 
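The notifier problem can be reproduced in a small userspace model. This is only a sketch: call_chain() is a simplified stand-in for the kernel's chain walker, the callback names are hypothetical, and old_notifier_from_errno() reflects my recollection of the helper in kernels of this era (it always ORs in the stop bit), so treat that definition as an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Values as defined in include/linux/notifier.h. */
#define NOTIFY_DONE		0x0000
#define NOTIFY_OK		0x0001
#define NOTIFY_STOP_MASK	0x8000

/* Assumed old behavior: the stop bit is set even when err == 0. */
static int old_notifier_from_errno(int err)
{
	return NOTIFY_STOP_MASK | (NOTIFY_OK - err);
}

/* The pattern the patch uses: only encode real errors. */
static int patched_return(int err)
{
	return err ? old_notifier_from_errno(err) : NOTIFY_OK;
}

typedef int (*callback_fn)(void);

static int calls;	/* how many callbacks ran in the last walk */

/* Hypothetical callbacks; each succeeds (err == 0). */
static int slub_old(void)	{ calls++; return old_notifier_from_errno(0); }
static int slub_patched(void)	{ calls++; return patched_return(0); }
static int page_cgroup_cb(void)	{ calls++; return NOTIFY_OK; }

/* Simplified chain walk: stop as soon as a callback sets the stop bit. */
static int call_chain(callback_fn *chain, int n)
{
	int ret = NOTIFY_DONE;

	assert(chain != NULL);
	for (int i = 0; i < n; i++) {
		ret = chain[i]();
		if (ret & NOTIFY_STOP_MASK)
			break;
	}
	return ret;
}

/* Run a two-entry chain: 'first' (higher priority), then page_cgroup. */
int count_callbacks(callback_fn first)
{
	callback_fn chain[2] = { first, page_cgroup_cb };

	calls = 0;
	call_chain(chain, 2);
	return calls;
}
```

In this model, count_callbacks(slub_old) runs only the first callback because the successful return still carries the stop bit, while count_callbacks(slub_patched) lets the walk reach page_cgroup_cb as well.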
> 
> Another way to handle page_cgroup allocation could be considered:
>   - free page_cgroup at OFFLINE even if it's from bootmem
>     and remove the special handler. But it requires more changes.
> 
> 
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
> 
> ---
>  mm/page_cgroup.c |   39 +++++++++++++++++++++++++++------------
>  mm/slub.c        |    6 ++++--
>  2 files changed, 31 insertions(+), 14 deletions(-)
> 
> Index: mmotm-2.6.28-Nov10/mm/page_cgroup.c
> ===================================================================
> --- mmotm-2.6.28-Nov10.orig/mm/page_cgroup.c
> +++ mmotm-2.6.28-Nov10/mm/page_cgroup.c
> @@ -104,18 +104,30 @@ int __meminit init_section_page_cgroup(u
>  	unsigned long table_size;
>  	int nid, index;
> 
> -	if (section->page_cgroup)
> -		return 0;
> +	if (!section->page_cgroup) {
> 
> -	nid = page_to_nid(pfn_to_page(pfn));
> -	table_size = sizeof(struct page_cgroup) * PAGES_PER_SECTION;
> -	if (slab_is_available()) {
> -		base = kmalloc_node(table_size, GFP_KERNEL, nid);
> -		if (!base)
> -			base = vmalloc_node(table_size, nid);
> -	} else {
> -		base = __alloc_bootmem_node_nopanic(NODE_DATA(nid), table_size,
> +		nid = page_to_nid(pfn_to_page(pfn));
> +		table_size = sizeof(struct page_cgroup) * PAGES_PER_SECTION;
> +		if (slab_is_available()) {
> +			base = kmalloc_node(table_size, GFP_KERNEL, nid);
> +			if (!base)
> +				base = vmalloc_node(table_size, nid);
> +		} else {
> +			base = __alloc_bootmem_node_nopanic(NODE_DATA(nid),
> +				table_size,
>  				PAGE_SIZE, __pa(MAX_DMA_ADDRESS));
> +		}
> +	} else {
> +		/*
> + 		 * We don't have to allocate page_cgroup again, but
> +		 * address of memmap may be changed. So, we have to initialize
> +		 * again.
> +		 */
> +		base = section->page_cgroup + pfn;
> +		table_size = 0;
> +		/* check address of memmap is changed or not. */
> +		if (base->page == pfn_to_page(pfn))
> +			return 0;
>  	}
> 
>  	if (!base) {
> @@ -204,19 +216,22 @@ static int page_cgroup_callback(struct n
>  		ret = online_page_cgroup(mn->start_pfn,
>  				   mn->nr_pages, mn->status_change_nid);
>  		break;
> -	case MEM_CANCEL_ONLINE:
>  	case MEM_OFFLINE:
>  		offline_page_cgroup(mn->start_pfn,
>  				mn->nr_pages, mn->status_change_nid);
>  		break;
>  	case MEM_GOING_OFFLINE:
> +	case MEM_CANCEL_ONLINE:
>  		break;
>  	case MEM_ONLINE:
>  	case MEM_CANCEL_OFFLINE:
>  		break;
>  	}
> 
> -	ret = notifier_from_errno(ret);
> +	if (ret)
> +		ret = notifier_from_errno(ret);
> +	else
> +		ret = NOTIFY_OK;
> 
>  	return ret;
>  }
> Index: mmotm-2.6.28-Nov10/mm/slub.c
> ===================================================================
> --- mmotm-2.6.28-Nov10.orig/mm/slub.c
> +++ mmotm-2.6.28-Nov10/mm/slub.c
> @@ -3220,8 +3220,10 @@ static int slab_memory_callback(struct n
>  	case MEM_CANCEL_OFFLINE:
>  		break;
>  	}
> -
> -	ret = notifier_from_errno(ret);
> +	if (ret)
> +		ret = notifier_from_errno(ret);
> +	else
> +		ret = NOTIFY_OK;
>  	return ret;
>  }
> 
> 
