linux-kernel - Re: 3.0rc2 oops in mem_cgroup_from

Message-Id: <20110610113311.409bb423.kamezawa.hiroyu@jp.fujitsu.com>
Date:	Fri, 10 Jun 2011 11:33:11 +0900
From:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
To:	Hugh Dickins <hughd@...gle.com>
Cc:	Ying Han <yinghan@...gle.com>, Dave Jones <davej@...hat.com>,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>
Subject: Re: 3.0rc2 oops in mem_cgroup_from_task

On Thu, 9 Jun 2011 18:30:49 -0700 (PDT)
Hugh Dickins <hughd@...gle.com> wrote:

> On Fri, 10 Jun 2011, KAMEZAWA Hiroyuki wrote:
> > On Thu, 9 Jun 2011 16:42:09 -0700
> > Ying Han <yinghan@...gle.com> wrote:
> > 
> > > ++cc Hugh who might have seen similar crashes on his machine.
> 
> Yes, I was testing my tmpfs changes, and saw it on i386 yesterday
> morning.  Same trace as Dave's (including khugepaged, which may or
> may not be relevant), aside from the i386/x86_64 differences.
> 
> BUG: unable to handle kernel paging request at 6b6b6b87
> 
> I needed to move forward with other work on that laptop, so just
> jotted down the details to come back to later.  It came after one
> hour of building swapping load in memcg, I've not tried again since.
> 
> > 
> > Thank you for forwarding. Hmm. It seems the panic happens at khugepaged's 
> > page collapse_huge_page().
> 
> Yes, the inlining in my kernel was different,
> so collapse_huge_page() showed up in my backtrace.
> 
> > 
> > ==
> >         count_vm_event(THP_COLLAPSE_ALLOC);
> >         if (unlikely(mem_cgroup_newpage_charge(new_page, mm, GFP_KERNEL))) {
> > ==
> > It passes target mm to memcg and memcg gets a cgroup by
> > ==
> >  mem = mem_cgroup_from_task(rcu_dereference(mm->owner));
> > ==
> > Panic here means....mm->owner's task_subsys_state contains bad pointer ?
> 
> 781cc621 <mem_cgroup_from_task>:
> 781cc621:	55                   	push   %ebp
> 781cc622:	31 c0                	xor    %eax,%eax
> 781cc624:	89 e5                	mov    %esp,%ebp
> 781cc626:	8b 55 08             	mov    0x8(%ebp),%edx
> 781cc629:	85 d2                	test   %edx,%edx
> 781cc62b:	74 09                	je     781cc636 <mem_cgroup_from_task+0x15>
> 781cc62d:	8b 82 fc 08 00 00    	mov    0x8fc(%edx),%eax
> 781cc633:	8b 40 1c             	mov    0x1c(%eax),%eax   <==========
> 781cc636:	c9                   	leave  
> 781cc637:	c3                   	ret    
> 

So the load of task->cgroups->subsys[?] is what faults at 6b6b6b87...

That means task->cgroups or task->cgroups->subsys contains a bad pointer.
In the khugepaged case, it grabs an mm_struct and memcg then accesses
(mm->owner)->cgroups->subsys.
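
As a rough check of the address arithmetic, here is a minimal sketch. It
assumes the 0x6b bytes are the slab free-poison pattern (POISON_FREE in
include/linux/poison.h, only written when slab debugging is enabled) and takes
the 0x1c offset from the marked instruction in the disassembly. The helper
names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Byte the slab debug code writes over freed objects (POISON_FREE). */
#define POISON_FREE_BYTE 0x6bu

/* 32-bit word that a pointer field inside a freed, poisoned object would hold. */
static uint32_t poisoned_word(void)
{
    uint32_t b = POISON_FREE_BYTE;

    return b | (b << 8) | (b << 16) | (b << 24);
}

/* Address touched by "mov off(%eax),%eax" when %eax holds base. */
static uint32_t fault_address(uint32_t base, uint32_t off)
{
    return base + off;
}

/* Self-check against the address reported in the oops. */
static void check_reported_fault(void)
{
    assert(fault_address(poisoned_word(), 0x1c) == 0x6b6b6b87u);
}
```

If that assumption holds, the value loaded from 0x8fc(%edx) was already
0x6b6b6b6b, which would be consistent with the task pointed to by mm->owner
having been freed before the second load.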

So, from memcg's point of view, we need to question whether mm->owner is
still valid for this kind of task.
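
To make the suspected failure mode concrete, here is a userspace sketch of the
same pointer chain. The mock_* structures are simplified stand-ins, not the
kernel definitions, and the poison check only illustrates how a stale
mm->owner would look under slab debugging. It is not a proposed fix.

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-ins for the kernel structures; layout is illustrative only. */
struct mock_css_set { void *subsys[4]; };
struct mock_task    { struct mock_css_set *cgroups; };
struct mock_mm      { struct mock_task *owner; };

/* Return 1 if all n bytes at p equal the 0x6b free-poison byte. */
static int bytes_all_poison(const void *p, size_t n)
{
    const unsigned char *b = p;
    size_t i;

    for (i = 0; i < n; i++)
        if (b[i] != 0x6b)
            return 0;
    return 1;
}

/* Simulate slab poisoning of a freed task; the storage is kept for inspection. */
static void mock_free_task(struct mock_task *t)
{
    memset(t, 0x6b, sizeof *t);
}

/* Inspect mm->owner->cgroups without dereferencing it; 1 means it looks stale. */
static int owner_cgroups_is_stale(const struct mock_mm *mm)
{
    return mm->owner != NULL &&
           bytes_all_poison(&mm->owner->cgroups, sizeof mm->owner->cgroups);
}
```

In this mock, owner_cgroups_is_stale() returns 0 while the task is live and 1
once mock_free_task() has run while mm->owner still points at it, which is the
situation the oops address suggests.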

Thank you for the input.

-Kame
