Message-ID: <530783D5.3060806@oracle.com>
Date: Fri, 21 Feb 2014 11:50:29 -0500
From: Sasha Levin <sasha.levin@...cle.com>
To: Naoya Horiguchi <n-horiguchi@...jp.nec.com>
CC: linux-mm@...ck.org, akpm@...ux-foundation.org, mpm@...enic.com,
cpw@....com, kosaki.motohiro@...fujitsu.com, hannes@...xchg.org,
kamezawa.hiroyu@...fujitsu.com, mhocko@...e.cz,
aneesh.kumar@...ux.vnet.ibm.com, xemul@...allels.com,
riel@...hat.com, kirill.shutemov@...ux.intel.com,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 01/11] pagewalk: update page table walker core
On 02/21/2014 11:35 AM, Naoya Horiguchi wrote:
> On Fri, Feb 21, 2014 at 01:43:20AM -0500, Sasha Levin wrote:
>> On 02/20/2014 10:20 PM, Naoya Horiguchi wrote:
>>> Hi Sasha,
>>>
>>> On Thu, Feb 20, 2014 at 06:47:56PM -0500, Sasha Levin wrote:
>>>> Hi Naoya,
>>>>
>>>> This patch seems to trigger a NULL ptr deref here. I didn't have a chance to look into it yet
>>>> but here's the spew:
>>>
>>> Thanks for reporting.
>>> I'm not sure from the kernel message what caused this bug. My guess is that
>>> the NULL pointer dereference is deep inside the lockdep routine __lock_acquire(),
>>> so if we find out which pointer was NULL, it might help narrow down where
>>> the problem is (page table walker, lockdep, or both).
>>
>> This actually points to walk_pte_range() trying to lock a NULL spinlock. It happens when we call
>> pte_offset_map_lock() and get a NULL ptl out of pte_lockptr().
>
> I don't think page->ptl was NULL, because if it were we would hit the NULL pointer
> dereference outside __lock_acquire() (it's dereferenced in __raw_spin_lock()).
> Maybe page->ptl->lock_dep was NULL. I'll dig into it more to find out how we failed
> to set this lock_dep thing.

I don't see __raw_spin_lock() derefing it before calling __lock_acquire():

	static inline void __raw_spin_lock(raw_spinlock_t *lock)
	{
		preempt_disable();
		spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
		LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
	}

So after we disable preemption, spin_acquire() is basically a macro that ends up pointing to
lock_acquire().

__raw_spin_lock() would dereference 'lock' only after the lockdep call.

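To illustrate why '&lock->dep_map' does not itself touch memory, here is a minimal userspace sketch. The struct names and field layout are hypothetical stand-ins, not the kernel's real raw_spinlock_t. It only shows that taking the address of a member is pointer arithmetic, so with a NULL 'lock' the first actual load would happen in the callee that reads through the resulting pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; the layout is illustrative only. */
struct fake_dep_map {
	const char *name;
};

struct fake_raw_spinlock {
	unsigned int raw_lock;
	struct fake_dep_map dep_map;
};

/*
 * The address that &lock->dep_map evaluates to for a given lock address.
 * This is plain arithmetic (base + member offset); nothing is loaded.
 */
static uintptr_t dep_map_address(uintptr_t lock_addr)
{
	return lock_addr + offsetof(struct fake_raw_spinlock, dep_map);
}

/* The first real memory read happens here, when the callee reads a field. */
static const char *read_name(const struct fake_dep_map *map)
{
	return map->name;
}

/* Small self-check of the two claims above. */
static void demo_checks(void)
{
	struct fake_raw_spinlock lock = { 0, { "ptl" } };

	/* For a valid lock, the computed address matches the member's address. */
	assert(dep_map_address((uintptr_t)&lock) == (uintptr_t)&lock.dep_map);

	/* For a NULL lock, the result is just the member offset, a small non-NULL value. */
	assert(dep_map_address(0) == offsetof(struct fake_raw_spinlock, dep_map));
	assert(dep_map_address(0) != 0);

	/* Reading through a valid pointer is where the dereference occurs. */
	assert(read_name(&lock.dep_map)[0] == 'p');
}
```

Calling demo_checks() from a main() is enough to exercise it. Under these assumptions, a NULL 'lock' would hand the callee a small offset address rather than NULL, which fits a fault showing up inside the lockdep code instead of in __raw_spin_lock() itself.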
>>> BTW, just out of curiosity: in my build environment many kernel functions
>>> are inlined, so they should not be shown in the kernel message. But in your report
>>> we can see symbols like walk_pte_range() and __lock_acquire(), which never
>>> appear in my kernel. How did you do it? I turned off CONFIG_OPTIMIZE_INLINING,
>>> but that didn't help.
>>
>> I'm really not sure. I've got a bunch of debug options enabled and it just seems to do the trick.
>>
>> Try CONFIG_READABLE_ASM maybe?
>
> Hmm, that makes no difference. Can I have your config?
Sure, attached.
Thanks,
Sasha
Download attachment "config.gz" of type "application/gzip" (39429 bytes)