Message-ID: <4FA9F9CF.8050706@jp.fujitsu.com>
Date: Wed, 09 May 2012 13:59:59 +0900
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
To: paulmck@...ux.vnet.ibm.com
CC: Sasha Levin <levinsasha928@...il.com>,
"linux-kernel@...r.kernel.org List" <linux-kernel@...r.kernel.org>,
Dave Jones <davej@...hat.com>, yinghan@...gle.com,
kosaki.motohiro@...fujitsu.com,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: rcu: BUG on exit_group
(2012/05/04 14:33), Paul E. McKenney wrote:
> On Fri, May 04, 2012 at 06:08:34AM +0200, Sasha Levin wrote:
>> On Thu, May 3, 2012 at 7:01 PM, Paul E. McKenney
>> <paulmck@...ux.vnet.ibm.com> wrote:
>>> On Thu, May 03, 2012 at 05:55:14PM +0200, Sasha Levin wrote:
>>>> On Thu, May 3, 2012 at 5:41 PM, Paul E. McKenney
>>>> <paulmck@...ux.vnet.ibm.com> wrote:
>>>>> On Thu, May 03, 2012 at 10:57:19AM +0200, Sasha Levin wrote:
>>>>>> Hi Paul,
>>>>>>
>>>>>> I've hit a BUG similar to the schedule_tail() one. It happened
>>>>>> when I started fuzzing exit_group() syscalls, and all of the traces
>>>>>> start with exit_group() (there's a flood of them).
>>>>>>
>>>>>> I've verified that it indeed BUGs due to the rcu preempt count.
>>>>>
>>>>> Hello, Sasha,
>>>>>
>>>>> Which version of -next are you using? I did some surgery on this
>>>>> yesterday based on some bugs Hugh Dickins tracked down, so if you
>>>>> are using something older, please move to the current -next.
>>>>
>>>> I'm using -next from today (3.4.0-rc5-next-20120503-sasha-00002-g09f55ae-dirty).
>>>
>>> Hmmm... Looking at this more closely, it looks like there really is
>>> an attempt to acquire a mutex within an RCU read-side critical section,
>>> which is illegal. Could you please bisect this?
>>
>> Right, the issue is as you described, taking a mutex inside rcu_read_lock().
>>
>> The offending commit is (I've cc'ed all parties from it):
>>
>> commit adf79cc03092ee4aec70da10e91b05fb8116ac7b
>> Author: Ying Han <yinghan@...gle.com>
>> Date: Thu May 3 15:44:01 2012 +1000
>>
>> memcg: add mlock statistic in memory.stat
>>
>> The issue is that munlock_vma_page() now does a
>> mem_cgroup_begin_update_page_stat(), which takes rcu_read_lock(),
>> so when the pre-existing code later tries to take a mutex you get
>> a BUG.
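A simplified sketch of that path (illustration only; the function name
below is made up, and the mutex is actually reached via try_to_munlock()
-> try_to_unmap_file()):

	static void munlock_sketch(struct page *page, struct address_space *mapping)
	{
		bool locked;
		unsigned long flags;

		/* takes rcu_read_lock() internally */
		mem_cgroup_begin_update_page_stat(page, &locked, &flags);

		/* deep inside try_to_unmap_file(): */
		mutex_lock(&mapping->i_mmap_mutex);	/* may sleep -> the BUG */
		mutex_unlock(&mapping->i_mmap_mutex);

		/* does rcu_read_unlock() */
		mem_cgroup_end_update_page_stat(page, &locked, &flags);
	}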
>
> Hmmm... One approach would be to switch from rcu_read_lock() to
> srcu_read_lock(), though this means carrying the index returned from
> the srcu_read_lock() to the matching srcu_read_unlock() -- and making
> the update side use synchronize_srcu() rather than synchronize_rcu().
> Alternatively, it might be possible to defer acquiring the lock until
> after exiting the RCU read-side critical section, but I don't know enough
> about mm to even guess whether this might be possible.
>
> There are probably other approaches as well...
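For reference, the SRCU variant suggested above would look roughly like
this (the srcu_struct and function names are made up); the point is that
the index returned by srcu_read_lock() must be carried to the matching
srcu_read_unlock(), updaters use synchronize_srcu(), and SRCU readers are
allowed to sleep:

	#include <linux/srcu.h>

	static struct srcu_struct page_stat_srcu;	/* init_srcu_struct(&page_stat_srcu) once at init */

	static void reader_side(void)
	{
		int idx;

		idx = srcu_read_lock(&page_stat_srcu);
		/* read-side section: sleeping (e.g. mutex_lock()) is permitted */
		srcu_read_unlock(&page_stat_srcu, idx);
	}

	static void update_side(void)
	{
		/* unpublish the protected state first ... */
		synchronize_srcu(&page_stat_srcu);	/* instead of synchronize_rcu() */
		/* ... then it is safe to reclaim it */
	}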
How about this?
==
[PATCH] memcg: fix taking mutex under rcu at munlock
The following bug was reported because a mutex is taken while rcu_read_lock() is held.
[ 83.820976] BUG: sleeping function called from invalid context at kernel/mutex.c:269
[ 83.827870] in_atomic(): 0, irqs_disabled(): 0, pid: 4506, name: trinity
[ 83.832154] 1 lock held by trinity/4506:
[ 83.834224] #0: (rcu_read_lock){.+.+..}, at: [<ffffffff811a7d87>] munlock_vma_page+0x197/0x200
[ 83.839310] Pid: 4506, comm: trinity Tainted: G W 3.4.0-rc5-next-20120503-sasha-00002-g09f55ae-dirty #108
[ 83.849418] Call Trace:
[ 83.851182] [<ffffffff810e7218>] __might_sleep+0x1f8/0x210
[ 83.854076] [<ffffffff82d9540a>] mutex_lock_nested+0x2a/0x50
[ 83.857120] [<ffffffff811b0830>] try_to_unmap_file+0x40/0x2f0
[ 83.860242] [<ffffffff82d984bb>] ? _raw_spin_unlock_irq+0x2b/0x80
[ 83.863423] [<ffffffff810e7ffe>] ? sub_preempt_count+0xae/0xf0
[ 83.866347] [<ffffffff82d984e9>] ? _raw_spin_unlock_irq+0x59/0x80
[ 83.869570] [<ffffffff811b0caa>] try_to_munlock+0x6a/0x80
[ 83.872667] [<ffffffff811a7cc6>] munlock_vma_page+0xd6/0x200
[ 83.875646] [<ffffffff811a7d87>] ? munlock_vma_page+0x197/0x200
[ 83.878798] [<ffffffff811a7e7f>] munlock_vma_pages_range+0x8f/0xd0
[ 83.882235] [<ffffffff811a8b8a>] exit_mmap+0x5a/0x160
This bug was introduced by mem_cgroup_begin/end_update_page_stat(),
which use rcu_read_lock(). This patch fixes the bug by narrowing the
range covered by rcu_read_lock() so that no mutex is taken inside it.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
---
mm/mlock.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/mm/mlock.c b/mm/mlock.c
index 2fd967a..05ac10d1 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -123,6 +123,7 @@ void munlock_vma_page(struct page *page)
 	if (TestClearPageMlocked(page)) {
 		dec_zone_page_state(page, NR_MLOCK);
 		mem_cgroup_dec_page_stat(page, MEMCG_NR_MLOCK);
+		mem_cgroup_end_update_page_stat(page, &locked, &flags);
 		if (!isolate_lru_page(page)) {
 			int ret = SWAP_AGAIN;
@@ -154,8 +155,8 @@ void munlock_vma_page(struct page *page)
 			else
 				count_vm_event(UNEVICTABLE_PGMUNLOCKED);
 		}
-	}
-	mem_cgroup_end_update_page_stat(page, &locked, &flags);
+	} else
+		mem_cgroup_end_update_page_stat(page, &locked, &flags);
 }

 /**
--
1.7.4.1