Message-ID: <4A027348.6000808@cn.fujitsu.com>
Date: Thu, 07 May 2009 13:36:08 +0800
From: Li Zefan <lizf@...fujitsu.com>
To: Vivek Goyal <vgoyal@...hat.com>
CC: Gui Jianfeng <guijianfeng@...fujitsu.com>, nauman@...gle.com,
dpshah@...gle.com, mikew@...gle.com, fchecconi@...il.com,
paolo.valente@...more.it, jens.axboe@...cle.com,
ryov@...inux.co.jp, fernando@....ntt.co.jp, s-uchida@...jp.nec.com,
taka@...inux.co.jp, jmoyer@...hat.com, dhaval@...ux.vnet.ibm.com,
balbir@...ux.vnet.ibm.com, linux-kernel@...r.kernel.org,
containers@...ts.linux-foundation.org, righi.andrea@...il.com,
agk@...hat.com, dm-devel@...hat.com, snitzer@...hat.com,
m-ikeda@...jp.nec.com, akpm@...ux-foundation.org
Subject: Re: IO scheduler based IO Controller V2
Vivek Goyal wrote:
> On Wed, May 06, 2009 at 04:11:05PM +0800, Gui Jianfeng wrote:
>> Vivek Goyal wrote:
>>> Hi All,
>>>
>>> Here is the V2 of the IO controller patches generated on top of 2.6.30-rc4.
>>> First version of the patches was posted here.
>> Hi Vivek,
>>
>> I did some simple tests for V2 and triggered a kernel panic.
>> The following script can reproduce this bug. It seems that the cgroup
>> is already removed, but the IO Controller still tries to access it.
>>
>
> Hi Gui,
>
> Thanks for the report. I use cgroup_path() for debugging. I guess that
> cgroup_path() was passed a null cgrp pointer, and that's why it crashed.
>
> If yes, then it is strange though. I call cgroup_path() only after
> grabbing a reference to the css object. (I am assuming that if I have a
> valid reference to the css object then css->cgrp can't be null.)
>
Yes, css->cgrp shouldn't be NULL. I suspect we hit a bug in cgroup here.
The code dealing with css refcnt and cgroup rmdir has changed quite a lot,
and is much more complex than it used to be.
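For illustration, here is a minimal sketch of the pattern Vivek describes
(iog_print_path() is a made-up helper, not code from the posted patches):
take a css reference with css_tryget() before dereferencing css->cgroup or
calling cgroup_path(), so the cgroup cannot be freed underneath us.
======================
/*
 * Illustrative sketch only; the helper name and its caller are hypothetical.
 */
#include <linux/cgroup.h>
#include <linux/kernel.h>
#include <linux/rcupdate.h>

static void iog_print_path(struct cgroup_subsys_state *css)
{
        char path[128];

        /* Fails if the cgroup is already being removed (rmdir in progress). */
        if (!css_tryget(css))
                return;

        /* cgroup_path() wants cgroup_mutex or an RCU read-side section. */
        rcu_read_lock();
        if (!cgroup_path(css->cgroup, path, sizeof(path)))
                printk(KERN_DEBUG "io group path: %s\n", path);
        rcu_read_unlock();

        css_put(css);
}
======================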
> Anyway, can you please try out the following patch and see if it fixes your
> crash.
...
> BTW, I tried the following equivalent script and I can't see the crash on
> my system. Are you able to hit it regularly?
>
I modified the script like this:
======================
#!/bin/sh
echo 1 > /proc/sys/vm/drop_caches
mkdir /cgroup 2> /dev/null
mount -t cgroup -o io,blkio io /cgroup
mkdir /cgroup/test1
mkdir /cgroup/test2
echo 100 > /cgroup/test1/io.weight
echo 500 > /cgroup/test2/io.weight
dd if=/dev/zero bs=4096 count=128000 of=500M.1 &
pid1=$!
echo $pid1 > /cgroup/test1/tasks
dd if=/dev/zero bs=4096 count=128000 of=500M.2 &
pid2=$!
echo $pid2 > /cgroup/test2/tasks
sleep 5
kill -9 $pid1
kill -9 $pid2
# Retry until both test cgroups have been removed; rmdir fails while the
# groups are still busy.
count=0
while [ "$count" -ne 2 ]
do
    rmdir /cgroup/test1 > /dev/null 2>&1
    if [ $? -eq 0 ]; then
        count=$((count + 1))
    fi
    rmdir /cgroup/test2 > /dev/null 2>&1
    if [ $? -eq 0 ]; then
        count=$((count + 1))
    fi
done
umount /cgroup
rmdir /cgroup
======================
I ran this script and got a lockdep BUG. The full log and my config are attached.
Actually, this can be triggered with the following steps on my box:
# mount -t cgroup -o blkio,io xxx /mnt
# mkdir /mnt/0
# echo $$ > /mnt/0/tasks
# echo 3 > /proc/sys/vm/drop_caches
# echo $$ > /mnt/tasks
# rmdir /mnt/0
And when I ran the script for the second time, my box froze
and I had to reset it.
> Instead of killing the tasks I also tried moving the tasks into root cgroup
> and then deleting test1 and test2 groups, that also did not produce any crash.
> (Hit a different bug though after 5-6 attempts :-)
>
> As I mentioned in the patchset, currently we do have issues with group
> refcounting and cgroup/group going away. Hopefully in the next version they
> should all be fixed up. But still, it is nice to hear back...
>
View attachment "myconfig" of type "text/plain" (64514 bytes)
View attachment "dmesg.txt" of type "text/plain" (90539 bytes)