Message-ID: <4C62A44D.3010102@kernel.dk>
Date: Wed, 11 Aug 2010 09:23:25 -0400
From: Jens Axboe <axboe@...nel.dk>
To: Jeff Layton <jlayton@...hat.com>
CC: Jeff Moyer <jmoyer@...hat.com>, linux-kernel@...r.kernel.org
Subject: Re: cfq: oops in __call_for_each_cic

On 08/10/2010 09:23 PM, Jeff Layton wrote:
> On Tue, 10 Aug 2010 19:58:41 -0400
> Jens Axboe <axboe@...nel.dk> wrote:
>
>> On 08/10/2010 12:35 PM, Jeff Layton wrote:
>>> On Tue, 10 Aug 2010 12:10:05 -0400
>>> Jens Axboe <axboe@...nel.dk> wrote:
>>>
>>>> On 08/10/2010 10:27 AM, Jeff Layton wrote:
>>>>> On Tue, 10 Aug 2010 10:22:41 -0400
>>>>> Jeff Moyer <jmoyer@...hat.com> wrote:
>>>>>
>>>>>> Jeff Layton <jlayton@...hat.com> writes:
>>>>>>
>>>>>>> Saw this oops on my test machine this morning. I rebooted the machine
>>>>>>> last night and hadn't done anything on it other than log in this
>>>>>>> morning. The kernel here is based on Steve French's git tree, which is
>>>>>>> based on Linus' as of Sunday Aug 8th. Last non-cifs commit is:
>>>>>>
>>>>>> This looks a lot like this bug:
>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=577968
>>>>>>
>>>>>> See also:
>>>>>> http://kerneloops.org/guilty.php?guilty=cfq_free_io_context&version=2.6.34-rc&start=2228224&end=2260991&class=oops
>>>>>>
>>>>>> It's been around since 2.6.30.8 according to kerneloops.org. If you
>>>>>> find that you have a reliable way of reproducing the issue, that would
>>>>>> be great.
>>>>>>
>>>>>
>>>>> Ok, thanks -- no clear reproducer so far. This morning was the
>>>>> first time I've seen it and it was on the console of my rawhide
>>>>> machine. The last thing I did with it was reboot it last night. I
>>>>> suspect that the gzip process came from a cron job or something.
>>>>
>>>> What version did you hit it on?
>>>>
>>>
>>> It was a kernel built out of git, based on Steve French's git tree. The
>>> last commit from Linus in it was
>>> 45d7f32c7a43cbb9592886d38190e379e2eb2226. Everything else on top of
>>> that was patches that only touched cifs code. cifs.ko hadn't been
>>> plugged in since it was rebooted.
>>
>> OK. That bug is pretty elusive, so far I haven't been able to figure
>> out what the heck is going on here and my attempts at reproducing
>> have all failed. The reports so far seem to have the cron component
>> in common. Does fedora ionice some cron jobs or anything like that?
>> Or use CLONE_IO?
>>
>
> Yes. I sort of doubt anything there would use CLONE_IO, but ionice is
> definitely used. Fedora uses anacron. I don't see any explicit calls to
> gzip in there, but it's possible something else is calling it:
>
> # grep ionice /etc/cron.*/*
> /etc/cron.daily/mlocate.cron:ionice -c2 -n7 -p $$ >/dev/null 2>&1
> /etc/cron.daily/readahead.cron:ionice -c3 -p $$ >/dev/null 2>&1
>
> # cat /etc/anacrontab
> # /etc/anacrontab: configuration file for anacron
>
> # See anacron(8) and anacrontab(5) for details.
>
> SHELL=/bin/sh
> PATH=/sbin:/bin:/usr/sbin:/usr/bin
> MAILTO=root
> # the maximal random delay added to the base delay of the jobs
> RANDOM_DELAY=45
> # the jobs will be started during the following hours only
> START_HOURS_RANGE=3-22
>
> #period in days   delay in minutes   job-identifier   command
> 1                 5                  cron.daily       nice run-parts /etc/cron.daily
> 7                 25                 cron.weekly      nice run-parts /etc/cron.weekly
> @monthly          45                 cron.monthly     nice run-parts /etc/cron.monthly

ionice must be a deciding factor in this, perhaps coupled with something
else. Otherwise we would be seeing a lot more of these.
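
For reference, here is a rough sketch of what the two ionice invocations
quoted above ask the kernel for. It only computes the priority value that
ioprio_set(2) takes, using the kernel's encoding (scheduling class in the
upper bits, shifted by 13, priority level in the low bits). The function
names and the script itself are illustrative assumptions, not anything
taken from the cron jobs or from cfq, and it makes no syscall.

```python
# Illustrative only: mirrors the kernel's IOPRIO_PRIO_VALUE() encoding.
IOPRIO_CLASS_SHIFT = 13
IOPRIO_CLASS_NONE, IOPRIO_CLASS_RT, IOPRIO_CLASS_BE, IOPRIO_CLASS_IDLE = range(4)


def ioprio_value(ioclass, level=0):
    """Pack an I/O scheduling class and priority level into one integer."""
    if not IOPRIO_CLASS_NONE <= ioclass <= IOPRIO_CLASS_IDLE:
        raise ValueError("unknown I/O scheduling class: %r" % (ioclass,))
    if not 0 <= level <= 7:
        raise ValueError("priority level must be in 0..7: %r" % (level,))
    return (ioclass << IOPRIO_CLASS_SHIFT) | level


def ioprio_decode(value):
    """Split a packed value back into (class, level)."""
    return value >> IOPRIO_CLASS_SHIFT, value & ((1 << IOPRIO_CLASS_SHIFT) - 1)


# mlocate.cron:   ionice -c2 -n7 -p $$  (best-effort class, lowest level)
mlocate_prio = ioprio_value(IOPRIO_CLASS_BE, 7)
# readahead.cron: ionice -c3 -p $$      (idle class, level is not used)
readahead_prio = ioprio_value(IOPRIO_CLASS_IDLE)

if __name__ == "__main__":
    for name, prio in (("mlocate", mlocate_prio), ("readahead", readahead_prio)):
        print(name, hex(prio), ioprio_decode(prio))
```

So the two daily jobs end up in different classes (best-effort and idle),
which may be worth keeping in mind when trying to build a reproducer.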
--
Jens Axboe