Message-Id: <ACEFEB0C-704B-49C8-96A7-14C0C54EF067@e18.physik.tu-muenchen.de>
Date: Tue, 24 Apr 2007 15:03:21 +0200
From: Roland Kuhn <rkuhn@....physik.tu-muenchen.de>
To: Jens Axboe <jens.axboe@...cle.com>
Cc: Thiemo.Nagel@...tum.de,
linuxkernel Org <linux-kernel@...r.kernel.org>
Subject: Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Hi Jens!
On 24 Apr 2007, at 14:32, Jens Axboe wrote:
> On Tue, Apr 24 2007, Roland Kuhn wrote:
>> Hi Jens!
>>
>> [I made a typo in the Cc: list so that lkml is only included as of
>> now. Actually I copied the typo from you ;-) ]
>
> Well no, you started the typo, I merely propagated it and forgot to
> fix it up :-)
>
Actually, I copied it from your printk() ;-) (thinking helps...)
>>>>> Sure. You might want to include NFS file access into your tests,
>>>>> since we've not triggered this with locally accessing the disks.
>>>>> BTW:
>>>>
>>>> How are you exporting the directory (what export options) - how is
>>>> it mounted by the client(s)? What chunksize is your raid6 using?
>>>
>>> And what is the nature of the files on the raid (huge, small, ?) and
>>> what are the client(s) doing? Just approximately, I know these things
>>> can be hard/difficult/impossible to specify.
>>>
>> The files are 100-400MB in size and the client is merging them into a
>> new file in the same directory using the ROOT library, which does in
>> essence alternating sequences of
>>
>> _llseek(somewhere)
>> read(n bytes)
>> _llseek(somewhere+n)
>> read(m bytes)
>> ...
>>
>> and then
>>
>> _llseek(somewhere)
>> rt_sigaction(ignore INT)
>> write(n bytes)
>> rt_sigaction(INT->DFL)
>> time()
>> _llseek(somewhere+n)
>> ...
>>
>> where n is of the order of 30kB. The input files are treated
>> sequentially, not randomly.
>
> Ok, I'll see if I can reproduce it. No luck so far, I'm afraid.
>
Too bad.
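In case it helps with reproducing, the access pattern quoted above can be sketched roughly as follows. This is only an approximation under stated assumptions: the names `merge` and `CHUNK` are made up for illustration, plain concatenation stands in for whatever ROOT really does when merging, the fixed 30kB chunk is just the order of magnitude mentioned above, and the rt_sigaction() calls around each write are left out.

```python
import os
import time

# Hypothetical chunk size: "of the order of 30kB" per the description above.
CHUNK = 30 * 1024


def merge(inputs, output, chunk=CHUNK):
    """Copy each input file in turn into `output` using explicit
    seek+read and seek+write pairs. Returns the bytes written."""
    out_fd = os.open(output, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    out_off = 0
    try:
        # Input files are processed one after another, not randomly.
        for path in inputs:
            in_fd = os.open(path, os.O_RDONLY)
            try:
                in_off = 0
                while True:
                    os.lseek(in_fd, in_off, os.SEEK_SET)    # _llseek(somewhere)
                    buf = os.read(in_fd, chunk)             # read(n bytes)
                    if not buf:
                        break                               # end of this input
                    in_off += len(buf)                      # next: somewhere+n
                    os.lseek(out_fd, out_off, os.SEEK_SET)  # _llseek(somewhere)
                    out_off += os.write(out_fd, buf)        # write(n bytes)
                    time.time()                             # time() after write
            finally:
                os.close(in_fd)
    finally:
        os.close(out_fd)
    return out_off
```

To get anywhere near the reported setup, this would presumably have to run on an NFS client against files of 100-400MB living on the exported raid6, with the output file in the same directory as the inputs.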
>> BTW: the machine just stopped dead, no sign whatsoever on console or
>> netconsole, so I rebooted with elevator=deadline
>> (need to get some work done besides ;-) )
>
> Unfortunately expected, if we can race and lose an update to
> ->next_rq, we can race and corrupt some of the internal data
> structures as well. If you have the time and inclination, it would be
> interesting to see if you can reproduce with some debugging options
> enabled:
>
> - Enable all preempt, spinlock and lockdep debugging measures
> - Possibly slab poisoning, although that may slow you down somewhat
>
Kernel compilation under way, will report back.
> Are you using 4kb stacks?
>
No idea, 'grep -i stack .config' gives no indication, but ISTR that
4k was made the default some time back?
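Since a plain grep says nothing about options that are missing altogether, a small helper along these lines could tell "set", "not set" and "absent" apart, both for the stack size question and for the debugging options asked for above. It is a sketch only: the option names in `OPTIONS` are written from memory of 2.6-era Kconfig files and are assumptions to be checked against the actual tree, and `parse_config` and `report` are hypothetical names.

```python
import re

# Assumed option names (from memory of 2.6-era Kconfig); verify before relying
# on them. CONFIG_4KSTACKS is an i386 option and may not exist on other arches.
OPTIONS = [
    "CONFIG_4KSTACKS",
    "CONFIG_DEBUG_PREEMPT",
    "CONFIG_DEBUG_SPINLOCK",
    "CONFIG_PROVE_LOCKING",
    "CONFIG_DEBUG_SLAB",
]

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map option name -> value ('y', 'm', a string, ...) or 'n' if the
    line reads '# CONFIG_FOO is not set'."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            state[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            state[m.group(1)] = "n"
    return state


def report(text, options=OPTIONS):
    """Return the state of each option, using 'absent' when the option
    does not appear in the config at all."""
    state = parse_config(text)
    return {opt: state.get(opt, "absent") for opt in options}
```

Usage would be something like `print(report(open(".config").read()))` in the kernel build directory; an option reported as "absent" is simply not offered for that architecture or configuration.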
Ciao,
Roland
--
TU Muenchen, Physik-Department E18, James-Franck-Str., 85748 Garching
Telefon 089/289-12575; Telefax 089/289-12570
--
CERN office: 892-1-D23 phone: +41 22 7676540 mobile: +41 76 487 4482
--
Any society that would give up a little liberty to gain a little
security will deserve neither and lose both. - Benjamin Franklin
-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GS/CS/M/MU d-(++) s:+ a-> C+++ UL++++ P+++ L+++ E(+) W+ !N K- w--- M
+ !V Y+
PGP++ t+(++) 5 R+ tv-- b+ DI++ e+++>++++ h---- y+++
------END GEEK CODE BLOCK------