linux-kernel - Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch


Message-Id: <ACEFEB0C-704B-49C8-96A7-14C0C54EF067@e18.physik.tu-muenchen.de>
Date:	Tue, 24 Apr 2007 15:03:21 +0200
From:	Roland Kuhn <rkuhn@....physik.tu-muenchen.de>
To:	Jens Axboe <jens.axboe@...cle.com>
Cc:	Thiemo.Nagel@...tum.de,
	linuxkernel Org <linux-kernel@...r.kernel.org>
Subject: Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert

Hi Jens!

On 24 Apr 2007, at 14:32, Jens Axboe wrote:

> On Tue, Apr 24 2007, Roland Kuhn wrote:
>> Hi Jens!
>>
>> [I made a typo in the Cc: list so that lkml is only included as of
>> now. Actually I copied the typo from you ;-) ]
>
> Well no, you started the typo, I merely propagated it and forgot to fix
> it up :-)
>
Actually, I copied it from your printk() ;-) (thinking helps...)

>>>>> Sure. You might want to include NFS file access into your tests,
>>>>> since we've not triggered this with locally accessing the disks.
>>>>> BTW:
>>>>
>>>> How are you exporting the directory (which export options) - how is it
>>>> mounted by the client(s)? What chunksize is your raid6 using?
>>>
>>> And what is the nature of the files on the raid (huge, small, ?) and
>>> what are the client(s) doing? Just approximately, I know these things
>>> can be hard/difficult/impossible to specify.
>>>
>> The files are 100-400MB in size and the client is merging them into a
>> new file in the same directory using the ROOT library, which does in
>> essence alternating sequences of
>>
>> _llseek(somewhere)
>> read(n bytes)
>> _llseek(somewhere+n)
>> read(m bytes)
>> ...
>>
>> and then
>>
>> _llseek(somewhere)
>> rt_sigaction(ignore INT)
>> write(n bytes)
>> rt_sigaction(INT->DFL)
>> time()
>> _llseek(somewhere+n)
>> ...
>>
>> where n is of the order of 30kB. The input files are treated
>> sequentially, not randomly.
>
> Ok, I'll see if I can reproduce it. No luck so far, I'm afraid.
>
Too bad.
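
In case it helps with reproducing, the access pattern quoted above can be approximated roughly as follows. This is only a sketch under assumptions, not what the ROOT library actually does: the function name merge_files, the fixed 30 kB chunk, and the strict read/write interleaving are all illustrative, and real chunk sizes vary. To be relevant it would presumably have to run on an NFS client against the exported raid6 directory.

```python
import os
import signal
import threading
import time

CHUNK = 30 * 1024  # assumption: "n is of the order of 30kB"


def merge_files(input_paths, output_path, chunk=CHUNK):
    """Copy the inputs one after another into output_path.

    Every read and every write is preceded by an explicit lseek, and SIGINT
    is ignored around each write, mimicking the quoted syscall sequence.
    Returns the number of bytes written.
    """
    # signal.signal() may only be called from the main thread.
    can_mask = threading.current_thread() is threading.main_thread()
    out_fd = os.open(output_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    out_pos = 0
    try:
        for path in input_paths:  # inputs are treated sequentially
            in_fd = os.open(path, os.O_RDONLY)
            in_pos = 0
            try:
                while True:
                    os.lseek(in_fd, in_pos, os.SEEK_SET)
                    data = os.read(in_fd, chunk)
                    if not data:
                        break
                    in_pos += len(data)

                    os.lseek(out_fd, out_pos, os.SEEK_SET)
                    old = None
                    if can_mask:
                        old = signal.signal(signal.SIGINT, signal.SIG_IGN)
                    try:
                        written = 0
                        while written < len(data):
                            written += os.write(out_fd, data[written:])
                    finally:
                        if can_mask:
                            # Restore the previous handler (SIG_DFL if unknown).
                            signal.signal(
                                signal.SIGINT,
                                old if old is not None else signal.SIG_DFL,
                            )
                    time.time()  # the trace shows a time() call after each write
                    out_pos += len(data)
            finally:
                os.close(in_fd)
    finally:
        os.close(out_fd)
    return out_pos
```

Calling merge_files() with a list of input paths and an output path in the same NFS-mounted directory would mirror the setup described; the paths themselves are left to whoever runs it.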

>> BTW: the machine just stopped dead, no sign whatsoever on console or
>> netconsole, so I rebooted with elevator=deadline
>> (need to get some work done besides ;-) )
>
> Unfortunately expected, if we can race and lose an update to ->next_rq,
> we can race and corrupt some of the internal data structures as well. If
> you have the time and inclination, it would be interesting to see if you
> can reproduce with some debugging options enabled:
>
> - Enable all preempt, spinlock and lockdep debugging measures
> - Possibly slab poisoning, although that may slow you down somewhat
>
Kernel compilation under way, will report back.

> Are you using 4kb stacks?
>
No idea, 'grep -i stack .config' gives no indication, but ISTR that  
4k was made the default some time back?
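
A more targeted check than a plain grep could look like the sketch below. The option names in WANTED are assumptions recalled from 2.6-era Kconfig files (CONFIG_4KSTACKS, for instance, is an i386 option and is absent on other architectures), so they should be verified against the tree being built. The helper config_status is hypothetical and only parses the text of a .config file.

```python
import re

# Assumed option names; verify against the Kconfig files of the actual tree.
WANTED = (
    "CONFIG_4KSTACKS",
    "CONFIG_DEBUG_PREEMPT",
    "CONFIG_DEBUG_SPINLOCK",
    "CONFIG_PROVE_LOCKING",
    "CONFIG_DEBUG_SLAB",
)


def config_status(config_text, options=WANTED):
    """Map each option to its value, 'n' if "is not set", or None if absent."""
    status = {name: None for name in options}
    for line in config_text.splitlines():
        line = line.strip()
        m = re.match(r"^(CONFIG_\w+)=(.*)$", line)
        if m and m.group(1) in status:
            status[m.group(1)] = m.group(2)
            continue
        m = re.match(r"^# (CONFIG_\w+) is not set$", line)
        if m and m.group(1) in status:
            status[m.group(1)] = "n"
    return status
```

An option that maps to None is not mentioned in the file at all, which usually means it does not exist for that architecture or its dependencies are not enabled.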

Ciao,
                     Roland

--
TU Muenchen, Physik-Department E18, James-Franck-Str., 85748 Garching
Telefon 089/289-12575; Telefax 089/289-12570
--
CERN office: 892-1-D23 phone: +41 22 7676540 mobile: +41 76 487 4482
--
Any society that would give up a little liberty to gain a little
security will deserve neither and lose both.  - Benjamin Franklin
-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GS/CS/M/MU d-(++) s:+ a-> C+++ UL++++ P+++ L+++ E(+) W+ !N K- w--- M 
+ !V Y+
PGP++ t+(++) 5 R+ tv-- b+ DI++ e+++>++++ h---- y+++
------END GEEK CODE BLOCK------
