Date:	Tue, 5 Feb 2008 13:17:36 -0800
From:	Robin Lee Powell <rlpowell@...italkingdom.org>
To:	Neil Brown <neilb@...e.de>
Cc:	Nick Piggin <nickpiggin@...oo.com.au>, linux-kernel@...r.kernel.org
Subject: Re: Monthly md check == hung machine; how do I debug?

On Wed, Feb 06, 2008 at 07:27:56AM +1100, Neil Brown wrote:
> On Tuesday February 5, rlpowell@...italkingdom.org wrote:
> > 
> > I was able to solve the problem, however, like so:
> > 
> > 132c133
> > < # CONFIG_PREEMPT_NONE is not set
> > ---
> > > CONFIG_PREEMPT_NONE=y
> > 134,135c135,136
> > < CONFIG_PREEMPT=y
> > < CONFIG_PREEMPT_BKL=y
> > ---
> > > # CONFIG_PREEMPT is not set
> > > # CONFIG_PREEMPT_BKL is not set
> > 
> 
> This suggests that there is some sort of race. Given that I've
> never hit it on SMP machines, it is probably a very small window
> that opens immediately after some event that triggers kernel
> preemption.
> 
> The only "mdadm --monitor" does

Going to stop you right there; "mdadm --monitor" wasn't it, nor was
smartd as I thought at one point.  I honestly don't know what was
triggering it, except maybe disk access.  The fact that backups were
running at the same time as the sync seemed to make it happen
faster; that's the best I've got at this point.

> What sort of hardware do you have?  x86?  SMP or uni-processor?
> Also, exactly what kernel are you running?

rlpowell@...in> uname -a
Linux chain.digitalkingdom.org 2.6.23.1-dk3 #4 SMP Mon Feb 4 06:14:44 PST 2008 x86_64 GNU/Linux
rlpowell@...in> cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 39
model name      : AMD Athlon(tm) 64 Processor 3700+
stepping        : 1
cpu MHz         : 2210.251
cache size      : 1024 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflu
t fxsr_opt lm 3dnowext 3dnow up rep_good pni lahf_lm
bogomips        : 4422.66
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc


> I might see if I can reproduce it... so if you can send me the
> broken .config, that might help too.

http://teddyb.org/~rlpowell/media/regular/config-2.6.23.1-dk2.txt
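
For anyone comparing a hanging config against a working one, the
preemption change in the diff quoted above can be checked mechanically.
This is only a rough sketch, not anything used in this thread: the
function name preempt_settings and the two sample strings are made up
for illustration, and the only option names it looks at are the three
that appear in the diff. It assumes the usual .config line forms
("CONFIG_X=y" and "# CONFIG_X is not set").

```python
import re

# The three options that appear in the diff quoted above.
PREEMPT_OPTIONS = ("CONFIG_PREEMPT_NONE", "CONFIG_PREEMPT", "CONFIG_PREEMPT_BKL")


def preempt_settings(config_text):
    """Return {option: value} for the preemption options found in a .config.

    Options written as "# CONFIG_X is not set" are reported as "n".
    Options that do not appear at all are left out of the result.
    """
    settings = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        enabled = re.fullmatch(r"(CONFIG_\w+)=(.*)", line)
        if enabled and enabled.group(1) in PREEMPT_OPTIONS:
            settings[enabled.group(1)] = enabled.group(2)
            continue
        disabled = re.fullmatch(r"# (CONFIG_\w+) is not set", line)
        if disabled and disabled.group(1) in PREEMPT_OPTIONS:
            settings[disabled.group(1)] = "n"
    return settings


# Hypothetical samples built from the two sides of the quoted diff.
HANGING = """\
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_BKL=y
"""

WORKING = """\
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT is not set
# CONFIG_PREEMPT_BKL is not set
"""

if __name__ == "__main__":
    for label, text in (("hanging", HANGING), ("working", WORKING)):
        print(label, preempt_settings(text))
```

To run it against a real file, read the file and pass its contents in,
e.g. preempt_settings(open(path).read()) with path pointing at a
downloaded copy of the config.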

-Robin