Date:	Thu, 17 Jun 2010 13:21:36 -0700 (PDT)
From:	Sage Weil <sage@...dream.net>
To:	Andrew Morton <akpm@...ux-foundation.org>,
	Alexey Dobriyan <adobriyan@...il.com>,
	Tejun Heo <tj@...nel.org>, linux-kernel@...r.kernel.org,
	hch@....de, bfields@...i.umich.edu
Subject: weird umem vs nfsd regression

Hi,

I started seeing a hang during bootup of a machine with a umem
(micromemory nvram) card.  I bisected it and narrowed it down to commit
b95a5680, which merged a couple of nfsd changes, although strangely,
reverting just b160fdab ('nfsd: nfsd_setattr needs to call commit_metadata')
is sufficient to make the problem go away.

I'm not quite sure what to make of it.  I don't see how the nfsd change 
would affect the umem driver initialization.  The machine _is_ netbooting 
(kernel via PXE, nfs root), though.

When it hangs, I see

[    1.069553] v2.3 : Micro Memory(tm) PCI memory board block driver
[    1.075780] umem 0000:02:01.0: can't find IRQ for PCI INT C; probably buggy MP table
[    1.083633] umem 0000:02:01.0: Micro Memory(tm) controller found (PCI Mem Module (Battery Backup))
[    1.092762] umem 0000:02:01.0: CSR 0xfc9ffc00 -> 0xffffc90001466c00 (0x100)
[    1.099880] umem 0000:02:01.0: Size 1048576 KB, Battery 1 Disabled (FAILURE), Battery 2 Disabled (FAILURE)
[    1.109745] umem 0000:02:01.0: Window size 16777216 bytes, IRQ 9
[    1.115842] umem 0000:02:01.0: memory NOT initialized. Consider over-writing whole device.
[    1.125778]  umema:
[  240.886560] INFO: task swapper:1 blocked for more than 120 seconds.
[  240.893186] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  240.901115] swapper       D 0000000000000001     0     1      0 0x00000000
[  240.908231]  ffff8800f8da1b50 0000000000000046 ffff8800f8da1ac0 ffff8800f8da1fd8
[  240.916027]  ffff8800f8d80050 0000000000004000 0000000000004000 00000000001d2180
[  240.923819]  00000000001d2180 ffff8800f8da1fd8 ffff8800f8da1fd8 00000000001d2180
[  240.931611] Call Trace:
[  240.934133]  [<ffffffff8144f96d>] ? _raw_spin_unlock_irqrestore+0x4c/0x68
[  240.941011]  [<ffffffff812a99ab>] ? mm_unplug_device+0x47/0x50
[  240.946929]  [<ffffffff8144cb84>] io_schedule+0x38/0x4d
[  240.952238]  [<ffffffff810802d0>] sync_page+0x4c/0x50
[  240.957370]  [<ffffffff8144cf9b>] __wait_on_bit_lock+0x42/0x8b
[  240.963288]  [<ffffffff81080284>] ? sync_page+0x0/0x50
[  240.968508]  [<ffffffff81080270>] __lock_page+0x64/0x6b
[  240.973821]  [<ffffffff8104b500>] ? wake_bit_function+0x0/0x2a
[  240.979741]  [<ffffffff81080796>] do_read_cache_page+0xd3/0x135
[  240.985745]  [<ffffffff810da132>] ? blkdev_readpage+0x0/0x15
[  240.991492]  [<ffffffff81080834>] read_cache_page_async+0x17/0x19
[  240.997671]  [<ffffffff8108083f>] read_cache_page+0x9/0x13
[  241.003244]  [<ffffffff81102a7b>] read_dev_sector+0x2e/0x93
[  241.008903]  [<ffffffff8110397c>] ? adfspart_check_ICS+0x0/0x19c
[  241.014997]  [<ffffffff811039b6>] adfspart_check_ICS+0x3a/0x19c
[  241.021001]  [<ffffffff81432011>] ? kmemleak_alloc+0x5b/0xa2
[  241.026748]  [<ffffffff810ab6d9>] ? kmem_cache_alloc+0x14f/0x17d
[  241.032842]  [<ffffffff8110397c>] ? adfspart_check_ICS+0x0/0x19c
[  241.038934]  [<ffffffff81103778>] rescan_partitions+0x196/0x39a
[  241.044940]  [<ffffffff8121a24e>] ? disk_get_part+0x74/0xbc
[  241.050600]  [<ffffffff810da63a>] __blkdev_get+0x271/0x365
[  241.056172]  [<ffffffff81222b14>] ? kobject_put+0x47/0x4c
[  241.061658]  [<ffffffff810da739>] blkdev_get+0xb/0xd
[  241.066708]  [<ffffffff81103121>] register_disk+0xbd/0x11f
[  241.072274]  [<ffffffff8121a3be>] add_disk+0xb8/0x116
[  241.077414]  [<ffffffff8193abbf>] mm_init+0x126/0x19e
[  241.082556]  [<ffffffff8193aa99>] ? mm_init+0x0/0x19e
[  241.087693]  [<ffffffff810001f0>] do_one_initcall+0x5a/0x14f
[  241.093440]  [<ffffffff81914908>] kernel_init+0x148/0x1d2
[  241.098925]  [<ffffffff81003794>] kernel_thread_helper+0x4/0x10
[  241.104933]  [<ffffffff8102cd00>] ? can_nice+0x19/0x3a
[  241.110157]  [<ffffffff8144fcc0>] ? restore_args+0x0/0x30
[  241.115644]  [<ffffffff819147c0>] ? kernel_init+0x0/0x1d2
[  241.121128]  [<ffffffff81003790>] ? kernel_thread_helper+0x0/0x10
[  241.127300] 1 lock held by swapper/1:
[  241.131046]  #0:  (&bdev->bd_mutex){+.+.+.}, at: [<ffffffff810da428>] __blkdev_get+0x5f/0x365

With b160fdab reverted, I get a 'normal' umem driver init (the umem
errors/warnings are expected: the batteries aren't connected and the
card isn't being used):

Jun 17 12:49:58 ceph3 kernel: [    2.270121] v2.3 : Micro Memory(tm) PCI memory board block driver
Jun 17 12:49:58 ceph3 kernel: [    2.276360] umem 0000:02:01.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
Jun 17 12:49:58 ceph3 kernel: [    2.283237] umem 0000:02:01.0: Micro Memory(tm) controller found (PCI Mem Module (Battery Backup))
Jun 17 12:49:58 ceph3 kernel: [    2.292366] umem 0000:02:01.0: CSR 0xfc9ffc00 -> 0xffffc90001478c00 (0x100)
Jun 17 12:49:58 ceph3 kernel: [    2.299476] umem 0000:02:01.0: Size 1048576 KB, Battery 1 Disabled (FAILURE), Battery 2 Disabled (FAILURE)
Jun 17 12:49:58 ceph3 kernel: [    2.309507] umem 0000:02:01.0: Window size 16777216 bytes, IRQ 20
Jun 17 12:49:58 ceph3 kernel: [    2.315691] umem 0000:02:01.0: memory NOT initialized. Consider over-writing whole device.
Jun 17 12:49:58 ceph3 kernel: [    2.325697]  umema:
Jun 17 12:49:58 ceph3 kernel: [    2.328102] umem 0000:02:01.0: DMAstat - ANY_ERR MBE_ERR CHAIN_COMPLETE DMA_COMPLETE 
Jun 17 12:49:58 ceph3 kernel: [    2.328239] umem 0000:02:01.0: Memory access error detected (err count 0)
Jun 17 12:49:58 ceph3 kernel: [    2.328239] umem 0000:02:01.0: Multi-bit EDC error
Jun 17 12:49:58 ceph3 kernel: [    2.328239] umem 0000:02:01.0: Fault Address 0x0000000ff8, Fault Data 0xfff7fffdffffffff
Jun 17 12:49:58 ceph3 kernel: [    2.328239] umem 0000:02:01.0: Fault Check 0x00, Fault Syndrome 0x00
Jun 17 12:49:58 ceph3 kernel: [    2.354756] umem 0000:02:01.0: I/O error on sector 0/4096
Jun 17 12:49:58 ceph3 kernel: [    2.360235] umem 0000:02:01.0: DMAstat - ANY_ERR MBE_ERR CHAIN_COMPLETE DMA_COMPLETE 
Jun 17 12:49:58 ceph3 kernel: [    2.360248] Buffer I/O error on device umema, logical block 0
Jun 17 12:49:58 ceph3 kernel: [    2.366169] umem 0000:02:01.0: DMAstat - ANY_ERR MBE_ERR CHAIN_COMPLETE DMA_COMPLETE 
Jun 17 12:49:58 ceph3 kernel: [    2.366175] umem 0000:02:01.0: Memory access error detected (err count 0)
Jun 17 12:49:58 ceph3 kernel: [    2.370075] umem 0000:02:01.0: Multi-bit EDC error
Jun 17 12:49:58 ceph3 kernel: [    2.370075] umem 0000:02:01.0: Fault Address 0x0000000ff8, Fault Data 0xfff7fffdffffffff
Jun 17 12:49:58 ceph3 kernel: [    2.370075] umem 0000:02:01.0: Fault Check 0x00, Fault Syndrome 0x00
Jun 17 12:49:58 ceph3 kernel: [    2.392548] umem 0000:02:01.0: I/O error on sector 0/4096
Jun 17 12:49:58 ceph3 kernel: [    2.398022] umem 0000:02:01.0: DMAstat - ANY_ERR MBE_ERR CHAIN_COMPLETE DMA_COMPLETE 
Jun 17 12:49:58 ceph3 kernel: [    2.398031] Buffer I/O error on device umema, logical block 0
Jun 17 12:49:58 ceph3 kernel: [    2.403926] umem 0000:02:01.0: DMAstat - ANY_ERR MBE_ERR CHAIN_COMPLETE DMA_COMPLETE 
Jun 17 12:49:58 ceph3 kernel: [    2.403933] umem 0000:02:01.0: Memory access error detected (err count 0)
Jun 17 12:49:58 ceph3 kernel: [    2.407853] umem 0000:02:01.0: Multi-bit EDC error
Jun 17 12:49:58 ceph3 kernel: [    2.407853] umem 0000:02:01.0: Fault Address 0x0000000ff8, Fault Data 0xfff7fffdffffffff
Jun 17 12:49:58 ceph3 kernel: [    2.407853] umem 0000:02:01.0: Fault Check 0x00, Fault Syndrome 0x00
Jun 17 12:49:58 ceph3 kernel: [    2.430297] umem 0000:02:01.0: I/O error on sector 0/4096
Jun 17 12:49:58 ceph3 kernel: [    2.435774] umem 0000:02:01.0: DMAstat - ANY_ERR MBE_ERR CHAIN_COMPLETE DMA_COMPLETE 
Jun 17 12:49:58 ceph3 kernel: [    2.435783] Buffer I/O error on device umema, logical block 0
Jun 17 12:49:58 ceph3 kernel: [    2.441677] umem 0000:02:01.0: DMAstat - ANY_ERR MBE_ERR CHAIN_COMPLETE DMA_COMPLETE 
Jun 17 12:49:58 ceph3 kernel: [    2.441684] umem 0000:02:01.0: Memory access error detected (err count 0)
Jun 17 12:49:58 ceph3 kernel: [    2.445605] umem 0000:02:01.0: Multi-bit EDC error
Jun 17 12:49:58 ceph3 kernel: [    2.445605] umem 0000:02:01.0: Fault Address 0x0000000ff8, Fault Data 0xfff7fffdffffffff
Jun 17 12:49:58 ceph3 kernel: [    2.445605] umem 0000:02:01.0: Fault Check 0x00, Fault Syndrome 0x00
Jun 17 12:49:58 ceph3 kernel: [    2.468037] umem 0000:02:01.0: I/O error on sector 0/4096
Jun 17 12:49:58 ceph3 kernel: [    2.473517] umem 0000:02:01.0: DMAstat - ANY_ERR MBE_ERR CHAIN_COMPLETE DMA_COMPLETE 
Jun 17 12:49:58 ceph3 kernel: [    2.473525] Buffer I/O error on device umema, logical block 0
Jun 17 12:49:58 ceph3 kernel: [    2.479499] umem 0000:02:01.0: DMAstat - ANY_ERR MBE_ERR CHAIN_COMPLETE DMA_COMPLETE 
Jun 17 12:49:58 ceph3 kernel: [    2.479506] umem 0000:02:01.0: Memory access error detected (err count 0)
Jun 17 12:49:58 ceph3 kernel: [    2.483350] umem 0000:02:01.0: Multi-bit EDC error
Jun 17 12:49:58 ceph3 kernel: [    2.483350] umem 0000:02:01.0: Fault Address 0x0000000ff8, Fault Data 0xfff7fffdffffffff
Jun 17 12:49:58 ceph3 kernel: [    2.483350] umem 0000:02:01.0: Fault Check 0x00, Fault Syndrome 0x00
Jun 17 12:49:58 ceph3 kernel: [    2.510375] umem 0000:02:01.0: I/O error on sector 0/4096
Jun 17 12:49:58 ceph3 kernel: [    2.515856] umem 0000:02:01.0: DMAstat - ANY_ERR MBE_ERR CHAIN_COMPLETE DMA_COMPLETE 
Jun 17 12:49:58 ceph3 kernel: [    2.515863] Buffer I/O error on device umema, logical block 0
Jun 17 12:49:58 ceph3 kernel: [    2.521769] ldm_validate_partition_table(): Disk read failed.
Jun 17 12:49:58 ceph3 kernel: [    2.527637] umem 0000:02:01.0: DMAstat - ANY_ERR MBE_ERR CHAIN_COMPLETE DMA_COMPLETE 
Jun 17 12:49:58 ceph3 kernel: [    2.527644] umem 0000:02:01.0: Memory access error detected (err count 0)
Jun 17 12:49:58 ceph3 kernel: [    2.531590] umem 0000:02:01.0: Multi-bit EDC error
Jun 17 12:49:58 ceph3 kernel: [    2.531590] umem 0000:02:01.0: Fault Address 0x0000000ff8, Fault Data 0xfff7fffdffffffff
Jun 17 12:49:58 ceph3 kernel: [    2.531590] umem 0000:02:01.0: Fault Check 0x00, Fault Syndrome 0x00
Jun 17 12:49:58 ceph3 kernel: [    2.553992] umem 0000:02:01.0: I/O error on sector 0/4096
Jun 17 12:49:58 ceph3 kernel: [    2.559466] umem 0000:02:01.0: DMAstat - ANY_ERR MBE_ERR CHAIN_COMPLETE DMA_COMPLETE 
Jun 17 12:49:58 ceph3 kernel: [    2.559473] Buffer I/O error on device umema, logical block 0
Jun 17 12:49:58 ceph3 kernel: [    2.565371] umem 0000:02:01.0: DMAstat - ANY_ERR MBE_ERR CHAIN_COMPLETE DMA_COMPLETE 
Jun 17 12:49:58 ceph3 kernel: [    2.565378] umem 0000:02:01.0: Memory access error detected (err count 0)
Jun 17 12:49:58 ceph3 kernel: [    2.569297] umem 0000:02:01.0: Multi-bit EDC error
Jun 17 12:49:58 ceph3 kernel: [    2.569297] umem 0000:02:01.0: Fault Address 0x0000000ff8, Fault Data 0xfff7fffdffffffff
Jun 17 12:49:58 ceph3 kernel: [    2.569297] umem 0000:02:01.0: Fault Check 0x00, Fault Syndrome 0x00
Jun 17 12:49:58 ceph3 kernel: [    2.591729] umem 0000:02:01.0: I/O error on sector 0/4096
Jun 17 12:49:58 ceph3 kernel: [    2.597212] umem 0000:02:01.0: DMAstat - ANY_ERR MBE_ERR CHAIN_COMPLETE DMA_COMPLETE 
Jun 17 12:49:58 ceph3 kernel: [    2.597220] Buffer I/O error on device umema, logical block 0
Jun 17 12:49:58 ceph3 kernel: [    2.603117] umem 0000:02:01.0: DMAstat - ANY_ERR MBE_ERR CHAIN_COMPLETE DMA_COMPLETE 
Jun 17 12:49:58 ceph3 kernel: [    2.603124] umem 0000:02:01.0: Memory access error detected (err count 0)
Jun 17 12:49:58 ceph3 kernel: [    2.607042] umem 0000:02:01.0: Multi-bit EDC error
Jun 17 12:49:58 ceph3 kernel: [    2.607042] umem 0000:02:01.0: Fault Address 0x0000000ff8, Fault Data 0xfff7fffdffffffff
Jun 17 12:49:58 ceph3 kernel: [    2.607042] umem 0000:02:01.0: Fault Check 0x00, Fault Syndrome 0x00
Jun 17 12:49:58 ceph3 kernel: [    2.629472] umem 0000:02:01.0: I/O error on sector 0/4096
Jun 17 12:49:58 ceph3 kernel: [    2.634955] umem 0000:02:01.0: DMAstat - ANY_ERR MBE_ERR CHAIN_COMPLETE DMA_COMPLETE 
Jun 17 12:49:58 ceph3 kernel: [    2.634962] Buffer I/O error on device umema, logical block 0
Jun 17 12:49:58 ceph3 kernel: [    2.640849] umem 0000:02:01.0: DMAstat - ANY_ERR MBE_ERR CHAIN_COMPLETE DMA_COMPLETE 
Jun 17 12:49:58 ceph3 kernel: [    2.640856] umem 0000:02:01.0: Memory access error detected (err count 0)
Jun 17 12:49:58 ceph3 kernel: [    2.644785] umem 0000:02:01.0: Multi-bit EDC error
Jun 17 12:49:58 ceph3 kernel: [    2.644785] umem 0000:02:01.0: Fault Check 0x00, Fault Syndrome 0x00
Jun 17 12:49:58 ceph3 kernel: [    2.672688] umem 0000:02:01.0: DMAstat - ANY_ERR MBE_ERR CHAIN_COMPLETE DMA_COMPLETE 
Jun 17 12:49:58 ceph3 kernel: [    2.683375] umem 0000:02:01.0: DMAstat - ANY_ERR MBE_ERR CHAIN_COMPLETE DMA_COMPLETE 
Jun 17 12:49:58 ceph3 kernel: [    2.687328] umem 0000:02:01.0: Fault Check 0x00, Fault Syndrome 0x00
Jun 17 12:49:58 ceph3 kernel: [    2.715281] umem 0000:02:01.0: DMAstat - ANY_ERR MBE_ERR CHAIN_COMPLETE DMA_COMPLETE 
Jun 17 12:49:58 ceph3 kernel: [    2.719213] umem 0000:02:01.0: Fault Address 0x0000000ff8, Fault Data 0xfff7fffdffffffff
Jun 17 12:49:58 ceph3 kernel: [    2.741630] umem 0000:02:01.0: I/O error on sector 0/4096
Jun 17 12:49:58 ceph3 kernel: [    2.747111] umem 0000:02:01.0: DMAstat - ANY_ERR MBE_ERR CHAIN_COMPLETE DMA_COMPLETE 
Jun 17 12:49:58 ceph3 kernel: [    2.747194] umem 0000:02:01.0: DMAstat - ANY_ERR MBE_ERR CHAIN_COMPLETE DMA_COMPLETE 
Jun 17 12:49:58 ceph3 kernel: [    2.747200] umem 0000:02:01.0: Memory access error detected (err count 0)
Jun 17 12:49:58 ceph3 kernel: [    2.751109] umem 0000:02:01.0: Multi-bit EDC error
Jun 17 12:49:58 ceph3 kernel: [    2.751109] umem 0000:02:01.0: Fault Address 0x0000003ff8, Fault Data 0xefdfffffffffffff
Jun 17 12:49:58 ceph3 kernel: [    2.773549] umem 0000:02:01.0: I/O error on sector 24/4096
Jun 17 12:49:58 ceph3 kernel: [    2.779118] umem 0000:02:01.0: DMAstat - ANY_ERR MBE_ERR CHAIN_COMPLETE DMA_COMPLETE 
Jun 17 12:49:58 ceph3 kernel: [    2.779182] umem 0000:02:01.0: DMAstat - ANY_ERR MBE_ERR CHAIN_COMPLETE DMA_COMPLETE 
Jun 17 12:49:58 ceph3 kernel: [    2.779189] umem 0000:02:01.0: Memory access error detected (err count 0)
Jun 17 12:49:58 ceph3 kernel: [    2.783116] umem 0000:02:01.0: Multi-bit EDC error
Jun 17 12:49:58 ceph3 kernel: [    2.783116] umem 0000:02:01.0: Fault Address 0x0000003ff8, Fault Data 0xefdfffffffffffff
Jun 17 12:49:58 ceph3 kernel: [    2.783116] umem 0000:02:01.0: Fault Check 0x00, Fault Syndrome 0x00
Jun 17 12:49:58 ceph3 kernel: [    2.805537] umem 0000:02:01.0: I/O error on sector 24/4096
Jun 17 12:49:58 ceph3 kernel: [    2.811098] umem 0000:02:01.0: DMAstat - ANY_ERR MBE_ERR CHAIN_COMPLETE DMA_COMPLETE 
Jun 17 12:49:58 ceph3 kernel: [    2.811163] umem 0000:02:01.0: DMAstat - ANY_ERR MBE_ERR CHAIN_COMPLETE DMA_COMPLETE 
Jun 17 12:49:58 ceph3 kernel: [    2.811170] umem 0000:02:01.0: Memory access error detected (err count 0)
Jun 17 12:49:58 ceph3 kernel: [    2.815096] umem 0000:02:01.0: Multi-bit EDC error
Jun 17 12:49:58 ceph3 kernel: [    2.815096] umem 0000:02:01.0: Fault Address 0x0000000ff8, Fault Data 0xfff7fffdffffffff
Jun 17 12:49:58 ceph3 kernel: [    2.815096] umem 0000:02:01.0: Fault Check 0x00, Fault Syndrome 0x00
Jun 17 12:49:58 ceph3 kernel: [    2.837518] umem 0000:02:01.0: I/O error on sector 0/4096
Jun 17 12:49:58 ceph3 kernel: [    2.843000] umem 0000:02:01.0: DMAstat - ANY_ERR MBE_ERR CHAIN_COMPLETE DMA_COMPLETE 
Jun 17 12:49:58 ceph3 kernel: [    2.843269]  unable to read partition table
Jun 17 12:49:58 ceph3 kernel: [    2.849039] MM: desc_per_page = 128
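
One visible difference between the two logs is the interrupt the card ends up on: the hanging boot reports "can't find IRQ for PCI INT C" and IRQ 9, while the working boot reports "PCI INT A -> GSI 20" and IRQ 20. Whether that is related to the hang is not established here. As a rough sketch for comparing such logs, the Python below pulls the IRQ from each umem "Window size" line; the function name `umem_irqs` and the two inline log excerpts are illustrative only, copied from the lines quoted above.

```python
import re

# Matches e.g. "umem 0000:02:01.0: Window size 16777216 bytes, IRQ 9"
IRQ_RE = re.compile(r"umem (\S+): Window size \d+ bytes, IRQ (\d+)")

def umem_irqs(log_text):
    """Return {pci_address: irq} for each umem 'Window size' line (hypothetical helper)."""
    return {m.group(1): int(m.group(2)) for m in IRQ_RE.finditer(log_text)}

# Excerpts copied from the two boots quoted in this message.
hang_log = """\
[    1.075780] umem 0000:02:01.0: can't find IRQ for PCI INT C; probably buggy MP table
[    1.109745] umem 0000:02:01.0: Window size 16777216 bytes, IRQ 9
"""
good_log = """\
Jun 17 12:49:58 ceph3 kernel: [    2.276360] umem 0000:02:01.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
Jun 17 12:49:58 ceph3 kernel: [    2.309507] umem 0000:02:01.0: Window size 16777216 bytes, IRQ 20
"""

hang = umem_irqs(hang_log)
good = umem_irqs(good_log)

# Report devices whose IRQ differs between the two boots.
for dev in sorted(hang.keys() & good.keys()):
    if hang[dev] != good[dev]:
        print(f"{dev}: IRQ {hang[dev]} (hang) vs IRQ {good[dev]} (good)")
```

Run against the full dmesg output of both boots, the same helper would flag any other umem device whose IRQ assignment changed.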

.config is attached.

sage
View attachment ".config" of type "TEXT/PLAIN" (60249 bytes)
