lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:	Sun, 3 May 2009 11:26:52 +0400
From:	Andrey Borzenkov <arvidjaar@...l.ru>
To:	dm-devel@...hat.com, linux-ide@...r.kernel.org
Cc:	linux-kernel@...r.kernel.org
Subject: dm-multipath completely breaks AHCI driver

Yeah, me knows - it does sound weird.

I moved my system from to different notebook (it was pure rsync + 
mkinitrd + couple of adjustments in loaded modules). Older one was using 
pata_ali, new one is using AHCI (Intel ICH8).

After booting it (using *the* *same* kernel, only different modules) for 
the first time I started to get random oopses, file system corruptions 
as well as scary HSM violations in logs.

Ubuntu live CD had no issues (even with relatively high IO load during 
kernel compilation); also my son used this notebook for several months 
under ArchLinux without any issues either.

I compiled minimal stripped down kernel which worked. I tried to compile 
more feature reach one - and immediately got HSM violations. This time I 
noticed that "bad" one had dm-multipath and "good" one not ... well, 
removing /etc/mutipath.conf (to block loading of it) - and lo and 
behold! I am running without any issues ...

So what I got in logs was:

[   18.593217] device-mapper: uevent: version 1.0.3
[   18.593873] device-mapper: ioctl: 4.14.0-ioctl (2008-04-23) 
initialised: dm-devel@...hat.com
[   18.664812] device-mapper: multipath: version 1.0.5 loaded
[   19.070383] ata1.00: exception Emask 0x2 SAct 0x73 SErr 0x3000400 
action 0x6
[   19.070389] ata1.00: irq_stat 0x45000008
[   19.070395] ata1: SError: { Proto TrStaTrns UnrecFIS }
[   19.070406] ata1.00: cmd 60/08:00:e2:e5:c3/00:00:01:00:00/40 tag 0 
ncq 4096 in
[   19.070409]          res 40/00:38:00:00:00/00:00:00:00:00/40 Emask 
0x2 (HSM violation)
[   19.070414] ata1.00: status: { DRDY }
[   19.070424] ata1.00: cmd 60/40:08:12:60:40/00:00:01:00:00/40 tag 1 
ncq 32768 in
[   19.070426]          res 40/00:38:00:00:00/00:00:00:00:00/40 Emask 
0x2 (HSM violation)
[   19.070431] ata1.00: status: { DRDY }
[   19.070442] ata1.00: cmd 60/18:20:42:c1:c4/00:00:01:00:00/40 tag 4 
ncq 12288 in
[   19.070444]          res 41/84:18:59:c1:c4/03:00:01:00:00/61 Emask 
0x412 (ATA bus error) <F>
[   19.070453] ata1.00: status: { DRDY ERR }
[   19.070455] ata1.00: error: { ICRC ABRT }
[   19.070460] ata1.00: cmd 60/28:28:22:c3:c4/00:00:01:00:00/40 tag 5 
ncq 20480 in
[   19.070461]          res 40/00:38:00:00:00/00:00:00:00:00/40 Emask 
0x2 (HSM violation)
[   19.070464] ata1.00: status: { DRDY }
[   19.070469] ata1.00: cmd 60/80:30:c2:c3:c4/02:00:01:00:00/40 tag 6 
ncq 327680 in
[   19.070470]          res 40/00:38:00:00:00/00:00:00:00:00/40 Emask 
0x2 (HSM violation)
[   19.070473] ata1.00: status: { DRDY }
[   19.070479] ata1: hard resetting link
[   19.420110] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   19.485363] ata1.00: configured for UDMA/100
[   19.485406] ata1: EH complete
[   19.586193] device-mapper: multipath round-robin: version 1.0.0 
loaded
[   19.586634] device-mapper: table: 254:0: multipath: error getting 
device
[   19.586638] device-mapper: ioctl: error adding target to table
[   31.932088] REISERFS (device sda7): found reiserfs format "3.6" with 
standard journal
[   31.932111] REISERFS (device sda7): using ordered data mode
[   31.932289] REISERFS (device sda7): journal params: device sda7, size 
8192, journal first block 18, max trans len 1024, max batch 900, max 
commit age 30, max trans age 30
[   31.932559] REISERFS (device sda7): checking transaction log (sda7)
[   31.999527] REISERFS (device sda7): Using r5 hash to sort names
[   32.101614] loop: module loaded
[   32.581870] Adding 4192924k swap on /dev/sda5.  Priority:-1 extents:1 
across:4192924k 
[   34.671257] iwl3945 0000:0c:00.0: firmware: requesting 
iwlwifi-3945-2.ucode
[   34.924834] iwl3945 0000:0c:00.0: loaded firmware version 15.28.2.8
[   34.941614] BUG: unable to handle kernel paging request at 08048154
[   34.949359] IP: [<c0284711>] kmem_cache_alloc+0xa8/0xc1
[   34.949359] *pde = 7f4b8067 
[   34.949359] Oops: 0003 [#1] SMP 
[   34.949359] last sysfs file: /sys/devices/virtual/dmi/id/board_serial
[   34.949359] Modules linked in: loop ext2 mbcache dm_mirror 
dm_region_hash dm_log dm_round_robin dm_multipath dm_mod 
snd_hda_codec_idt iwl3945 snd_hda_intel snd_hda_codec iwlcore snd_hwdep 
btusb uvcvideo bluetooth snd_pcm videodev sdhci_pci ohci1394 
snd_page_alloc sdhci v4l1_compat iTCO_wdt ieee1394 lib80211 mmc_core 
dcdbas ricoh_mmc hid ata_piix ahci libata reiserfs
[   34.949359] 
[   34.949359] Pid: 2151, comm: hald Tainted: G        W  (2.6.30-
rc4-3avb #5) XPS M1330                       
[   34.949359] EIP: 0060:[<c0284711>] EFLAGS: 00010246 CPU: 1
[   34.949359] EIP is at kmem_cache_alloc+0xa8/0xc1
[   34.949359] EAX: 00000000 EBX: 08048154 ECX: 00000025 EDX: 00000094
[   34.949359] ESI: 00008050 EDI: 08048154 EBP: f5a26f2c ESP: f5a26f14
[   34.949359]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[   34.949359] Process hald (pid: 2151, ti=f5a26000 task=f5aabe30 
task.ti=f5a26000)
[   34.949359] Stack:
[   34.949359]  f7001000 c02eb6d5 00000094 f6601260 f5a21d80 00008050 
f5a26f44 c02eb6d5
[   34.949359]  f5a21d90 f5a21d80 f71e4dec f71e4e08 f5a26f58 c02a9b43 
f6f71eec f71e4dec
[   34.949359]  f5a21d80 f5a26f7c c02a9c6c f6f72100 f5a21db4 ffffffea 
c02aa7d6 fffffff4
[   34.949359] Call Trace:
[   34.949359]  [<c02eb6d5>] ? idr_pre_get+0x27/0x60
[   34.949359]  [<c02eb6d5>] ? idr_pre_get+0x27/0x60
[   34.949359]  [<c02a9b43>] ? inotify_handle_get_wd+0x19/0x54
[   34.949359]  [<c02a9c6c>] ? inotify_add_watch+0x4d/0xe1
[   34.949359]  [<c02aa7d6>] ? sys_inotify_add_watch+0xbb/0x16c
[   34.949359]  [<c02aa815>] ? sys_inotify_add_watch+0xfa/0x16c
[   34.949359]  [<c02029d4>] ? sysenter_do_call+0x12/0x32
[   34.949359] Code: 02 00 00 75 09 57 9d e8 60 ca fb ff eb 07 e8 d1 de 
fb ff 57 9d 66 85 f6 79 20 85 db 74 1c 8b 4d f0 89 df c1 e9 02 31 c0 8b 
55 f0 <f3> ab f6 c2 02 74 02 66 ab f6 c2 01 74 01 aa 8d 65 f4 89 d8 5b 
[   34.949359] EIP: [<c0284711>] kmem_cache_alloc+0xa8/0xc1 SS:ESP 
0068:f5a26f14
[   34.949359] CR2: 0000000008048154
[   35.262159] ---[ end trace a499c68e391b7f63 ]---
[   35.272994] Registered led device: iwl-phy0::radio
[   35.282572] Registered led device: iwl-phy0::assoc
[   35.292096] Registered led device: iwl-phy0::RX
[   35.302859] Registered led device: iwl-phy0::TX
[   35.317633] BUG: unable to handle kernel paging request at 62696c2f
[   35.327346] IP: [<c02846c7>] kmem_cache_alloc+0x5e/0xc1
[   35.327561] *pde = 00000000 
[   35.327561] Oops: 0000 [#2] SMP 
[   35.327561] last sysfs file: /sys/devices/virtual/dmi/id/board_serial
[   35.327561] Modules linked in: loop ext2 mbcache dm_mirror 
dm_region_hash dm_log dm_round_robin dm_multipath dm_mod 
snd_hda_codec_idt iwl3945 snd_hda_intel snd_hda_codec iwlcore snd_hwdep 
btusb uvcvideo bluetooth snd_pcm videodev sdhci_pci ohci1394 
snd_page_alloc sdhci v4l1_compat iTCO_wdt ieee1394 lib80211 mmc_core 
dcdbas ricoh_mmc hid ata_piix ahci libata reiserfs
[   35.327561] 
[   35.327561] Pid: 2267, comm: console-kit-dae Tainted: G      D W  
(2.6.30-rc4-3avb #5) XPS M1330                       
[   35.327561] EIP: 0060:[<c02846c7>] EFLAGS: 00010002 CPU: 1
[   35.327561] EIP is at kmem_cache_alloc+0x5e/0xc1
[   35.327561] EAX: 00000000 EBX: 62696c2f ECX: f7001000 EDX: c22b34bc
[   35.327561] ESI: 00008050 EDI: 00000246 EBP: f5a5cf2c ESP: f5a5cf14
[   35.327561]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[   35.327561] Process console-kit-dae (pid: 2267, ti=f5a5c000 
task=f59fa550 task.ti=f5a5c000)
[   35.327561] Stack:
[   35.327561]  f7001000 c02eb6d5 00000094 f67fe0c0 f67fe0c0 00008050 
f5a5cf44 c02eb6d5
[   35.327561]  f67fe0d0 f67fe0c0 f71e4cd4 f71e4cf0 f5a5cf58 c02a9b43 
f6f7259c f71e4cd4
[   35.327561]  f67fe0c0 f5a5cf7c c02a9c6c f6f727b0 f67fe0f4 ffffffea 
c02aa7d6 fffffff4
[   35.327561] Call Trace:
[   35.327561]  [<c02eb6d5>] ? idr_pre_get+0x27/0x60
[   35.327561]  [<c02eb6d5>] ? idr_pre_get+0x27/0x60
[   35.327561]  [<c02a9b43>] ? inotify_handle_get_wd+0x19/0x54
[   35.327561]  [<c02a9c6c>] ? inotify_add_watch+0x4d/0xe1
[   35.327561]  [<c02aa7d6>] ? sys_inotify_add_watch+0xbb/0x16c
[   35.327561]  [<c02aa815>] ? sys_inotify_add_watch+0xfa/0x16c
[   35.327561]  [<c02029d4>] ? sysenter_do_call+0x12/0x32
[   35.327561] Code: 62 f5 21 00 9c 5f fa e8 aa ca fb ff 8b 4d e8 64 a1 
88 d4 66 c0 8b 94 81 a4 00 00 00 8b 42 10 89 45 f0 8b 1a 85 db 74 0a 8b 
42 0c <8b> 04 83 89 02 eb 15 52 83 c9 ff 89 f2 ff 75 ec 8b 45 e8 e8 4d 
[   35.327561] EIP: [<c02846c7>] kmem_cache_alloc+0x5e/0xc1 SS:ESP 
0068:f5a5cf14
[   35.327561] CR2: 0000000062696c2f
[   35.327561] ---[ end trace a499c68e391b7f64 ]---
[   35.693405] ADDRCONF(NETDEV_UP): wlan0: link is not ready

The contents of /etc/mutipath.conf was:

defaults {
	getuid_callout	"/bin/echo -n 12345678"
	path_checker	readsector0
}
multipaths {
	multipath {
		wwid		12345678
		features	"1 queue_if_no_path"
		no_path_retry	5
		failback	immediate
	}
}

I was experimenting with multipath-over-loop and completely forgot about 
it later.

I understand that it sounds more like "doctor, it hurts when I stab 
myself in the eye". Still, even with this absolutely weird config result 
is rather unexpected (what is worse, it is near to impossible to find 
out the reason; I did it by pure accident). Also I have been running 
with *exactly* the same config for almost a year without any ill effects 
at all.

I am ready to offer debugging aid if required.

Download attachment "signature.asc " of type "application/pgp-signature" (198 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ