linux-kernel - Re: kernel BUG at drivers/scsi/aic7xxx/aic79xx

Message-ID: <47D3C93D.8070204@msgid.tls.msk.ru>
Date:	Sun, 09 Mar 2008 14:25:49 +0300
From:	Michael Tokarev <mjt@....msk.ru>
To:	Linux-kernel <linux-kernel@...r.kernel.org>,
	SCSI Mailing List <linux-scsi@...r.kernel.org>
Subject: Re: kernel BUG at drivers/scsi/aic7xxx/aic79xx_osm.c:1490!

Michael Tokarev wrote:
> Just ran into quite a bad situation on a production server
> here.  The machine locked up hard several times in a
> row (requiring a hard reboot each time).  So I finally enabled
> the watchdog subsystem, which helped.
> 
> Now I see the following (over netconsole):

Forgot the most important information.

# uname -a
Linux tbus90.msk.rgs-podm.ru 2.6.24-x86-64 #2.6.24.2 SMP Mon Feb 18 16:04:41 MSK 2008 x86_64 GNU/Linux

It's mostly vanilla 2.6.24.2, with some irrelevant patches like unionfs
(not even loaded).


> DMA: Out of SW-IOMMU space for 65536 bytes at device 0000:08:07.0
> ------------[ cut here ]------------
> kernel BUG at drivers/scsi/aic7xxx/aic79xx_osm.c:1490!
> invalid opcode: 0000 [1] SMP
> CPU 0
> Modules linked in: xfs netconsole nfsd lockd nfs_acl sunrpc exportfs autofs4 iTCO_wdt iTCO_vendor_support raid10 raid0 sr_mod cdrom ata_piix libata tg3 mptspi mptscsih mptbase ext3 jbd mbcache raid1 md_mod sd_mod aic79xx scsi_transport_spi scsi_mod
> Pid: 2176, comm: gzip Not tainted 2.6.24-x86-64 #2.6.24.2
> RIP: 0010:[<ffffffff8805053a>]  [<ffffffff8805053a>] :aic79xx:ahd_linux_queue+0x58a/0x590
> RSP: 0000:ffffffff80511d40  EFLAGS: 00010082
> RAX: 00000000fffffff4 RBX: ffff81018c331600 RCX: 00000000fffffff4
> RDX: ffff8100063660e0 RSI: 0000000000000002 RDI: ffffffff804a2150
> RBP: ffff8101a9029e40 R08: 0000000000000044 R09: 0000000000000000
> R10: 00000000fffffff4 R11: ffffffff80222d80 R12: ffff8101aff8d418
> R13: ffff8101aeea7000 R14: ffff8101aef50000 R15: ffff8101aeea78b4
> FS:  0000000000000000(0000) GS:ffffffff804b7000(0063) knlGS:00000000f7de56b0
> CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
> CR2: 0000000008065000 CR3: 00000001adbb8000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process gzip (pid: 2176, threadinfo ffff8101a9270000, task ffff8101a91b2000)
> Stack:  ffff8101aff8d000 0000000000000083 0000000000000220 ffffffff80245435
>  ffff81014ec656c0 0000000000000293 ffff8101aff8d000 ffff81018c331600
>  ffff8101aef48800 ffff81018c331600 ffff8101aff8d048 ffffffff8800100c
> Call Trace:
>  <IRQ>  [<ffffffff80245435>] __mod_timer+0xb5/0xd0
>  [<ffffffff8800100c>] :scsi_mod:scsi_dispatch_cmd+0x17c/0x2e0
>  [<ffffffff88007db5>] :scsi_mod:scsi_request_fn+0x225/0x3d0
>  [<ffffffff802ee723>] blk_run_queue+0x43/0x80
>  [<ffffffff880063fb>] :scsi_mod:scsi_next_command+0x3b/0x60
>  [<ffffffff880065e5>] :scsi_mod:scsi_end_request+0xd5/0x110
>  [<ffffffff8800694e>] :scsi_mod:scsi_io_completion+0xae/0x3e0
>  [<ffffffff802eea89>] blk_done_softirq+0x69/0x80
>  [<ffffffff802415d5>] __do_softirq+0x75/0xe0
>  [<ffffffff8020ce3c>] call_softirq+0x1c/0x30
>  [<ffffffff8020efd5>] do_softirq+0x35/0x90
>  [<ffffffff80241558>] irq_exit+0x88/0x90
>  [<ffffffff8020f220>] do_IRQ+0x80/0x100
>  [<ffffffff8020c1c1>] ret_from_intr+0x0/0xa
>  <EOI>
> 
> Code: 0f 0b eb fe 66 90 48 83 ec 78 4c 89 64 24 58 4c 89 74 24 68
> RIP  [<ffffffff8805053a>] :aic79xx:ahd_linux_queue+0x58a/0x590
>  RSP <ffffffff80511d40>
> Kernel panic - not syncing: Fatal exception
> 
> 
> The hardware is an IBM xSeries 346 [8840ECY] machine, with
> 2x dual-core CPUs and 6 GB of RAM.  It has 2 SCSI controllers:
> one onboard 2-channel AIC-7902B, and one LSI Logic 53c1030 PCI-X
> Fusion-MPT Dual Ultra320.  A total of 16 drives are attached to
> the 2 controllers.
> 
> There's a Linux software raid10 array running over 14 drives
> (7 drives on each controller), and an XFS filesystem on top of
> it (410 GB).
> 
> The problem (the above oops) happens almost immediately when
> I try to gzip a file on that filesystem: the system dies
> within one minute of running gzip.  The same happens when
> I try to copy those files over NFS: the same hard lockup,
> but it takes longer to occur than with gzip.
> 
> Please help!....  This is a critical piece of hardware.
> 
> Thanks!
> 
> /mjt
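
One detail in the register dump above may be worth noting: RAX, RCX
and R10 all hold 0xfffffff4.  Read as a signed 32-bit integer that is
-12, i.e. -ENOMEM on Linux, which would fit the "Out of SW-IOMMU
space" message printed just before the BUG.  This is only a reading of
the dump, not a confirmed diagnosis; the trace alone does not show
which call returned that value.  A minimal Python sketch of the
decoding (the helper name as_signed32 is just for illustration):

```python
import errno
import struct


def as_signed32(value):
    """Interpret the low 32 bits of value as a two's-complement signed int."""
    return struct.unpack("<i", struct.pack("<I", value & 0xFFFFFFFF))[0]


# RAX as printed in the oops above (assumption: it holds an error code).
rax = 0x00000000FFFFFFF4

code = as_signed32(rax)
# errno.errorcode maps positive errno numbers to their symbolic names.
name = errno.errorcode.get(-code, "unknown")

print(code, name)
```

The same decoding applies to any register that happens to carry a
negative errno in its low 32 bits.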
