Message-ID: <526AA96C.8040600@crc.id.au>
Date:	Sat, 26 Oct 2013 04:25:00 +1100
From:	Steven Haigh <netwiz@....id.au>
To:	linux-kernel@...r.kernel.org
Subject: aac_write: aac_fib_send failed with status: -12

Hi all,

Firstly, please CC me as I'm not subscribed to this list.

I seem to be getting some random filesystem corruption on an IBM server
that I use as a Xen Dom0.

*** Specs ***
Vendor: IBM
Version: -[GGE149AUS-1.19]-
Product Name: IBM System x3650 -[7979CBM]-
AAC0: kernel 5.2-0[17003] Jul 25 2011
AAC0: monitor 5.2-0[17003]
AAC0: bios 5.2-0[17003]
AAC0: serial 5AB49E0
scsi0 : ServeRAID
scsi 0:0:0:0: Direct-Access     ServeRA  Dom0_RAID6       V1.0 PQ: 0 ANSI: 2
scsi 0:1:0:0: Direct-Access     IBM-ESXS MAY2073RC        T107 PQ: 0 ANSI: 5
scsi 0:1:1:0: Direct-Access     IBM-ESXS MAY2073RC        T107 PQ: 0 ANSI: 5
scsi 0:1:2:0: Direct-Access     IBM-ESXS MBC2073RC        SC06 PQ: 0 ANSI: 5
scsi 0:1:3:0: Direct-Access     IBM-ESXS ST973402SS       B52B PQ: 0 ANSI: 5
scsi 0:1:4:0: Direct-Access     IBM-ESXS ST973402SS       B52B PQ: 0 ANSI: 5
scsi 0:1:5:0: Direct-Access     IBM-ESXS ST973402SS       B52B PQ: 0 ANSI: 5
scsi 0:1:6:0: Direct-Access     IBM-ESXS ST973402SS       B52B PQ: 0 ANSI: 5
scsi 0:1:7:0: Direct-Access     IBM-ESXS ST973402SS       B52B PQ: 0 ANSI: 5
scsi 0:3:0:0: Enclosure         IBM-ESXS VSC7160          1.07 PQ: 0 ANSI: 3

I'm currently running kernel 3.11.4. Shortly before the filesystem
corruption appears, I get a large number of these messages:
aac_write: aac_fib_send failed with status: -12
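
For reference, the status in that message looks like a negated Linux errno
value, and errno 12 is ENOMEM ("Cannot allocate memory"). The following is a
minimal sketch for tallying these messages, assuming the kernel log has been
saved as plain text (for example from dmesg). The function name
summarize_aac_failures is hypothetical and not part of any driver or tool.

```python
import errno
import re
from collections import Counter

# Matches the driver message quoted above and captures the status value.
# Assumption: the status is a negated errno, as is usual for kernel return codes.
PATTERN = re.compile(r"aac_write: aac_fib_send failed with status: (-?\d+)")


def summarize_aac_failures(log_text):
    """Return {status: (errno_name, count)} for every aac_write failure found."""
    counts = Counter(int(m.group(1)) for m in PATTERN.finditer(log_text))
    return {
        status: (errno.errorcode.get(abs(status), "unknown"), n)
        for status, n in counts.items()
    }


if __name__ == "__main__":
    # Hypothetical sample input: the same message repeated three times.
    sample = "aac_write: aac_fib_send failed with status: -12\n" * 3
    print(summarize_aac_failures(sample))
```

Feeding it the saved log text gives a per-status count, which may help show
whether the failures cluster around periods of memory pressure.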

While this is going on, random things seem to fail. When I eventually
reboot the system, many tools segfault, and tracing the crashes leads
back to libraries that appear to have been corrupted.

I can boot the system from rescue media, reinstall all the corrupted
libraries / binaries and the system runs fine again for another few
months before it happens again.

arcconf shows:
# arcconf getconfig 1
Controllers found: 1
----------------------------------------------------------------------
Controller information
----------------------------------------------------------------------
   Controller Status                        : Okay
   Channel description                      : SAS/SATA
   Controller Model                         : IBM ServeRAID 8k
   Controller Serial Number                 : 5AB49E0
   Physical Slot                            : 0
   Installed memory                         : 256 MB
   Copyback                                 : Disabled
   Data scrubbing                           : Enabled
   Defunct disk drive count                 : 0
   Logical drives/Offline/Critical          : 1/0/0
   --------------------------------------------------------
   Controller Version Information
   --------------------------------------------------------
   BIOS                                     : 5.2-0 (17003)
   Firmware                                 : 5.2-0 (17003)
   Driver                                   : 1.2-0 (30200)
   Boot Flash                               : 5.1-0 (17002)
   --------------------------------------------------------
   Controller Battery Information
   --------------------------------------------------------
   Status                                   : Okay
   Over temperature                         : No
   Capacity remaining                       : 100 percent
   Time remaining (at current draw)         : 3 days, 20 hours, 56 minutes
   --------------------------------------------------------
   Controller Vital Product Data
   --------------------------------------------------------
   VPD Assigned#                            : 39R8875
   EC Version#                              : J85096
   Controller FRU#                          : 25R8076
   Battery FRU#                             : 25R8088

----------------------------------------------------------------------
Logical drive information
----------------------------------------------------------------------
Logical drive number 1
   Logical drive name                       : Dom0_RAID6
   RAID level                               : 6
   Status of logical drive                  : Okay
   Size                                     : 419400 MB
   Read-cache mode                          : Enabled
   Write-cache mode                         : Enabled (write-back)
   Write-cache setting                      : Enabled (write-back)
   Partitioned                              : Yes
   Number of segments                       : 8
   Stripe-unit size                         : 256 KB
   Stripe order (Channel,Device)            : 0,0 0,1 0,2 0,3 0,4 0,5 0,6 0,7
   Defunct segments                         : No
   Defunct stripes                          : No

Does anyone have any thoughts on this?

-- 
Steven Haigh

Email: netwiz@....id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299

