Date:	Thu, 01 Sep 2011 19:40:15 +0200
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	scameron@...rdog.cce.hp.com, Jon Mason <mason@...i.com>,
	Jesse Barnes <jbarnes@...tuousgeek.org>
Cc:	james.bottomley@...senpartnership.com, linux-scsi@...r.kernel.org,
	linux-kernel@...r.kernel.org, stephenmcameron@...il.com,
	thenzl@...hat.com, akpm@...ux-foundation.org,
	mikem@...rdog.cce.hp.com
Subject: Re: [BUG] scsi: hpsa: how to destroy your files

On Thursday, 01 September 2011 at 11:07 -0500, scameron@...rdog.cce.hp.com
wrote:
> On Thu, Sep 01, 2011 at 05:24:02PM +0200, Eric Dumazet wrote:
> > Stephen,
> > 
> > Current linux-3.1-rc4+ is a total disaster on my BL460c G6
> 
> What kernel were you running successfully previously?
> 
> I saw something similar on a BL460c G7 on Friday with 3.1-rc4,
> but I'm not sure the problem is in the driver.
> I installed RHEL 6.1, then put 3.1-rc4 on.  Turning off
> "Virtualization" in the kernel config seemed to help
> (it allowed the machine to boot), so I thought that must have
> been the source of the issue.  So, you might try that.
> 
> However, I rebooted that machine just now, and
> now I am getting a similar "hpsa 0000:0c:00.0: resetting device 0:0:0:0"
> message, so that's pretty weird.
> 
> The cmd_alloc failure, I didn't see, but I may have missed it
> (didn't have console directed to serial output.)
> 
> cmd_alloc failing is not generally expected, as we reserve enough
> commands that the upper layers should never exhaust them all (they
> should honor hpsa's max request limit), so it's pretty weird that
> you're seeing that.
> 
> I am able to run 3.1-rc3 on RHEL 6 just fine on other systems (DL380 G7,
> for example) and I don't think there are any hpsa changes between rc3
> and rc4.  (I haven't tried rc4 on the DL380 G7 yet.)
> 
> So, I'm not sure what's going on with the BL460c yet, but I am
> aware of the problem and have already seen it.  I can't think of
> any driver changes lately which should be causing such
> changes in behavior.
> 
> -- steve
> 
> 
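
Steve's argument that cmd_alloc should never fail rests on a fixed command pool that is larger than the queue depth advertised to the SCSI midlayer. A minimal sketch of that idea, assuming a simple bitmap pool, is below; it is not the hpsa implementation, and the names cmd_pool, NR_CMDS, NR_RESERVED, QUEUE_DEPTH and cmd_pool_alloc() are hypothetical. In this model, as long as callers never keep more than QUEUE_DEPTH commands outstanding, the search always finds a free slot, so a NULL return means either the limit was exceeded or commands are not being returned to the pool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes, for illustration only. */
#define NR_CMDS      32                       /* commands preallocated by the driver */
#define NR_RESERVED   4                       /* held back for internal use */
#define QUEUE_DEPTH  (NR_CMDS - NR_RESERVED)  /* limit advertised to upper layers */

struct cmd {
    int tag;                /* index of this command in the pool */
};

struct cmd_pool {
    struct cmd cmds[NR_CMDS];
    uint32_t   in_use;      /* one bit per command; requires NR_CMDS <= 32 */
};

/* Claim the first free command, or return NULL if all NR_CMDS are busy. */
static struct cmd *cmd_pool_alloc(struct cmd_pool *p)
{
    for (int i = 0; i < NR_CMDS; i++) {
        uint32_t bit = UINT32_C(1) << i;
        if (!(p->in_use & bit)) {
            p->in_use |= bit;
            p->cmds[i].tag = i;
            return &p->cmds[i];
        }
    }
    return NULL;
}

/* Return a command to the pool; freeing an idle command is a bug. */
static void cmd_pool_free(struct cmd_pool *p, struct cmd *c)
{
    uint32_t bit = UINT32_C(1) << c->tag;
    assert(p->in_use & bit);
    p->in_use &= ~bit;
}
```

Here the NR_RESERVED commands are the safety margin: upper layers that respect QUEUE_DEPTH can never drain the pool on their own.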

OK, I found the bad commit. I got lucky: I lost some files, but my
machine was able to complete the bisection. CC'ing the people involved.

git bisect start
# bad: [9e79e3e9dd9672b37ac9412e9a926714306551fe] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
git bisect bad 9e79e3e9dd9672b37ac9412e9a926714306551fe
# good: [322a8b034003c0d46d39af85bf24fee27b902f48] Linux 3.1-rc1
git bisect good 322a8b034003c0d46d39af85bf24fee27b902f48
# bad: [0c3bef612881ee6216a36952ffaabfc35b83545c] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
git bisect bad 0c3bef612881ee6216a36952ffaabfc35b83545c
# good: [8c70aac04e01a08b7eca204312946206d1c1baac] Merge branch 'staging-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6
git bisect good 8c70aac04e01a08b7eca204312946206d1c1baac
# good: [291b63c86aea8a571ddf913d41ab5156b8314dad] Merge branch 'drm-intel-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/keithp/linux-2.6
git bisect good 291b63c86aea8a571ddf913d41ab5156b8314dad
# good: [aa462abe8aaf2198d6aef97da20c874ac694a39f] mm: fix __page_to_pfn for a const struct page argument
git bisect good aa462abe8aaf2198d6aef97da20c874ac694a39f
# good: [5c80c71b9a0ec518b4b58d2a61de01a04f4a4453] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
git bisect good 5c80c71b9a0ec518b4b58d2a61de01a04f4a4453
# good: [2c4ac99f983f1341b5962a16b5e8de6049bf10b5] Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev
git bisect good 2c4ac99f983f1341b5962a16b5e8de6049bf10b5
# bad: [0a2daa1cf35004f5adbf4138555cc5669abf3a3e] PCI: make cardbus-bridge resources optional
git bisect bad 0a2daa1cf35004f5adbf4138555cc5669abf3a3e
# bad: [be768912a49b10b68e96fbd8fa3cab0adfbd3091] PCI: honor child buses add_size in hot plug configuration
git bisect bad be768912a49b10b68e96fbd8fa3cab0adfbd3091
# bad: [b03e7495a862b028294f59fc87286d6d78ee7fa1] PCI: Set PCI-E Max Payload Size on fabric
git bisect bad b03e7495a862b028294f59fc87286d6d78ee7fa1
commit b03e7495a862b028294f59fc87286d6d78ee7fa1
Author: Jon Mason <mason@...i.com>
Date:   Wed Jul 20 15:20:54 2011 -0500

    PCI: Set PCI-E Max Payload Size on fabric
    
    On a given PCI-E fabric, each device, bridge, and root port can have a
    different PCI-E maximum payload size.  There is a sizable performance
    boost for having the largest possible maximum payload size on each PCI-E
    device.  However, if improperly configured, fatal bus errors can occur.
    Thus, it is important to ensure that PCI-E payloads sent by a device
    are never larger than the MPS setting of all devices on the way to the
    destination.
    
    This can be achieved two ways:
    
    - A conservative approach is to use the smallest common denominator of
      the entire tree below a root complex for every device on that fabric.
    
    This means, for example, that having a 128-byte MPS USB controller on one
    leg of a switch will dramatically reduce the performance of a video card or
    10GE adapter on another leg of that same switch.
    
    It also means that any hierarchy supporting hotplug slots (including
    ExpressCard or Thunderbolt, I suppose; double-check that) will have to be
    entirely clamped to 128 bytes since we cannot predict what will be
    plugged into those slots, and we cannot change the MPS on a "live"
    system.
    
    - A more optimal way is possible, if it falls within a couple of
      constraints:
    * The top-level host bridge will never generate packets larger than the
      smallest TLP (or if it can be controlled independently from its MPS at
      least)
    * The device will never generate packets larger than MPS (which can be
      configured via MRRS)
    * No support of direct PCI-E <-> PCI-E transfers between devices without
      some additional code to specifically deal with that case
    
    Then we can use an approach that basically ignores downstream requests
    and focuses exclusively on upstream requests. In that case, all we need
    to care about is that a device MPS is no larger than its parent MPS,
    which allows us to keep all switches/bridges to the max MPS supported by
    their parent and eventually the PHB.
    
    In this case, your USB controller would no longer "starve" your 10GE
    Ethernet and your hotplug slots won't affect your global MPS.
    Additionally, the hotplugged devices themselves can be configured to a
    larger MPS up to the value configured in the hotplug bridge.
    
    To choose between the two available options, two PCI kernel boot args
    have been added to the PCI calls.  "pcie_bus_safe" will provide the
    former behavior, while "pcie_bus_perf" will perform the latter behavior.
    By default, the latter behavior is used.
    
    NOTE: due to the location of the enablement, each arch will need to add
    calls to this function.  This patch only enables x86.
    
    This patch includes a number of changes recommended by Benjamin
    Herrenschmidt.
    
    Tested-by: Jordan_Hargrave@...l.com
    Signed-off-by: Jon Mason <mason@...i.com>
    Signed-off-by: Jesse Barnes <jbarnes@...tuousgeek.org>
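
The two policies described in the commit message can be modelled in a few lines. This is a simplified sketch, not the kernel's actual code: it assumes a hypothetical array-based tree in which each node records its parent index (parents listed before children), the largest MPS it supports in bytes, and whether it exposes a hotplug slot. The "safe" policy takes the minimum over the whole fabric and drops to 128 bytes if any hotplug slot exists; the "performance" policy only requires each node's MPS not to exceed its parent's.

```c
#include <assert.h>
#include <stddef.h>

enum mps_policy { MPS_SAFE, MPS_PERF };

struct pcie_node {
    int      parent;        /* index of upstream node, -1 for the root port */
    unsigned mpss;          /* largest MPS this node supports, in bytes */
    int      hotplug_slot;  /* nonzero if this bridge exposes a hotplug slot */
    unsigned mps;           /* output: MPS chosen for this node, in bytes */
};

static unsigned min_u(unsigned a, unsigned b) { return a < b ? a : b; }

static void configure_mps(struct pcie_node *n, size_t count, enum mps_policy policy)
{
    if (policy == MPS_SAFE) {
        unsigned common = 4096;             /* largest MPS the PCIe spec defines */
        for (size_t i = 0; i < count; i++) {
            common = min_u(common, n[i].mpss);
            if (n[i].hotplug_slot)
                common = 128;               /* unknown future device: assume the minimum */
        }
        for (size_t i = 0; i < count; i++)
            n[i].mps = common;              /* one value for the whole fabric */
        return;
    }

    /* MPS_PERF: clamp each node only to its own capability and its parent. */
    for (size_t i = 0; i < count; i++) {
        assert(n[i].parent < (int)i);       /* parents must precede children */
        unsigned cap = n[i].mpss;
        n[i].mps = n[i].parent < 0 ? cap : min_u(cap, n[n[i].parent].mps);
    }
}
```

With a 256-byte root port and switch, and a 128-byte USB controller plus a 256-byte NIC below the switch, the safe policy sets every node to 128 bytes while the performance policy leaves the NIC at 256 bytes, matching the USB/10GE example in the commit message. The sketch does not model MRRS, which the commit relies on to keep device-generated packets within MPS.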



> > 
> > 
> > A few seconds after boot, I get "cmd_alloc returned NULL" messages
> > or "hpsa 0000:0c:00.0: resetting device 0:0:0:0".
> > 
> > Usually a lot of files are corrupted, fsck is needed, and a full
> > distro reinstall as well.
> > 
> > I tested on two different machines, same result.
> > 
> > Relevant hardware information:
> > 
> > 	Manufacturer: HP
> > 	Product Name: ProLiant BL460c G6
> > 	Version: I24
> > 	Release Date: 05/05/2011
> > 	Intel(R) Xeon(R) CPU E5540 @ 2.53GHz  (two sockets)
> > 
> > 0c:00.0 RAID bus controller: Hewlett-Packard Company Smart Array G6
> > controllers (rev 01)
> > 	Subsystem: Hewlett-Packard Company Smart Array P410i
> > 	Flags: bus master, fast devsel, latency 0, IRQ 16
> > 	Memory at fbc00000 (64-bit, non-prefetchable) [size=4M]
> > 	Memory at fbbf0000 (64-bit, non-prefetchable) [size=4K]
> > 	I/O ports at 4000 [size=256]
> > 	[virtual] Expansion ROM at e7200000 [disabled] [size=512K]
> > 	Capabilities: [40] Power Management version 3
> > 	Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
> > 	Capabilities: [70] Express Endpoint, MSI 00
> > 	Capabilities: [ac] MSI-X: Enable+ Count=16 Masked-
> > 	Capabilities: [100] Advanced Error Reporting
> > 	Kernel driver in use: hpsa
> > 
> > # hpacucli ctrl all show config detail
> > 
> > Smart Array P410i in Slot 0 (Embedded)
> >    Bus Interface: PCI
> >    Slot: 0
> >    Serial Number: 5001438006F44240
> >    RAID 6 (ADG) Status: Disabled
> >    Controller Status: OK
> >    Chassis Slot: 
> >    Hardware Revision: Rev C
> >    Firmware Version: 2.50
> >    Rebuild Priority: Medium
> >    Expand Priority: Medium
> >    Surface Scan Delay: 15 secs
> >    Surface Scan Mode: Idle
> >    Wait for Cache Room: Disabled
> >    Surface Analysis Inconsistency Notification: Disabled
> >    Post Prompt Timeout: 0 secs
> >    Cache Board Present: False
> >    Drive Write Cache: Disabled
> >    SATA NCQ Supported: True
> > 
> >    Array: A
> >       Interface Type: SATA
> >       Unused Space: 0 MB
> >       Status: OK
> > 
> > 
> > 
> >       Logical Drive: 1
> >          Size: 232.9 GB
> >          Fault Tolerance: RAID 1
> >          Heads: 255
> >          Sectors Per Track: 32
> >          Cylinders: 59844
> >          Strip Size: 128 KB
> >          Status: OK
> >          Unique Identifier: 600508B1001030364634343234300F00
> >          Disk Name: /dev/cciss/c0d0
> >          Mount Points: / 9.3 GB, /home 216.0 GB
> >          OS Status: LOCKED
> >          Logical Drive Label: A0124E845001438006F442403033
> >          Mirror Group 0:
> >             physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SATA, 250 GB, OK)
> >          Mirror Group 1:
> >             physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SATA, 250 GB, OK)
> > 
> >       physicaldrive 1I:1:1
> >          Port: 1I
> >          Box: 1
> >          Bay: 1
> >          Status: OK
> >          Drive Type: Data Drive
> >          Interface Type: SATA
> >          Size: 250 GB
> >          Firmware Revision: HPG2    
> >          Serial Number: K648T9C27M8E        
> >          Model: ATA     GJ0250EAGSQ     
> >          SATA NCQ Capable: True
> >          SATA NCQ Enabled: True
> >          PHY Count: 1
> >          PHY Transfer Rate: 3.0GBPS
> > 
> >       physicaldrive 1I:1:2
> >          Port: 1I
> >          Box: 1
> >          Bay: 2
> >          Status: OK
> >          Drive Type: Data Drive
> >          Interface Type: SATA
> >          Size: 250 GB
> >          Firmware Revision: HPG2    
> >          Serial Number: K648T9C27M49        
> >          Model: ATA     GJ0250EAGSQ     
> >          SATA NCQ Capable: True
> >          SATA NCQ Enabled: True
> >          PHY Count: 1
> >          PHY Transfer Rate: 3.0GBPS
> > 
> > 
> > 
> > 64-bit kernel, 4GB of memory
> > 
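
The lspci listing above does not show the Express capability fields that the bisected commit programs; `lspci -vv` prints them on the DevCap and DevCtl lines. As a sketch of how those fields decode, the helpers below follow the PCIe base specification layout: Device Capabilities bits 2:0 give the largest supported MPS, and Device Control bits 7:5 and 14:12 hold the programmed MPS and MRRS, each encoded as 128 << n bytes for n = 0..5. The helper names are hypothetical; the masks correspond to the kernel's PCI_EXP_DEVCAP_PAYLOAD, PCI_EXP_DEVCTL_PAYLOAD and PCI_EXP_DEVCTL_READRQ.

```c
#include <assert.h>
#include <stdint.h>

#define DEVCAP_MPSS_MASK  0x0007u  /* Device Capabilities bits 2:0 */
#define DEVCTL_MPS_MASK   0x00e0u  /* Device Control bits 7:5 */
#define DEVCTL_MRRS_MASK  0x7000u  /* Device Control bits 14:12 */

/* Encodings 0..5 mean 128..4096 bytes; 6 and 7 are reserved (returned as 0). */
static unsigned encoded_to_bytes(unsigned enc)
{
    assert(enc <= 7);              /* every field decoded here is 3 bits wide */
    return enc <= 5 ? 128u << enc : 0;
}

static unsigned devcap_mpss_bytes(uint32_t devcap)
{
    return encoded_to_bytes(devcap & DEVCAP_MPSS_MASK);
}

static unsigned devctl_mps_bytes(uint16_t devctl)
{
    return encoded_to_bytes((devctl & DEVCTL_MPS_MASK) >> 5);
}

static unsigned devctl_mrrs_bytes(uint16_t devctl)
{
    return encoded_to_bytes((devctl & DEVCTL_MRRS_MASK) >> 12);
}
```

For example, a Device Control value of 0x2810 decodes to a 128-byte MPS and a 512-byte MRRS.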


