Open Source and information security mailing list archives
Message-ID: <CADaLNDmnnZOoB1fxO=rQQa-eqT=B9G24rUrN_GzwAkFoC3Acbw@mail.gmail.com>
Date:	Mon, 10 Aug 2015 12:07:13 -0700
From:	Duc Dang <dhdang@....com>
To:	Bjorn Helgaas <bhelgaas@...gle.com>
Cc:	Tanmay Inamdar <tinamdar@....com>,
	"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
	linux-arm <linux-arm-kernel@...ts.infradead.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: X-Gene: Unhandled fault: synchronous external abort in pci_generic_config_read32

On Mon, Aug 10, 2015 at 10:42 AM, Bjorn Helgaas <bhelgaas@...gle.com> wrote:
> On Mon, Aug 10, 2015 at 12:16 PM, Duc Dang <dhdang@....com> wrote:
>> On Monday, August 10, 2015, Bjorn Helgaas <bhelgaas@...gle.com> wrote:
>>>
>>> On Fri, Jul 31, 2015 at 12:00 PM, Duc Dang <dhdang@....com> wrote:
>>> > On Wed, Jul 29, 2015 at 8:55 AM, Bjorn Helgaas <bhelgaas@...gle.com>
>>> > wrote:
>>> >> On Tue, Jul 28, 2015 at 08:22:55PM -0500, Bjorn Helgaas wrote:
>>> >>> On Tue, Jul 28, 2015 at 02:50:39PM -0700, Duc Dang wrote:
>>> >>
>>> >>> > Do you have another PCIe card to try in the same reboot test on this
>>> >>> > board?
>>> >>>
>>> >>> I've seen this on at least two Mellanox cards.  I'm running similar
>>> >>> tests on a different type of card now.
>>> >>
>>> >> FWIW, reboot tests on two machines with Mellanox cards failed, while
>>> >> the same test on a machine with a different proprietary card succeeded.
>>> >
>>> > Thanks, Bjorn.
>>> >
>>> > I don't have the same Mellanox card as yours, but I will also run
>>> > a similar reboot test to see if I hit the same issue with my card.
>>>
>>> Any more hints on this?  Nothing has changed on my end, so of course
>>> I'm still seeing this, always on machines with Mellanox, and never on
>>> other machines.  Could this be a hardware issue like a signal
>>> integrity or margin issue?  I don't know where to go from here because
>>> I'm not a hardware person, and I don't know anything to do in
>>> software.
>>
>>
>> Hi Bjorn,
>>
>> I ran similar reboot tests on 2 different Mellanox cards (Connect-X
>> family; one card has 2 10G interfaces, the other has 1 port that
>> supports InfiniBand) with U-Boot 1.15.12 and Linux 4.2-rc5, and I did not see
>> the crash that you encountered.
>>
>> Did you check whether your Mellanox cards have the latest firmware? I have
>> seen link issues on my Mellanox cards with old firmware before.
>
> Good idea; I'll check that, too.  Also, I just learned that these
> cards are installed with an extender card because of some space issues,
> so we're going to test again without the extender.

Hi Bjorn,

Are the other cards that passed your test installed directly in the
on-board PCIe slot?
If yes, then this is a good data point, and it will be useful to test
the case where your Mellanox cards are installed directly in the
on-board PCIe slot.

-- 
Regards,
Duc Dang.
