Message-ID: <20150810173809.GE15394@e104818-lin.cambridge.arm.com>
Date:	Mon, 10 Aug 2015 18:38:09 +0100
From:	Catalin Marinas <catalin.marinas@....com>
To:	Bjorn Helgaas <bhelgaas@...gle.com>
Cc:	Duc Dang <dhdang@....com>,
	"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
	Tanmay Inamdar <tinamdar@....com>,
	linux-arm <linux-arm-kernel@...ts.infradead.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: X-Gene: Unhandled fault: synchronous external abort in
 pci_generic_config_read32

On Mon, Aug 10, 2015 at 11:18:23AM -0500, Bjorn Helgaas wrote:
> On Fri, Jul 31, 2015 at 12:00 PM, Duc Dang <dhdang@....com> wrote:
> > On Wed, Jul 29, 2015 at 8:55 AM, Bjorn Helgaas <bhelgaas@...gle.com> wrote:
> >> On Tue, Jul 28, 2015 at 08:22:55PM -0500, Bjorn Helgaas wrote:
> >>> On Tue, Jul 28, 2015 at 02:50:39PM -0700, Duc Dang wrote:
> >>
> >>> > Do you have another PCIe card to try on the same reboot test on this board?
> >>>
> >>> I've seen this on at least two Mellanox cards.  I'm running similar tests
> >>> on a different type of card now.
> >>
> >> FWIW, reboot tests on two machines with Mellanox cards failed, while the
> >> same test on a machine with a different proprietary card succeeded.
> >
> > Thanks, Bjorn.
> >
> > I don't have the same Mellanox card as yours, but I will also run
> > similar reboot test to see if I hit the same issue with my card.
> 
> Any more hints on this?  Nothing has changed on my end, so of course
> I'm still seeing this, always on machines with Mellanox, and never on
> other machines.  Could this be a hardware issue like a signal
> integrity or margin issue?  I don't know where to go from here because
> I'm not a hardware person, and I don't know anything to do in
> software.

Silly hack below, not actually a solution (and it may not even work):

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 94d98cd1aad8..e895e96b3d13 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -369,6 +369,14 @@ static int do_bad(unsigned long addr, unsigned int esr, struct pt_regs *regs)
 	return 1;
 }
 
+/*
+ * Retry the faulty access.
+ */
+static int do_good(unsigned long addr, unsigned int esr, struct pt_regs *regs)
+{
+	return 0;
+}
+
 static struct fault_info {
 	int	(*fn)(unsigned long addr, unsigned int esr, struct pt_regs *regs);
 	int	sig;
@@ -391,7 +399,7 @@ static struct fault_info {
 	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 1 permission fault"	},
 	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 2 permission fault"	},
 	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 3 permission fault"	},
-	{ do_bad,		SIGBUS,  0,		"synchronous external abort"	},
+	{ do_good,		SIGBUS,  0,		"synchronous external abort"	},
 	{ do_bad,		SIGBUS,  0,		"asynchronous external abort"	},
 	{ do_bad,		SIGBUS,  0,		"unknown 18"			},
 	{ do_bad,		SIGBUS,  0,		"unknown 19"			},
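
For readers wondering why swapping the handler changes anything: a rough
user-space sketch of the dispatch idea, assuming the abort path treats a
zero return from the table's fn as "handled" and only reports (and signals)
on a non-zero return. The struct layout, the two-entry table and
dispatch_fault() are simplified, hypothetical stand-ins for illustration,
not the actual code in arch/arm64/mm/fault.c.

```c
#include <stdio.h>

/* Simplified stand-in for the kernel's struct fault_info (assumption). */
struct fault_info {
	int (*fn)(unsigned long addr, unsigned int esr);
	const char *name;
};

/* Non-zero return: the fault was not handled. */
static int do_bad(unsigned long addr, unsigned int esr)
{
	(void)addr;
	(void)esr;
	return 1;
}

/* Zero return: claim the fault was handled, so the access gets retried. */
static int do_good(unsigned long addr, unsigned int esr)
{
	(void)addr;
	(void)esr;
	return 0;
}

/* Two-entry toy table; the real table is indexed by the fault status code. */
static const struct fault_info fault_table[] = {
	{ do_good, "synchronous external abort"  },
	{ do_bad,  "asynchronous external abort" },
};

/*
 * Hypothetical dispatcher: returns 0 when the handler claims the fault,
 * 1 when the fault is reported as unhandled.
 */
static int dispatch_fault(unsigned int idx, unsigned long addr, unsigned int esr)
{
	const struct fault_info *inf = &fault_table[idx];

	if (!inf->fn(addr, esr))
		return 0;	/* handled: return to the faulting instruction */

	printf("Unhandled fault: %s (0x%08x) at 0x%016lx\n",
	       inf->name, esr, addr);
	return 1;
}
```

With do_good in the slot, dispatch_fault(0, ...) returns quietly and the
access would simply be re-executed; if the abort is persistent rather than
transient, that presumably just loops on the same instruction, which is
why this is only a diagnostic hack.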

-- 
Catalin