Message-ID: <CAALWOA_7qGg0WyHPE3biVLka7CdLj8VgsGzf1WMG62Rt-oONkQ@mail.gmail.com>
Date:   Sat, 10 Sep 2016 16:46:49 +0200
From:   Matthijs van Duin <matthijsvanduin@...il.com>
To:     Tony Lindgren <tony@...mide.com>
Cc:     "linux-omap@...r.kernel.org" <linux-omap@...r.kernel.org>,
        linux-arm <linux-arm-kernel@...ts.infradead.org>,
        lkml <linux-kernel@...r.kernel.org>
Subject: L3 error handling (was: Re: [4.8.0-rc1] am335x-evm boot failure:
 n_tty_receive_buf_common: "Unable to handle kernel paging request..")

On 10 September 2016 at 15:10, Tony Lindgren <tony@...mide.com> wrote:
> Yeah I don't think we have L3 interrupts working for am335x.

It probably doesn't help that the L3 interconnect registers on the
am335x aren't documented in the TRM. See below for its list of
components, target IDs, address mapping, and L3 error irq routing
(obtained by mostly-automated scanning/testing).

The problem you mention of getting a useless traceback is indeed
annoying, but on a Cortex-A8 it wouldn't happen for device accesses:
external aborts on device reads (and on strongly-ordered reads/writes)
are synchronous and are taken before the irq. If you hook into that
abort handler and grab/clear the corresponding L3 error to make the
abort more informative, then the irq will never be taken. Bus errors on
device writes outside the Cortex-A8 subsystem never result in an abort
reported to the cpu, and by the time the irq is taken the traceback may
be less informative (although there's still a good chance it's not far
from the culprit).
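
To make the Cortex-A8 cases above concrete, here is a toy model in plain
C. It is not kernel code and touches no L3 registers; it only encodes
the behaviour described in the paragraph above. All names
(access_kind, report_path, a8_first_report, a8_l3_irq_taken) are
made up for illustration.

```c
/* Toy model of the Cortex-A8 bus error reporting described above.
 * Hypothetical names throughout; this is a sketch, not a driver. */
#include <assert.h>
#include <stdbool.h>

enum access_kind {
	DEVICE_READ,		/* read from Device memory */
	STRONGLY_ORDERED_READ,
	STRONGLY_ORDERED_WRITE,
	DEVICE_WRITE		/* Device write to a target outside the A8 subsystem */
};

enum report_path {
	REPORT_SYNC_ABORT,	/* synchronous abort, taken before the L3 irq */
	REPORT_L3_IRQ_ONLY	/* no abort reaches the cpu, only the L3 irq */
};

/* How the cpu first learns about a bus error for this kind of access. */
enum report_path a8_first_report(enum access_kind kind)
{
	switch (kind) {
	case DEVICE_READ:
	case STRONGLY_ORDERED_READ:
	case STRONGLY_ORDERED_WRITE:
		return REPORT_SYNC_ABORT;
	case DEVICE_WRITE:
		return REPORT_L3_IRQ_ONLY;
	}
	assert(0 && "unknown access kind");
	return REPORT_L3_IRQ_ONLY;
}

/* Whether the L3 error irq is still taken, given an abort handler that
 * may grab and clear the pending L3 error before returning. */
bool a8_l3_irq_taken(enum access_kind kind, bool abort_handler_clears_l3)
{
	if (a8_first_report(kind) == REPORT_SYNC_ABORT)
		return !abort_handler_clears_l3;
	/* No abort is taken, so the abort handler never runs. */
	return true;
}
```

So a8_l3_irq_taken(DEVICE_READ, true) is false (the abort handler got
there first and has a useful traceback), while
a8_l3_irq_taken(DEVICE_WRITE, true) stays true, which is the case where
the traceback may have drifted away from the culprit.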

On the Cortex-A9 I don't know what the situation is.

On the Cortex-A15 I don't think your advice actually helps, since all
bus errors seem to result in async aborts that are reported ridiculously
late: I've seen a bus error in a userspace process get reported by the
L3 NoC driver first (complete with useless traceback), which triggered
a task switch to systemd-journald to log all that spam, and only *then*
was the async abort taken, resulting in a perfectly innocent process
getting killed with a SIGBUS.

Needless to say, this is just... wrong.

Matthijs
