Date:   Wed, 27 Jul 2022 14:51:24 -0700
From:   Florian Fainelli <f.fainelli@...il.com>
To:     Juerg Haefliger <juerg.haefliger@...onical.com>
Cc:     Nicolas Saenz Julienne <nsaenzjulienne@...e.de>,
        Robin Murphy <robin.murphy@....com>, stefan.wahren@...e.com,
        Catalin Marinas <catalin.marinas@....com>,
        Robin Murphy <robin.murphy@....con>,
        bcm-kernel-feedback-list@...adcom.com,
        linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
        linux-pm@...r.kernel.org
Subject: Re: bcm2711_thermal: Kernel panic - not syncing: Asynchronous SError
 Interrupt

On 7/27/22 01:05, Juerg Haefliger wrote:
> On Wed, 10 Feb 2021 14:59:45 -0800
> Florian Fainelli <f.fainelli@...il.com> wrote:
> 
>> On 2/10/2021 8:55 AM, Nicolas Saenz Julienne wrote:
>>> Hi Robin,
>>>
>>> On Wed, 2021-02-10 at 16:25 +0000, Robin Murphy wrote:  
>>>> On 2021-02-10 13:15, Nicolas Saenz Julienne wrote:  
>>>>> [ Add Robin, Catalin and Florian in case they want to chime in ]
>>>>>
>>>>> Hi Juerg, thanks for the report!
>>>>>
>>>>> On Wed, 2021-02-10 at 11:48 +0100, Juerg Haefliger wrote:  
>>>>>> Trying to dump the BCM2711 registers kills the kernel:
>>>>>>
>>>>>> # cat /sys/kernel/debug/regmap/dummy-avs-monitor\@fd5d2000/range
>>>>>> 0-efc
>>>>>> # cat /sys/kernel/debug/regmap/dummy-avs-monitor\@fd5d2000/registers
>>>>>>
>>>>>> [   62.857661] SError Interrupt on CPU1, code 0xbf000002 -- SError  
>>>>>
>>>>> So ESR's IDS (bit 24) is set, which means it's an 'Implementation Defined
>>>>> SError,' hence IIUC the rest of the error code is meaningless to anyone outside
>>>>> of Broadcom/RPi.  
>>>>
>>>> It's imp-def from the architecture's PoV, but the implementation in this 
>>>> case is Cortex-A72, where 0x000002 means an attributable, containable 
>>>> Slave Error:
>>>>
>>>> https://developer.arm.com/documentation/100095/0003/system-control/aarch64-register-descriptions/exception-syndrome-register--el1-and-el3?lang=en
>>>>
>>>> In other words, the thing at the other end of an interconnect 
>>>> transaction said "no" :)
>>>>
>>>> (The fact that Cortex-A72 gets too far ahead of itself to take it as a 
>>>> synchronous external abort is a mild annoyance, but hey...)  
>>>
>>> Thanks for both your clarifications! Reading arm documentation is a skill on
>>> its own.  
>>
>> Yes it is.
>>
>>>   
>>>>> The regmap is created through the following syscon device:
>>>>>
>>>>> 	avs_monitor: avs-monitor@...d2000 {
>>>>> 		compatible = "brcm,bcm2711-avs-monitor",
>>>>> 			     "syscon", "simple-mfd";
>>>>> 		reg = <0x7d5d2000 0xf00>;
>>>>>
>>>>> 		thermal: thermal {
>>>>> 			compatible = "brcm,bcm2711-thermal";
>>>>> 			#thermal-sensor-cells = <0>;
>>>>> 		};
>>>>> 	};
>>>>>
>>>>> I've done some tests with devmem, and the whole <0x7d5d2000 0xf00> range is
>>>>> full of addresses that trigger this same error. Also note that as per Florian's
>>>>> comments[1]: "AVS_RO_REGISTERS_0: 0x7d5d2200 - 0x7d5d22e3." But from what I can
>>>>> tell, at least 0x7d5d22b0 seems to be faulty too.
>>>>>
>>>>> Any ideas/comments? My guess is that those addresses are marked somehow as
>>>>> secure, and only for VC4 to access (VC4 is RPi4's co-processor). Ultimately,
>>>>> the solution is to narrow the register range exposed by avs-monitor to whatever
>>>>> bcm2711-thermal needs (which is ATM a single 32bit register).  
>>>>
>>>> When a peripheral decodes a region of address space, nobody says it has 
>>>> to accept accesses to *every* address in that space; registers may be 
>>>> sparsely populated, and although some devices might be "nice" and make 
>>>> unused areas behave as RAZ/WI, others may throw slave errors if you poke 
>>>> at the wrong places. As you note, in a TrustZone-aware device some 
>>>> registers may only exist in one or other of the Secure/Non-Secure 
>>>> address spaces.
>>>>
>>>> Even when there is a defined register at a given address, it still 
>>>> doesn't necessarily accept all possible types of access; it wouldn't be 
>>>> particularly friendly, but a device *could* have, say, some registers 
>>>> that support 32-bit accesses and others that only support 16-bit 
>>>> accesses, and thus throw slave errors if you do the wrong thing in the 
>>>> wrong place.
>>>>
>>>> It really all depends on the device itself.  
>>>
>>> All in all, assuming there is no special device quirk to apply, the feeling I'm
>>> getting is to just let the error be. As you hint, firmware has no blame here,
>>> and debugfs is a 'best effort, zero guarantees' interface after all.  
>>
>> We should probably fill a regmap_access_table to deny reading registers
>> for which there is no address decoding and possibly another one to deny
>> writing to the read-only registers.
> 
> 
> Below is a patch that adds a read access table but it seems wrong to include
> 'internal.h' and add the table in the thermal driver. Shouldn't this happen
> in a higher layer, somehow between syscon and the thermal node?

What is the purpose of doing this, though, that cannot already be done using devmem/devmem2, if the point is to explore the address space?
-- 
Florian
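
For reference, the syndrome value quoted earlier in the thread (0xbf000002) can be split into its fields as sketched below. This is a minimal userspace illustration, not kernel code; the names esr_fields and esr_decode are made up for this example. The field positions assume the generic AArch64 ESR layout for an SError (EC in bits 31:26, IL in bit 25, IDS in bit 24). The meaning of the remaining low bits is implementation defined; per Robin's note above, on Cortex-A72 the value 0x000002 indicates a slave error.

```c
#include <stdint.h>

/* Hypothetical helper type: the fields of a 32-bit ESR value for an SError. */
struct esr_fields {
	unsigned int ec;     /* exception class, bits [31:26] */
	unsigned int il;     /* instruction length bit, bit 25 */
	unsigned int ids;    /* implementation defined syndrome, bit 24 */
	uint32_t impdef;     /* remaining syndrome bits [23:0] */
};

/* Split a raw ESR value into the fields above (pure bit manipulation). */
static struct esr_fields esr_decode(uint32_t esr)
{
	struct esr_fields f;

	f.ec     = (esr >> 26) & 0x3f;
	f.il     = (esr >> 25) & 0x1;
	f.ids    = (esr >> 24) & 0x1;
	f.impdef = esr & 0x00ffffff;
	return f;
}

/*
 * Returns 1 if the low bits match what the thread describes as a
 * Cortex-A72 slave error (IDS set, low 24 bits equal to 2), else 0.
 * This interpretation is specific to that core, as stated in the thread.
 */
static int esr_is_a72_slave_error(uint32_t esr)
{
	struct esr_fields f = esr_decode(esr);

	return f.ids == 1 && f.impdef == 0x000002;
}
```

With the value from the report, esr_decode(0xbf000002) gives ec = 0x2f (the SError exception class), il = 1, ids = 1 and impdef = 0x000002, which matches the reading given by Nicolas and Robin.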
