Message-ID: <Y0VILyEaNOQiTpO5@e120937-lin>
Date:   Tue, 11 Oct 2022 11:40:47 +0100
From:   Cristian Marussi <cristian.marussi@....com>
To:     Shivnandan Kumar <quic_kshivnan@...cinc.com>
Cc:     sudeep.holla@....com, linux-arm-kernel@...ts.infradead.org,
        linux-kernel@...r.kernel.org, quic_rgottimu@...cinc.com
Subject: Re: Query regarding "firmware: arm_scmi: Free mailbox channels if
 probe fails"

On Tue, Oct 11, 2022 at 03:34:45PM +0530, Shivnandan Kumar wrote:
> 
> Hi Cristian,
> 

Hi Shivnandan,

> >> Ok, just out of curiosity, once done, can you point me at your downstream
> >> public sources so I can see the issue and the fix that you are applying to
> >> your trees ?
> 
> https://source.codeaurora.org/quic/la/kernel/msm-5.10/tree/drivers/soc/qcom/qcom_rimps.c?h=KERNEL.PLATFORM.1.0.r1-07800-kernel.0
> 
> I have added lock while accessing con_priv inside irq handler and shutdown
> function.
> 

Thanks !

> 
> I have one input regarding firmware timeouts: can we enable a BUG on
> response timeout in the function do_xfer, based on some debug config flag? This
> will help to debug firmware timeout issues faster.
> 
> We will only enable that config flag during internal testing.
> 

I understand that a sort of 'panic-on-timeout' would be useful to freeze
the system as it is and debug it, but a BUG_ON for timeouts seems pretty
invasive to me (and is generally frowned upon), given that on some SCMI
platforms/transports a few timeouts can happen fairly often due to
transient conditions during peaks of SCMI traffic.

Even though you mention making it conditional on a Kconfig option, I'm not
sure this could fly, especially if you want to enable it only for internal
testing... I'll ping Sudeep about this to see what he thinks.

As an alternative, what if I try to improve SCMI tracing/debug, let's say
dumping more info in dmesg about the offending (timed-out) message instead
of hanging the system as a whole ?
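
Just to make the idea of "more info about the offending message" concrete,
here is a rough userspace C sketch of the kind of one-line diagnostic I have
in mind. Everything in it is hypothetical: the struct, its fields and the
function name are made up for illustration, they do not mirror the existing
SCMI core structures, and real kernel code would log via the kernel's own
printing helpers rather than snprintf().

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical snapshot of a timed-out SCMI transfer.
 * Field names are illustrative only, not taken from the kernel.
 */
struct scmi_timeout_info {
	uint8_t protocol_id;     /* SCMI protocol the message belongs to */
	uint8_t msg_id;          /* message/command identifier */
	uint16_t seq;            /* sequence number (token) of the transfer */
	unsigned int timeout_ms; /* how long we waited before giving up */
	unsigned int inflight;   /* other transfers still pending */
};

/*
 * Format one diagnostic line describing the timed-out message.
 * Returns whatever snprintf() returns (length that would be written).
 */
int scmi_format_timeout(char *buf, size_t len,
			const struct scmi_timeout_info *ti)
{
	return snprintf(buf, len,
			"scmi: timeout proto=0x%02x msg=0x%02x seq=%u after %ums (inflight=%u)",
			(unsigned int)ti->protocol_id,
			(unsigned int)ti->msg_id,
			(unsigned int)ti->seq,
			ti->timeout_ms, ti->inflight);
}
```

A line like this in dmesg would at least tell you which protocol and message
timed out and how busy the channel was, without stopping the system.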

I also have some still-brewing, unpublished patches that add some SCMI
stats somewhere in sysfs, so that current SCMI errors/timeouts and
transport anomalies can be read; would that be of interest ?

...maybe we could combine some of these stats and some sort of
BUG_ON/WARN_ON (if it eventually flies...) into some kind of SCMI_DEBUG mode
...any input about which kind of SCMI info you'd like to see
exposed by the stack would be welcome.
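
As a very rough illustration of how stats and a debug-only reaction could be
combined, here is a userspace C sketch. It is only a thought experiment under
my own assumptions: all the names (scmi_stats, scmi_debug_policy, the
thresholds) are hypothetical and nothing like this exists in the tree today.
The point is that a few transient timeouts are only counted, and a
warning or a halt is requested only once a configurable threshold is crossed
and only when the debug mode is enabled.

```c
#include <stdbool.h>

/* Hypothetical counters that could be exposed to userspace. */
struct scmi_stats {
	unsigned long sent;
	unsigned long timeouts;
	unsigned long errors;
};

/* What the debug mode asks the caller to do after a timeout. */
enum scmi_debug_action {
	SCMI_DBG_NONE, /* just account the event */
	SCMI_DBG_WARN, /* would map to something like WARN_ON */
	SCMI_DBG_HALT, /* would map to something like BUG_ON */
};

/* Hypothetical debug policy; a threshold of 0 disables that reaction. */
struct scmi_debug_policy {
	bool enabled;
	unsigned long warn_after;
	unsigned long halt_after;
};

/* Account one timeout and decide which reaction, if any, is requested. */
enum scmi_debug_action scmi_account_timeout(struct scmi_stats *st,
					    const struct scmi_debug_policy *pol)
{
	st->timeouts++;

	if (!pol->enabled)
		return SCMI_DBG_NONE;
	if (pol->halt_after && st->timeouts >= pol->halt_after)
		return SCMI_DBG_HALT;
	if (pol->warn_after && st->timeouts >= pol->warn_after)
		return SCMI_DBG_WARN;

	return SCMI_DBG_NONE;
}
```

With the debug mode disabled the timeouts are still counted, so the stats
stay useful on production builds while the invasive behaviour stays opt-in.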

Last but not least, since we are talking about SCMI Server/FW testing,
have you (or your team) seen this work-in-progress of mine:

https://lore.kernel.org/linux-arm-kernel/20220903183042.3913053-1-cristian.marussi@arm.com/

about a new unified userspace interface to inject/snoop SCMI messages to
test/fuzz/stress the SCMI server wherever it is placed ?

Any feedback on the API proposed in the cover-letter would be highly welcome;
I'll possibly post a new V4 next week, and the changes to the existing ARM SCMI
Compliance suite (mentioned in the cover) to support this new SCMI Raw
mode are in their final stage too.

Thanks,
Cristian
