Message-ID: <a399137345cebc850e5d38886a33f42af4a9c434.camel@fi.rohmeurope.com>
Date:   Thu, 28 Jan 2021 12:49:39 +0000
From:   "Vaittinen, Matti" <Matti.Vaittinen@...rohmeurope.com>
To:     "broonie@...nel.org" <broonie@...nel.org>
CC:     "lgirdwood@...il.com" <lgirdwood@...il.com>,
        "angelogioacchino.delregno@...ainline.org" 
        <angelogioacchino.delregno@...ainline.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: short-circuit and over-current IRQs


On Thu, 2021-01-28 at 12:10 +0000, Mark Brown wrote:
> On Thu, Jan 28, 2021 at 09:23:08AM +0000, Vaittinen, Matti wrote:
> > On Wed, 2021-01-27 at 16:32 +0000, Mark Brown wrote:
> > > Note that the events the API currently has are expected to be for
> > > the actual error conditions, not for the warning ones - indicating
> > > that the voltage is out of regulation for example.
> > I am unsure how to interpret this. What is the criteria of issue
> > being an error/warning. When I was talking about warning I meant
> > that the issue which is detected is unexpected and abnormal
> > (error?) - but might still be recoverable (warning?). I understand
> > the regulator framework must not signal same events for different
> > purposes - but I don't really know what the current events are used
> > for - I am grateful for any guidance!
> 
> What the majority of hardware interrupts on is situations where
> things have already gone out of spec and there are actual problems
> with the output - for example with current limiting there's often an
> actual limiter in there so the regulator simply won't supply any more
> current than is configured.  With a warning everything is still
> working fine but getting close to not doing so.

Sounds reasonable. Warning while things are still working - but are
getting to the boundary. Error when things are already pretty wrong.
Thanks.
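
To make the distinction concrete, here is a minimal userspace sketch in
C. It is not regulator framework code: the names `severity`, `limits`
and `classify_current` and the milliamp thresholds are assumptions made
up for illustration. It only shows the idea of two limits, a lower one
where things still work but are getting close to the boundary, and a
higher one where the output is already out of spec.

```c
#include <assert.h>

/* Hypothetical severity levels, not the regulator framework's events. */
enum severity { SEV_OK, SEV_WARNING, SEV_ERROR };

/* Hypothetical pair of limits for one supply, in milliamps. */
struct limits {
	int warn_ma;   /* still working, but approaching the boundary */
	int error_ma;  /* output is already out of spec at this point */
};

/* Classify a measured current against the two limits. */
enum severity classify_current(const struct limits *l, int measured_ma)
{
	/* The warning limit must not be above the error limit. */
	assert(l->warn_ma <= l->error_ma);

	if (measured_ma >= l->error_ma)
		return SEV_ERROR;
	if (measured_ma >= l->warn_ma)
		return SEV_WARNING;
	return SEV_OK;
}
```

With assumed limits of 800 mA (warning) and 1000 mA (error), a reading
of 900 mA would be classified as a warning and 1000 mA as an error.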

> > > Well, if these things are kicking in the hardware is in serious
> > > trouble anyway so it's unclear what the system would be likely to
> > > do in software, and also unclear how safe it is to rely on
> > > software to be able to take that action given that it let things
> > > get into such a bad state in the first place.
> > Actually, bear with me but I am unsure why we have these
> > notifications if we don't expect SW to be able to do anything?
> > Wouldn't the panic print be all that is needed then? I think that
> > setups which have dual
> 
> You'll notice that there aren't any actual users of this stuff in
> tree at the minute - people don't generally put much effort into
> software recovery as they're not expecting to be anywhere near
> limiting in normal operation.  What I'd expect people to do where
> they do implement handling is something like shutting down all other
> supplies on the device, possibly also trying to shut down the system
> as a whole.  Things more about preventing physical damage rather than
> being part of the normal operation of the system.

Again this makes sense. I will try to ask my HW colleagues what action
they expected SW to take (I hope they have some scenario in mind -
let's see). If they tell me that they expect SW to shut down the system
gracefully - then I will keep the errors. If they tell me they think SW
will temporarily disable some HW blocks or do other "tricks" and later
resume normal operation - then I will see if I can add some new
'warning' indicators.

> For thermal issues systems generally try to apply software limits
> well before an individual component starts flagging things up with an
> interrupt, the limits that devices have are generally super high and
> often there'll be issues at a system level (eg, a case getting
> unusably hot) earlier and it can take a while for responses to have
> an impact.

I think this is also the case with the BD9576 - 140 C sounds pretty hot
to me - and I expect this is really where things are already badly
wrong. So I guess I can keep the 'error' here.

> 
> > limits (one for initiating potential SW recovery - other for HW to
> > forcing protection) actually make sense. So does implementing
> > notifiers / error statuses for events where SW recovery is
> > potentially helpful. But whether the existing event notifications /
> > error flags are correct for these is something I can't decide :)
> > Here I ask guidance for Mark & others who know what is the idea
> > behind existing error-flags/events.
> 
> It's not that we shouldn't implement support for warnings, it's that
> they're not the common case for hardware and so won't line up with
> behaviour for other users.


Agreed. As I said, I understand we shouldn't send the same events for
different situations. If the current errors are used to indicate that
things are really "wrong", to the point where the safest thing is to
shut down the system - then we'd better add these "warnings" to
indicate that there would potentially still be time to change
something - before things are shut off.
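
As a rough illustration of why the two kinds of event should stay
separate, here is a small userspace sketch in C. Everything in it
(`event_level`, `action`, `supply_event`, `choose_action`) is a
hypothetical name invented for this example and not an existing
regulator API. It assumes a handler that may attempt recovery on a
warning but treats an error as a reason to shut things down.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event levels: a warning leaves time to react,
 * an error means the output is already out of spec. */
enum event_level { LEVEL_WARNING, LEVEL_ERROR };

/* Hypothetical responses a handler could choose between. */
enum action { ACT_MITIGATE, ACT_SHUTDOWN };

/* Hypothetical event record for one supply. */
struct supply_event {
	const char *supply;
	enum event_level level;
};

/* Pick a response: try recovery on a warning, shut down on an error. */
enum action choose_action(const struct supply_event *ev)
{
	assert(ev != NULL);

	switch (ev->level) {
	case LEVEL_WARNING:
		/* E.g. temporarily disable some HW blocks, resume later. */
		return ACT_MITIGATE;
	case LEVEL_ERROR:
	default:
		/* Unknown levels are treated like errors, to stay safe. */
		return ACT_SHUTDOWN;
	}
}
```

If both conditions were reported with the same event, a handler like
this could not tell whether recovery is still worth attempting.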

Thanks again!

Best regards
	Matti Vaittinen
