Message-ID: <20200708133654.fp7k4whl2qmn5ygy@gilmour.lan>
Date:   Wed, 8 Jul 2020 15:36:54 +0200
From:   Maxime Ripard <maxime@...no.tech>
To:     Ondřej Jirman <megous@...ous.com>,
        linux-sunxi@...glegroups.com,
        Vasily Khoruzhick <anarsoul@...il.com>,
        Yangtao Li <tiny.windzz@...il.com>,
        Zhang Rui <rui.zhang@...el.com>,
        Daniel Lezcano <daniel.lezcano@...aro.org>,
        Amit Kucheria <amit.kucheria@...durent.com>,
        Chen-Yu Tsai <wens@...e.org>,
        "open list:ALLWINNER THERMAL DRIVER" <linux-pm@...r.kernel.org>,
        "moderated list:ARM/Allwinner sunXi SoC support" 
        <linux-arm-kernel@...ts.infradead.org>,
        open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] thermal: sun8i: Be loud when probe fails

On Wed, Jul 08, 2020 at 03:29:24PM +0200, Ondřej Jirman wrote:
> Hello Maxime,
> 
> On Wed, Jul 08, 2020 at 02:25:42PM +0200, Maxime Ripard wrote:
> > Hi,
> > 
> > On Wed, Jul 08, 2020 at 12:55:27PM +0200, Ondrej Jirman wrote:
> > > > I noticed several mobile Linux distributions failing to enable
> > > > thermal regulation correctly, because the kernel is silent
> > > > when the thermal driver fails to probe. Add enough error reporting
> > > > to debug issues and warn users in case the thermal sensor fails
> > > > to probe.
> > > 
> > > > Failing to notify users means that the SoC can easily overheat under
> > > > load.
> > > 
> > > Signed-off-by: Ondrej Jirman <megous@...ous.com>
> > > ---
> > >  drivers/thermal/sun8i_thermal.c | 55 ++++++++++++++++++++++++++-------
> > >  1 file changed, 43 insertions(+), 12 deletions(-)
> > > 
> > > diff --git a/drivers/thermal/sun8i_thermal.c b/drivers/thermal/sun8i_thermal.c
> > > index 74d73be16496..9065e79ae743 100644
> > > --- a/drivers/thermal/sun8i_thermal.c
> > > +++ b/drivers/thermal/sun8i_thermal.c
> > > @@ -287,8 +287,12 @@ static int sun8i_ths_calibrate(struct ths_device *tmdev)
> > >  
> > >  	calcell = devm_nvmem_cell_get(dev, "calibration");
> > >  	if (IS_ERR(calcell)) {
> > > +		dev_err(dev, "Failed to get calibration nvmem cell (%ld)\n",
> > > +			PTR_ERR(calcell));
> > > +
> > >  		if (PTR_ERR(calcell) == -EPROBE_DEFER)
> > >  			return -EPROBE_DEFER;
> > > +
> > 
> > The rest of the patch makes sense, but we should probably put the error
> > message after the EPROBE_DEFER return so that we don't print any extra
> > noise that isn't necessarily useful.
> 
> I thought about that, but in this case the message would have helped; see my
> other e-mail. Though the lack of a "probe success" message may be enough for me
> to debug the issue, I'm not sure a user will notice that a message is missing,
> whereas they'll surely notice a flood of repeated EPROBE_DEFER messages.

Yeah, but on the other hand, we regularly have people who come and
ask whether a "legitimate" EPROBE_DEFER error message (as in: the driver
wasn't there on the first attempt but was there on the second) is a
cause for concern.
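
For illustration, the ordering suggested above (return on EPROBE_DEFER
first, print only for other errors) could be sketched roughly as follows.
This is a userspace mock and not the driver code: calibrate_sketch() and
log_err() are hypothetical stand-ins for the calibration lookup and for
dev_err(), and propagating the non-deferral error is an assumption, since
the quoted hunk does not show what the driver does at that point.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517 /* kernel-internal value; absent from userspace errno.h */
#endif

/* Hypothetical stand-in for dev_err(): counts messages so behaviour can be checked. */
static int err_messages_logged;

static void log_err(const char *msg, int err)
{
	err_messages_logged++;
	fprintf(stderr, "%s (%d)\n", msg, err);
}

/*
 * Hypothetical stand-in for the nvmem cell lookup in sun8i_ths_calibrate().
 * lookup_err is the negative errno the lookup produced, or 0 on success.
 */
static int calibrate_sketch(int lookup_err)
{
	assert(lookup_err <= 0);

	if (lookup_err) {
		/* Deferral is expected while the provider is not bound yet: stay quiet. */
		if (lookup_err == -EPROBE_DEFER)
			return -EPROBE_DEFER;

		/* Any other failure is worth reporting. */
		log_err("Failed to get calibration nvmem cell", lookup_err);

		/* Assumption: propagate the error; the real driver may choose otherwise. */
		return lookup_err;
	}

	return 0;
}
```

With this ordering, calibrate_sketch(-EPROBE_DEFER) logs nothing, while
calibrate_sketch(-ENOENT) logs exactly one message.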

> And people have run several distros for 3-4 months without anyone noticing
> that thermal regulation doesn't work. So it seems that the lack of a
> success message is not enough.

I understand what the issue is, but do you really expect phone users to
monitor the kernel logs every time they boot their phone to see if the
thermal throttling is enabled?

If anything, it looks like a distro problem, and the notification /
policy to deal with that should be implemented in userspace.

> Another solution may be to select CONFIG_NVMEM_SUNXI_SID if this driver
> is enabled. That may get rid of this error scenario of waiting indefinitely
> for calibration data with EPROBE_DEFER. Other potential EPROBE_DEFER sources
> will probably be quite visible even without this driver telling the user,
> so this message may not be necessary in that case.

That would only partially solve your issue. If the nvmem driver doesn't
load for some reason, you would end up in a similar situation.

Maxime
