Message-ID: <20201006090226.4275c824@kemnade.info>
Date:   Tue, 6 Oct 2020 09:02:26 +0200
From:   Andreas Kemnade <andreas@...nade.info>
To:     Guenter Roeck <linux@...ck-us.net>
Cc:     Arnd Bergmann <arnd@...db.de>, rydberg@...math.org,
        Jean Delvare <jdelvare@...e.com>, linux-hwmon@...r.kernel.org,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        hns@...delico.com
Subject: Re: [REGRESSION] hwmon: (applesmc) avoid overlong udelay()

On Thu, 1 Oct 2020 21:07:51 -0700
Guenter Roeck <linux@...ck-us.net> wrote:

> On 10/1/20 3:22 PM, Andreas Kemnade wrote:
> > On Wed, 30 Sep 2020 22:00:09 +0200
> > Arnd Bergmann <arnd@...db.de> wrote:
> >   
> >> On Wed, Sep 30, 2020 at 6:44 PM Guenter Roeck <linux@...ck-us.net> wrote:  
> >>>
> >>> On Wed, Sep 30, 2020 at 10:54:42AM +0200, Andreas Kemnade wrote:    
> >>>> Hi,
> >>>>
> >>>> after the $subject patch I get lots of errors like this:    
> >>>
> >>> For reference, this refers to commit fff2d0f701e6 ("hwmon: (applesmc)
> >>> avoid overlong udelay()").
> >>>    
> >>>> [  120.378614] applesmc: send_byte(0x00, 0x0300) fail: 0x40
> >>>> [  120.378621] applesmc: LKSB: write data fail
> >>>> [  120.512782] applesmc: send_byte(0x00, 0x0300) fail: 0x40
> >>>> [  120.512787] applesmc: LKSB: write data fail
> >>>>
> >>>> CPU sticks at low speed and no fan is turning on.
> >>>> Reverting this patch on top of 5.9-rc6 solves this problem.
> >>>>
> >>>> Some information from dmidecode:
> >>>>
> >>>> Base Board Information
> >>>>         Manufacturer: Apple Inc.
> >>>>         Product Name: Mac-7DF21CB3ED6977E5
> >>>>         Version: MacBookAir6,2
> >>>>
> >>>> Handle 0x0020, DMI type 11, 5 bytes
> >>>> OEM Strings
> >>>>         String 1: Apple ROM Version.  Model:        MBA61.  EFI Version:  122.0.0
> >>>>         String 2: .0.0.  Built by:     root@...mon.  Date:         Wed Jun 10 18:
> >>>>         String 3: 10:36 PDT 2020.  Revision:     122 (B&I).  ROM Version:  F000_B
> >>>>         String 4: 00.  Build Type:   Official Build, Release.  Compiler:     Appl
> >>>>         String 5: e clang version 3.0 (tags/Apple/clang-211.10.1) (based on LLVM
> >>>>         String 6: 3.0svn).
> >>>>
> >>>> Writing to the attributes in /sys/devices/platform/applesmc.768 also
> >>>> produces these errors.
> >>>> But writing 1 to fan1_manual and 5000 to fan1_output turns the fan on
> >>>> despite the error messages.
> >>>>    
> >>> Not really sure what to do here. I could revert the patch, but then we'd gain
> >>> clang compile failures. Arnd, any idea ?    
> >>
> >> It seems that either I made a mistake in the conversion and it sleeps for
> >> less time than before, or my assumption was wrong that converting a delay to
> >> a sleep is safe here.
> >>
> >> The error message indicates that the write fails, not the read, so that
> >> is what I'd look at first. Right away I can see that the maximum time to
> >> retry is only half of what it used to be, as we used to wait for
> >> 0x10, 0x20, 0x40, 0x80, ..., 0x20000 microseconds for a total of
> >> 0x3fff0 microseconds (262ms), while my patch went with the 131ms
> >> total delay based on the comment saying "/* wait up to 128 ms for a
> >> status change. */".
> >>  
> > Yes, that is also what I read from the code. I just thought there must
> > be something simple, which just needs a short look from another pair of
> > eyes.
> >   
> >> Since this is a sleeping wait, I see no reason the timeout couldn't
> >> be extended a lot, e.g. to a second, as in
> >>
> >> #define APPLESMC_MAX_WAIT 0x100000
> >>
> >> If that doesn't work, I'd try using mdelay() in place of
> >> usleep_range(), such as
> >>
> >>            mdelay(DIV_ROUND_UP(us, USEC_PER_MSEC));
> >>
> >> This adds back a really nasty latency, but it should avoid the
> >> compile-time problem.
> >>
> >> Andreas, can you try those two things? (one at a time,
> >> not both)  
> > 
> > Ok, I tried. Neither of them works. I rechecked my work and created real
> > git commits out of them, and CONFIG_LOCALVERSION_AUTO is also set, so
> > the usual stupid mistakes are ruled out.
> > In detail:
> > On top of 5.9-rc6 + *reverted* patch:
> > diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c
> > index fd99c9df8a00..2a9bd7f2b71b 100644
> > --- a/drivers/hwmon/applesmc.c
> > +++ b/drivers/hwmon/applesmc.c
> > @@ -45,7 +45,7 @@
> >  /* wait up to 128 ms for a status change. */
> >  #define APPLESMC_MIN_WAIT	0x0010
> >  #define APPLESMC_RETRY_WAIT	0x0100
> > -#define APPLESMC_MAX_WAIT	0x20000
> > +#define APPLESMC_MAX_WAIT	0x8000
> >  
> >  #define APPLESMC_READ_CMD	0x10
> >  #define APPLESMC_WRITE_CMD	0x11
> >   
> 
> Oh man, that code is so badly broken.
> 
> send_byte() repeats sending the data if it was not immediately successful.
> That is done for both data and commands. Effectively that happens if
> the command is not immediately accepted. However, send_argument()
> clearly assumes that each data byte is sent exactly once. Sending
> it more than once will mess up the key that is supposed to be sent.
> The Apple SMC emulation code in qemu confirms that data bytes can not
> be written more than once.
> 
> Of course, theoretically it may be that the first data byte was not
> accepted (after all, the ACK bit is not set), but the ACK bit is
> not checked again after udelay(APPLESMC_RETRY_WAIT), so it may
> well have been set in the 256 uS between its check and re-writing
> the data.
> 
> In other words, this entire code only works accidentally to start with.
> 
> If you like, you could play around with the code and find out if and
> when exactly bit 1 (busy) is set, if and when bit 2 (ack) is set, and
> if and when any other bit is set. We could also try to read port 0x31e
> (the error port). Maybe then we can figure out what the error actually
> is. But then I don't really know what we could do with that information.
> 
Some research results: the second data byte seems to cause problems, not the
command byte.

> Other than that, the only useful idea I have is something crazy like
> 	if (us < 10000)
> 		udelay(us);
> 	else
> 		mdelay(DIV_ROUND_CLOSEST(us, 1000));
> in the hope that clang doesn't convert that back into a
> compile-time constant and udelay().
> 
> Overall it seems like the apple protocol may expect to receive data
> bytes faster than 1ms apart, because that is the only real difference
> between the original code and the new code using mdelay().

Yes, that explanation makes sense. When I try something like that, only
the last byte needs more than APPLESMC_MIN_WAIT; the longest wait I have seen
is 256us. So we could probably even use msleep for us > 1000 and udelay for
anything below.
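
To illustrate the msleep/udelay split, here is a rough userspace sketch, not
driver code. It walks the doubling schedule Arnd described (0x10, 0x20, ...,
0x20000 microseconds) and only accounts for how much time would go to a busy
wait and how much to a sleep. The names BUSY_WAIT_LIMIT_US, wait_stats,
hybrid_wait() and walk_schedule() are made up for this example; the 1000us
threshold is the one suggested above, and the real kernel calls appear only in
comments.

```c
#define APPLESMC_MIN_WAIT	0x0010	/* as in the driver */
#define APPLESMC_MAX_WAIT	0x20000	/* value before the 0x8000 change */
#define BUSY_WAIT_LIMIT_US	1000	/* hypothetical name for the threshold */

/* Accounting only: nothing here actually delays. */
struct wait_stats {
	unsigned long busy_us;	/* time a kernel version would udelay() */
	unsigned long sleep_us;	/* time a kernel version would msleep() */
	unsigned int steps;	/* number of wait steps taken */
};

/* One wait step: busy-wait short delays, sleep for long ones. */
static void hybrid_wait(unsigned long us, struct wait_stats *st)
{
	if (us > BUSY_WAIT_LIMIT_US)
		st->sleep_us += us;	/* kernel: msleep(DIV_ROUND_UP(us, 1000)) */
	else
		st->busy_us += us;	/* kernel: udelay(us) */
	st->steps++;
}

/* Walk min_wait, 2*min_wait, ... up to and including max_wait. */
static struct wait_stats walk_schedule(unsigned long min_wait,
				       unsigned long max_wait)
{
	struct wait_stats st = { 0, 0, 0 };
	unsigned long us;

	for (us = min_wait; us <= max_wait; us <<= 1)
		hybrid_wait(us, &st);
	return st;
}
```

Calling walk_schedule(APPLESMC_MIN_WAIT, APPLESMC_MAX_WAIT) gives 14 steps:
1008us of busy waiting (0x10 through 0x200) and 261120us of sleeping (0x400
through 0x20000). Together that is the 0x3fff0 microseconds mentioned earlier
in the thread, so the bytes that need at most 256us never leave the udelay
path.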

Regards,
Andreas
