[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <db042e9b-be41-11b1-7059-3881b1da5c8b@fnarfbargle.com>
Date: Tue, 3 Nov 2020 10:56:11 +1100
From: Brad Campbell <brad@...rfbargle.com>
To: Andreas Kemnade <andreas@...nade.info>,
Guenter Roeck <linux@...ck-us.net>
Cc: Arnd Bergmann <arnd@...db.de>, rydberg@...math.org,
Jean Delvare <jdelvare@...e.com>, linux-hwmon@...r.kernel.org,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
hns@...delico.com
Subject: Re: [REGRESSION] hwmon: (applesmc) avoid overlong udelay()
On 6/10/20 6:02 pm, Andreas Kemnade wrote:
> On Thu, 1 Oct 2020 21:07:51 -0700
> Guenter Roeck <linux@...ck-us.net> wrote:
>
>> On 10/1/20 3:22 PM, Andreas Kemnade wrote:
>>> On Wed, 30 Sep 2020 22:00:09 +0200
>>> Arnd Bergmann <arnd@...db.de> wrote:
>>>
>>>> On Wed, Sep 30, 2020 at 6:44 PM Guenter Roeck <linux@...ck-us.net> wrote:
>>>>>
>>>>> On Wed, Sep 30, 2020 at 10:54:42AM +0200, Andreas Kemnade wrote:
>>>>>> Hi,
>>>>>>
>>>>>> after the $subject patch I get lots of errors like this:
>>>>>
>>>>> For reference, this refers to commit fff2d0f701e6 ("hwmon: (applesmc)
>>>>> avoid overlong udelay()").
>>>>>
>>>>>> [ 120.378614] applesmc: send_byte(0x00, 0x0300) fail: 0x40
>>>>>> [ 120.378621] applesmc: LKSB: write data fail
>>>>>> [ 120.512782] applesmc: send_byte(0x00, 0x0300) fail: 0x40
>>>>>> [ 120.512787] applesmc: LKSB: write data fail
>>>>>>
>>>>>> CPU sticks at low speed and no fan is turning on.
>>>>>> Reverting this patch on top of 5.9-rc6 solves this problem.
>>>>>>
>>>>>> Some information from dmidecode:
>>>>>>
>>>>>> Base Board Information
>>>>>> Manufacturer: Apple Inc.
>>>>>> Product Name: Mac-7DF21CB3ED6977E5
>>>>>> Version: MacBookAir6,2
>>>>>>
>>>>>> Handle 0x0020, DMI type 11, 5 bytes OEM Strings String 1: Apple ROM Version. Model: …,
>>>>>> Handle 0x0020, DMI type 11, 5 bytes
>>>>>> OEM Strings
>>>>>> String 1: Apple ROM Version. Model: MBA61. EFI Version: 122.0.0
>>>>>> String 2: .0.0. Built by: root@...mon. Date: Wed Jun 10 18:
>>>>>> String 3: 10:36 PDT 2020. Revision: 122 (B&I). ROM Version: F000_B
>>>>>> String 4: 00. Build Type: Official Build, Release. Compiler: Appl
>>>>>> String 5: e clang version 3.0 (tags/Apple/clang-211.10.1) (based on LLVM
>>>>>> String 6: 3.0svn).
>>>>>>
>>>>>> Writing to things in /sys/devices/platform/applesmc.768 gives also the
>>>>>> said errors.
>>>>>> But writing 1 to fan1_maunal and 5000 to fan1_output turns the fan on
>>>>>> despite error messages.
>>>>>>
>>>>> Not really sure what to do here. I could revert the patch, but then we'd gain
>>>>> clang compile failures. Arnd, any idea ?
>>>>
>>>> It seems that either I made a mistake in the conversion and it sleeps for
>>>> less time than before, or my assumption was wrong that converting a delay to
>>>> a sleep is safe here.
>>>>
>>>> The error message indicates that the write fails, not the read, so that
>>>> is what I'd look at first. Right away I can see that the maximum time to
>>>> retry is only half of what it used to be, as we used to wait for
>>>> 0x10, 0x20, 0x40, 0x80, ..., 0x20000 microseconds for a total of
>>>> 0x3fff0 microseconds (262ms), while my patch went with the 131ms
>>>> total delay based on the comment saying "/* wait up to 128 ms for a
>>>> status change. */".
>>>>
>>> Yes, that is also what I read from the code. I just thought there must
>>> be something simple, which just needs a short look from another pair of
>>> eyes.
>>>
>>>> Since there is sleeping wait, I see no reason the timeout couldn't
>>>> be extended a lot, e.g. to a second, as in
>>>>
>>>> #define APPLESMC_MAX_WAIT 0x100000
>>>>
>>>> If that doesn't work, I'd try using mdelay() in place of
>>>> usleep_range(), such as
>>>>
>>>> mdelay(DIV_ROUND_UP(us, USEC_PER_MSEC)));
>>>>
>>>> This adds back a really nasty latency, but it should avoid the
>>>> compile-time problem.
>>>>
>>>> Andreas, can you try those two things? (one at a time,
>>>> not both)
>>>
>>> Ok, I tried. None of them works. I rechecked my work and created real
>>> git commits out of them and CONFIG_LOCALVERSION_AUTO is also set so
>>> the usual stupid things are rules out.
>>> In detail:
>>> On top of 5.9-rc6 + *reverted* patch:
>>> diff --git a/drivers/hwmon/applesmc.c b/drivers/hwmon/applesmc.c
>>> index fd99c9df8a00..2a9bd7f2b71b 100644
>>> --- a/drivers/hwmon/applesmc.c
>>> +++ b/drivers/hwmon/applesmc.c
>>> @@ -45,7 +45,7 @@
>>> /* wait up to 128 ms for a status change. */
>>> #define APPLESMC_MIN_WAIT 0x0010
>>> #define APPLESMC_RETRY_WAIT 0x0100
>>> -#define APPLESMC_MAX_WAIT 0x20000
>>> +#define APPLESMC_MAX_WAIT 0x8000
>>>
>>> #define APPLESMC_READ_CMD 0x10
>>> #define APPLESMC_WRITE_CMD 0x11
>>>
>>
>> Oh man, that code is so badlys broken.
>>
>> send_byte() repeats sending the data if it was not immediately successful.
>> That is done for both data and commands. Effectively that happens if
>> the command is not immediately accepted. However, send_argument()
>> clearly assumes that each data byte is sent exactly once. Sending
>> it more than once will mess up the key that is supposed to be sent.
>> The Apple SMC emulation code in qemu confirms that data bytes can not
>> be written more than once.
>>
>> Of course, theoretically it may be that the first data byte was not
>> accepted (after all, the ACK bit is not set), but the ACK bit is
>> not checked again after udelay(APPLESMC_RETRY_WAIT), so it may
>> well have been set in the 256 uS between its check and re-writing
>> the data.
>>
>> In other words, this entire code only works accidentally to start with.
>>
>> If you like, you could play around with the code and find out if and
>> when exactly bit 1 (busy) is set, if and when bit 2 (ack) is set, and
>> if and when any other bit is set. We could also try to read port 0x31e
>> (the error port). Maybe the we can figure out what the error actually
>> is. But then I don't really know what we could do with that information.
>>
> Smoe research results: the second data byte seems to cause problems, not the
> command byte.
>
>> Other than that, the only useful idea I have is something crazy like
>> if (us < 10000)
>> udelay(us);
>> else
>> mdelay(DIV_ROUND_CLOSEST(udelay, 1000));
>> in the hope that clang doesn't convert that back into a
>> compile-time constant and udelay().
>>
>> Overall it seems like the apple protocol may expect to receive data
>> bytes faster than 1ms apart, because that is the only real difference
>> between the original code and the new code using mdelay().
>
> Yes, that explanation makes sense. If I am trying something like that, only
> the last byte requires more than APPLESMC_MIN_WAIT. I have seen max. 256us.
> So we could probably even use msleep for us > 1000 and udelay for anything below.
>
> Regards,
> Andreas
>
G'day Andreas,
I've examined the code in VirtualSMC and I'm not convinced we were not waiting on the wrong bits.
#define SMC_STATUS_AWAITING_DATA BIT0 ///< Ready to read data.
#define SMC_STATUS_IB_CLOSED BIT1 /// A write is pending.
#define SMC_STATUS_BUSY BIT2 ///< Busy processing a command.
#define SMC_STATUS_GOT_COMMAND BIT3 ///< The last input was a command.
#define SMC_STATUS_UKN_0x16 BIT4
#define SMC_STATUS_KEY_DONE BIT5
#define SMC_STATUS_READY BIT6 // Ready to work
#define SMC_STATUS_UKN_0x80 BIT7 // error
Any chance you could try this patch? It's ugly, hacked together and currently fairly undocumented, but if it works I'll figure out how to clean it up (suggestions welcome).
It works on my MacbookPro 11,1.
I've also attached a script I adapted from https://github.com/floe/smc_util.git that I use to hammer the SMC with reads.
The behavior pre 5.9 and post this patch is the same.
Regards,
Brad
View attachment "smc.patch.2" of type "text/plain" (3897 bytes)
Download attachment "smc_dump_linux.sh" of type "application/x-shellscript" (797 bytes)
Powered by blists - more mailing lists