Message-ID: <cc247521-5507-44a2-8ff4-519c7a7b1c79@amd.com>
Date: Wed, 17 Jan 2024 11:34:37 -0600
From: Mario Limonciello <mario.limonciello@....com>
To: "Christian A. Ehrhardt" <lk@...e.de>, Dell.Client.Kernel@...l.com,
"Wang, Crag" <Crag.Wang@...l.com>
Cc: Heikki Krogerus <heikki.krogerus@...ux.intel.com>,
linux-usb@...r.kernel.org, Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Neil Armstrong <neil.armstrong@...aro.org>,
Hans de Goede <hdegoede@...hat.com>, Saranya Gopal
<saranya.gopal@...el.com>, linux-kernel@...r.kernel.org
Subject: Re: [RFC] Fix stuck UCSI controller on DELL

On 1/17/2024 00:35, Christian A. Ehrhardt wrote:
>
> Hi Mario,
>
> On Tue, Jan 16, 2024 at 09:00:03PM -0600, Mario Limonciello wrote:
>> On 1/15/2024 12:55, Christian A. Ehrhardt wrote:
>>>
>>> Hi Heikki,
>>>
>>> sorry to bother you again with this but I'm afraid there's
>>> a misunderstanding wrt. the nature of the quirk. See below:
>>>
>>> On Thu, Jan 04, 2024 at 01:59:02PM +0200, Heikki Krogerus wrote:
>>>> Hi Christian,
>>>>
>>>> On Wed, Jan 03, 2024 at 11:06:35AM +0100, Christian A. Ehrhardt wrote:
>>>>> I have a DELL Latitude 5431 where typec only works somewhat.
>>>>> After the first plug/unplug event the PPM seems to be stuck and
>>>>> commands end with a timeout (GET_CONNECTOR_STATUS failed (-110)).
>>>>>
>>>>> This patch fixes it for me but according to my reading it is in
>>>>> violation of the UCSI spec. On the other hand searching through
>>>>> the net it appears that many DELL models seem to have timeout problems
>>>>> with UCSI.
>>>>>
>>>>> Do we want some kind of quirk here? There does not seem to be a quirk
>>>>> framework for this part of the code, yet. Or is it ok to just send the
>>>>> additional ACK in all cases and hope that the PPM will do the right
>>>>> thing?
>>>>
>>>> We can use DMI quirks. Something like the attached diff (not tested).
>>>>
>>>> thanks,
>>>>
>>>> --
>>>> heikki
>>>
>>>> diff --git a/drivers/usb/typec/ucsi/ucsi_acpi.c b/drivers/usb/typec/ucsi/ucsi_acpi.c
>>>> index 6bbf490ac401..7e8b1fcfa024 100644
>>>> --- a/drivers/usb/typec/ucsi/ucsi_acpi.c
>>>> +++ b/drivers/usb/typec/ucsi/ucsi_acpi.c
>>>> @@ -113,18 +113,44 @@ ucsi_zenbook_read(struct ucsi *ucsi, unsigned int offset, void *val, size_t val_
>>>>  	return 0;
>>>>  }
>>>> -static const struct ucsi_operations ucsi_zenbook_ops = {
>>>> -	.read = ucsi_zenbook_read,
>>>> -	.sync_write = ucsi_acpi_sync_write,
>>>> -	.async_write = ucsi_acpi_async_write
>>>> -};
>>>> +static int ucsi_dell_sync_write(struct ucsi *ucsi, unsigned int offset,
>>>> +				const void *val, size_t val_len)
>>>> +{
>>>> +	u64 ctrl = *(u64 *)val;
>>>> +	int ret;
>>>> +
>>>> +	ret = ucsi_acpi_sync_write(ucsi, offset, val, val_len);
>>>> +	if (ret && (ctrl & (UCSI_ACK_CC_CI | UCSI_ACK_CONNECTOR_CHANGE))) {
>>>> +		ctrl = UCSI_ACK_CC_CI | UCSI_ACK_COMMAND_COMPLETE;
>>>> +
>>>> +		dev_dbg(ucsi->dev->parent, "%s: ACK failed\n", __func__);
>>>> +		ret = ucsi_acpi_sync_write(ucsi, UCSI_CONTROL, &ctrl, sizeof(ctrl));
>>>> +	}
>>>
>>> Unfortunately, this has the logic reversed. The quirk (i.e. the
>>> additional UCSI_ACK_COMMAND_COMPLETE) is required after a _successful_
>>> UCSI_ACK_CONNECTOR_CHANGE. Otherwise, _subsequent_ commands will time out
>>> (usually the next GET_CONNECTOR_STATUS).
>>>
>>> This means the quirk must be applied _before_ we detect any failure.
>>> Consequently, the quirk has the potential to break working systems.
>>>
>>> Sorry, if that wasn't clear from my original mail. Please let me know
>>> if this changes how you want the quirks handled.
>>>
>>> Thanks Christian
>>>
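
A minimal user-space sketch of the ordering described above, not driver code: the extra UCSI_ACK_COMMAND_COMPLETE goes out right after a _successful_ connector-change ACK, before any failure has been seen. The names mock_ppm, mock_sync_write(), quirk_sync_write() and ERR_TIMEDOUT are hypothetical and exist only for illustration. The constant values are assumptions and should be checked against ucsi.h. The mock models only the behavior reported in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed values for this sketch; verify against ucsi.h before reuse. */
#define UCSI_COMMAND_MASK          0xffu
#define UCSI_ACK_CC_CI             0x04u
#define UCSI_GET_CONNECTOR_STATUS  0x12u
#define UCSI_ACK_CONNECTOR_CHANGE  (UINT64_C(1) << 16)
#define UCSI_ACK_COMMAND_COMPLETE  (UINT64_C(1) << 17)
#define ERR_TIMEDOUT               110	/* hypothetical stand-in for ETIMEDOUT */

/* Hypothetical model of the PPM; captures only the reported symptom. */
struct mock_ppm {
	bool quirky;	/* PPM needs the extra ACK after a connector-change ACK */
	bool stuck;	/* PPM currently lets every non-ACK command time out */
};

/* Hypothetical stand-in for ucsi_acpi_sync_write(). */
static int mock_sync_write(struct mock_ppm *ppm, const void *val, size_t val_len)
{
	uint64_t ctrl;

	assert(val_len == sizeof(ctrl));
	memcpy(&ctrl, val, sizeof(ctrl));

	if ((ctrl & UCSI_COMMAND_MASK) == UCSI_ACK_CC_CI) {
		if (ctrl & UCSI_ACK_COMMAND_COMPLETE)
			ppm->stuck = false;	/* the extra ACK unsticks the PPM */
		else if ((ctrl & UCSI_ACK_CONNECTOR_CHANGE) && ppm->quirky)
			ppm->stuck = true;	/* ACK succeeds, PPM wedges afterwards */
		return 0;
	}

	return ppm->stuck ? -ERR_TIMEDOUT : 0;
}

/* Quirk: follow every successful connector-change ACK with a second ACK. */
static int quirk_sync_write(struct mock_ppm *ppm, const void *val, size_t val_len)
{
	uint64_t ctrl;
	int ret;

	assert(val_len == sizeof(ctrl));
	memcpy(&ctrl, val, sizeof(ctrl));

	ret = mock_sync_write(ppm, val, val_len);
	if (ret)
		return ret;	/* a failed write is reported unchanged */

	if ((ctrl & UCSI_COMMAND_MASK) == UCSI_ACK_CC_CI &&
	    (ctrl & UCSI_ACK_CONNECTOR_CHANGE) &&
	    !(ctrl & UCSI_ACK_COMMAND_COMPLETE)) {
		uint64_t ack = UCSI_ACK_CC_CI | UCSI_ACK_COMMAND_COMPLETE;

		ret = mock_sync_write(ppm, &ack, sizeof(ack));
	}

	return ret;
}
```

Because the second ACK is sent unconditionally on success, a wrapper like this would have to be limited to affected systems. That is the concern raised above about breaking working PPMs.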
>>
>> For the problematic scenario have you tried to play with it a bit to see if
>> it's too short of a timeout (raise timeout) or to output the response bits
>> to see if anything else surprising is sent?
>
> It is not a problem with the timeout. Waiting forever in this case
> doesn't help. IMHO this is actually a bug in the PPM, i.e. in Dell's
> bios.

"Usually" the PD controller F/W is distributed with the EC, but yes, Dell
nominally puts everything in a monolithic BIOS package.

>
> Sending an ack after the timeout fixes things, though.
>
>> Does it always fail on the same command, or does it happen to a bunch of
>> them?
>
> It always fails on the first command after UCSI_ACK_CC_CI for a
> connector change. However, there might be no such command if the
> next event is a notification.
>
> I did play around with it a bit more and came up with a way to
> probe for the issue:
>
> https://lore.kernel.org/all/20240116224041.220740-1-lk@c--e.de/

If some variation of your probe-based workaround is picked up, I think it's
worth making noise when the issue is detected (dev_warn or dev_notice) to say
that the workaround is in use because of a PPM bug.

>
> regards Christian
>
>
+ Dell Client Kernel mailbox

Dell team,

Can you look into this? It sounds like it should be investigated more
closely to see where the mismatch between the spec and the real
behavior actually lies.