Message-ID: <ZtmIP1iTz+XnUD4o@kuha.fi.intel.com>
Date: Thu, 5 Sep 2024 13:30:23 +0300
From: Heikki Krogerus <heikki.krogerus@...ux.intel.com>
To: "Christian A. Ehrhardt" <lk@...e.de>
Cc: linux-usb@...r.kernel.org, linux-kernel@...r.kernel.org,
	Anurag Bijea <icaliberdev@...il.com>,
	Christian Heusel <christian@...sel.eu>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Dmitry Baryshkov <dmitry.baryshkov@...aro.org>,
	Jameson Thies <jthies@...gle.com>,
	Abhishek Pandit-Subedi <abhishekpandit@...omium.org>
Subject: Re: [PATCH v3] usb: typec: ucsi: Fix busy loop on ASUS VivoBooks

Hi,

On Wed, Sep 04, 2024 at 09:15:26PM +0200, Christian A. Ehrhardt wrote:
> 
> Hi Heikki,
> 
> On Wed, Sep 04, 2024 at 05:54:29PM +0300, Heikki Krogerus wrote:
> > On Wed, Sep 04, 2024 at 03:58:05PM +0200, Christian A. Ehrhardt wrote:
> > > 
> > > Hi Heikki,
> > > 
> > > On Wed, Sep 04, 2024 at 03:07:45PM +0300, Heikki Krogerus wrote:
> > > > On Tue, Sep 03, 2024 at 08:19:17PM +0200, Christian A. Ehrhardt wrote:
> > > > > If the busy indicator is set, all other fields in CCI should be
> > > > > clear according to the spec. However, some UCSI implementations do
> > > > > not follow this rule and report bogus data in CCI along with the
> > > > > busy indicator. Ignore the contents of CCI if the busy indicator is
> > > > > set.
> > > > > 
> > > > > If a command timeout is hit it is possible that the EVENT_PENDING
> > > > > bit is cleared while connector work is still scheduled which can
> > > > > cause the EVENT_PENDING bit to go out of sync with scheduled connector
> > > > > work. Check and set the EVENT_PENDING bit on entry to
> > > > > ucsi_handle_connector_change() to fix this.
> > > > > 
> > > > > Reported-by: Anurag Bijea <icaliberdev@...il.com>
> > > > > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219108
> > > > > Bisected-by: Christian Heusel <christian@...sel.eu>
> > > > > Tested-by: Anurag Bijea <icaliberdev@...il.com>
> > > > > Fixes: de52aca4d9d5 ("usb: typec: ucsi: Never send a lone connector change ack")
> > > > > Cc: stable@...r.kernel.org
> > > > > Signed-off-by: Christian A. Ehrhardt <lk@...e.de>
> > > > > ---
> > > > >  drivers/usb/typec/ucsi/ucsi.c | 8 ++++++++
> > > > >  1 file changed, 8 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/usb/typec/ucsi/ucsi.c b/drivers/usb/typec/ucsi/ucsi.c
> > > > > index 4039851551c1..540cb1d2822c 100644
> > > > > --- a/drivers/usb/typec/ucsi/ucsi.c
> > > > > +++ b/drivers/usb/typec/ucsi/ucsi.c
> > > > > @@ -38,6 +38,10 @@
> > > > >  
> > > > >  void ucsi_notify_common(struct ucsi *ucsi, u32 cci)
> > > > >  {
> > > > > +	/* Ignore bogus data in CCI if busy indicator is set. */
> > > > > +	if (cci & UCSI_CCI_BUSY)
> > > > > +		return;
> > > > 
> > > > I started testing this and it looks like the commands never get
> > > > cancelled when the BUSY bit is set. I don't think this patch is the
> > > > problem, though. I think the BUSY handling broke earlier, probably in
> > > > 5e9c1662a89b ("usb: typec: ucsi: rework command execution functions").
> > > > 
> > > > I need to look at this a bit more carefully, but in the meantime, can
> > > > you try this:
> > > > 
> > > > 	if (cci & UCSI_CCI_BUSY) {
> > > > 		complete(&ucsi->complete);
> > > > 		return;
> > > > 	}
> > > 
> > > I really don't think this is the correct thing to do and it will
> > > likely make things worse.
> > 
> > That was the behaviour before all that command execution refactoring
> > this summer. I'm not saying that it's right, but that's how it was.
> 
> The code to do that is still there but does not get called because
> the ETIMEDOUT error is checked before CCI is read in ucsi_run_command().
> I guess something like this (only compile tested) would fix it:
> 
> diff --git a/drivers/usb/typec/ucsi/ucsi.c b/drivers/usb/typec/ucsi/ucsi.c
> index 540cb1d2822c..d6d61606bbcf 100644
> --- a/drivers/usb/typec/ucsi/ucsi.c
> +++ b/drivers/usb/typec/ucsi/ucsi.c
> @@ -111,15 +111,13 @@ static int ucsi_run_command(struct ucsi *ucsi, u64 command, u32 *cci,
>  		size = clamp(size, 0, 16);
>  
>  	ret = ucsi->ops->sync_control(ucsi, command);
> -	if (ret)
> -		return ret;
> -
> -	ret = ucsi->ops->read_cci(ucsi, cci);
> -	if (ret)
> -		return ret;
> +	if (ucsi->ops->read_cci(ucsi, cci))
> +		return -EIO;
>  
>  	if (*cci & UCSI_CCI_BUSY)
>  		return -EBUSY;
> +	if (ret)
> +		return ret;
>  
>  	if (!(*cci & UCSI_CCI_COMMAND_COMPLETE))
>  		return -EIO;
> 

Yes, that looks good.
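
Just to spell out the ordering that hunk ends up with, here is a small
user-space sketch (not kernel code). The SKETCH_* names and
sketch_classify() are made up for illustration, the bit positions are
assumed to match ucsi.h, and the function returns 0 at the point where
the real ucsi_run_command() carries on with its remaining checks:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only; see ucsi.h for the real ones. */
#define SKETCH_CCI_BUSY             (UINT32_C(1) << 28)
#define SKETCH_CCI_COMMAND_COMPLETE (UINT32_C(1) << 31)

/*
 * Hypothetical stand-in for the part of ucsi_run_command() touched by the
 * hunk: control_ret is what sync_control() returned, read_ret is what
 * read_cci() returned, and cci is the value read_cci() produced.
 */
int sketch_classify(int control_ret, int read_ret, uint32_t cci)
{
	/* CCI is read even when the control write failed or timed out. */
	if (read_ret)
		return -EIO;

	/* BUSY takes priority over a timeout, so the caller can still cancel. */
	if (cci & SKETCH_CCI_BUSY)
		return -EBUSY;

	/* Only now is the error from the control write reported. */
	if (control_ret)
		return control_ret;

	if (!(cci & SKETCH_CCI_COMMAND_COMPLETE))
		return -EIO;

	return 0;
}

/* Self-check for the cases discussed in this thread. */
void sketch_selftest(void)
{
	/* Timed out while the PPM still reports BUSY: caller sees -EBUSY. */
	assert(sketch_classify(-ETIMEDOUT, 0, SKETCH_CCI_BUSY) == -EBUSY);
	/* Timed out without BUSY: the timeout itself is reported. */
	assert(sketch_classify(-ETIMEDOUT, 0, 0) == -ETIMEDOUT);
	/* A failing CCI read is always -EIO. */
	assert(sketch_classify(0, -EINVAL, SKETCH_CCI_BUSY) == -EIO);
	/* Normal completion. */
	assert(sketch_classify(0, 0, SKETCH_CCI_COMMAND_COMPLETE) == 0);
}
```

The point of the reordering is the first assert: with the old order the
-ETIMEDOUT from sync_control() was returned before CCI was ever looked
at, so the -EBUSY path could not be reached.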

> > > A notification with the UCSI_CCI_BUSY bit does _not_ mean that
> > > the controller is busy doing other things and cannot complete the
> > > command.
> > > 
> > > Instead it is an indication that the controller _is_ working to
> > > complete our command but will take somewhat longer:
> > > 
> > > Citing:
> > > | Note: If a command takes longer than MIN_TIME_TO_RESPOND_WITH_BUSY ms
> > > |       for the PPM (excluding PPM to OPM communication latency) to complete,
> > > |       then the PPM shall respond to the command by setting the CCI Busy
> > > |       Indicator and notify the OPM.
> > > |       Subsequently, when the PPM actually completes the command, the
> > > |       PPM shall notify the OPM of the outcome of the command via an
> > > |       asynchronous notification associated with that command.
> > > 
> > > Unless I misunderstand what you are trying to do your change would
> > > cause us to needlessly abort/cancel every command that takes more than
> > > MIN_TIME_TO_RESPOND_WITH_BUSY to complete.
> > > 
> > > What am I missing?
> > 
> > The decision to Cancel was made to work around buggy EC firmwares that
> > reported BUSY, and then never completed the command. So without that
> > Cancel hack, the PPM was stuck on those systems.
> 
> Yes fine. But the cancel should be done _after_ the command times
> out normally, I guess. Otherwise conforming systems will get their
> commands terminated/aborted for no good reason. And this is what
> the current code tries to do.
> 
> > I don't know what we should do about that hack. We probably could just
> > ignore those old systems, and then add quirks for them as needed. But
> > I also don't really like what you are proposing in this patch, that we
> > basically ignore the BUSY bit completely.
> 
> See above. I think that solves both cases nicely.

Agreed. Can you incorporate that into this patch?
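
For reference, the behaviour we seem to agree on (a BUSY notification
never completes the command early, and the cancel only happens once the
normal timeout has expired with BUSY still set) can be modelled roughly
like this. It is a user-space simulation under my own assumptions, not
driver code: the event list, the SKETCH_* names, the bit positions and
sketch_wait() are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only; see ucsi.h for the real ones. */
#define SKETCH_CCI_BUSY             (UINT32_C(1) << 28)
#define SKETCH_CCI_COMMAND_COMPLETE (UINT32_C(1) << 31)

enum sketch_outcome {
	SKETCH_DONE,                 /* command completed in time */
	SKETCH_CANCEL_AFTER_TIMEOUT, /* timed out while still BUSY: cancel */
	SKETCH_TIMEOUT,              /* timed out without BUSY */
};

/* One simulated notification: arrival time and the CCI value seen. */
struct sketch_event {
	unsigned int at_ms;
	uint32_t cci;
};

/* Events are assumed to be sorted by arrival time. */
enum sketch_outcome sketch_wait(const struct sketch_event *ev, size_t n,
				unsigned int timeout_ms)
{
	uint32_t last_cci = 0;

	for (size_t i = 0; i < n && ev[i].at_ms <= timeout_ms; i++) {
		last_cci = ev[i].cci;

		/* BUSY set: ignore every other bit and keep waiting. */
		if (last_cci & SKETCH_CCI_BUSY)
			continue;

		if (last_cci & SKETCH_CCI_COMMAND_COMPLETE)
			return SKETCH_DONE;
	}

	/* The normal timeout expired; cancel only if the PPM still says BUSY. */
	return (last_cci & SKETCH_CCI_BUSY) ? SKETCH_CANCEL_AFTER_TIMEOUT
					    : SKETCH_TIMEOUT;
}

void sketch_wait_selftest(void)
{
	/* Conforming PPM: BUSY first, completion later but within the timeout. */
	const struct sketch_event conforming[] = {
		{ 10, SKETCH_CCI_BUSY },
		{ 500, SKETCH_CCI_COMMAND_COMPLETE },
	};
	/* Buggy firmware: BUSY with bogus extra bits, then nothing. */
	const struct sketch_event stuck[] = {
		{ 10, SKETCH_CCI_BUSY | SKETCH_CCI_COMMAND_COMPLETE },
	};

	assert(sketch_wait(conforming, 2, 5000) == SKETCH_DONE);
	assert(sketch_wait(stuck, 1, 5000) == SKETCH_CANCEL_AFTER_TIMEOUT);
	assert(sketch_wait(NULL, 0, 5000) == SKETCH_TIMEOUT);
}
```

So a conforming PPM gets the full timeout to finish, and the cancel
workaround only kicks in for firmware that reports BUSY and then never
completes.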

> > Right now I was hoping that we return the behaviour of the driver to
> > a point where everything worked as before, and after that start
> > improving the driver. That's why I was hoping to hear whether the
> > problem that you are seeing goes away with that approach.
> > 
> > With which command do you guys get the busy notification?
> 
> It happens for all types of commands. I will append debug output where
> all commands sent and all CCI values read are printed.
> 
> Unfortunately, I don't have direct access to the affected hardware.
> I'm just looking into this because one of my changes from earlier
> this year caused a regression on that machine. Is this sufficient to
> show what's going on?

Yes, it's fine. I was mostly just curious.

> > In any case, I don't think all those ucsi_*_common() functions give us
> > enough room to move here. I feel that the command execution needs to
> > be refactored somehow again.
> 
> That's your call to make but personally, I like the recent changes
> to the interface between ucsi.c and the backend drivers.

Just to clarify here, I did not have anything that drastic in mind.

Thanks Christian,

-- 
heikki
