linux-kernel - Re: [PATCHv4 1/2] usb: USB Type-C connector class

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5779838C.9050008@roeck-us.net>
Date:	Sun, 3 Jul 2016 14:28:44 -0700
From:	Guenter Roeck <linux@...ck-us.net>
To:	Heikki Krogerus <heikki.krogerus@...ux.intel.com>
Cc:	Greg KH <gregkh@...uxfoundation.org>,
	Oliver Neukum <oneukum@...e.com>,
	Felipe Balbi <felipe.balbi@...ux.intel.com>,
	Roger Quadros <rogerq@...com>,
	Rajaram R <rajaram.officemail@...il.com>,
	linux-kernel@...r.kernel.org, linux-usb@...r.kernel.org
Subject: Re: [PATCHv4 1/2] usb: USB Type-C connector class

On 07/03/2016 12:38 PM, Heikki Krogerus wrote:
> On Fri, Jul 01, 2016 at 07:33:12AM -0700, Guenter Roeck wrote:
>> On Fri, Jul 01, 2016 at 03:05:35PM +0300, Heikki Krogerus wrote:
>>> I've updated my github branch with a commit where both of these issues
>>> should be fixed. Can you give it a try?
>>>
>>
>> This is going to be very difficult. So far, calls such as
>> typec_set_data_role() were independent (asynchronous) of role change
>> requests through sysfs, meaning they happened whenever lower level
>> confirmed that a role change was complete, for whatever reason. Now
>> I have to check if a role change request through the class code
>> is pending and not call typec_set_data_role() and friends in that case.
>
> I'm sorry about the misunderstanding.
>
> What you want is basically that we only support non-blocking mode with
> this interface, and we can't do that.
>

No, that is not what I said or asked for. The problems I am seeing are due
to locking in the class code. "Asynchronous" above referred to the state
machine vs. role change requests by the class code, which operates independent
of each other in my code. Set requests from the class code still wait for
the state machine to complete.

The problem is that the state machine now needs to know if the class code
has a set role request pending, because in that case it can no longer report
role changes directly to the class code. This includes role changes unrelated
to the actual set role request. That code is now much more complicated, especially
since each callback into the typec code is handled differently. For example,
typec_disconnect() does not require a lock (unless I missed it), but many
other functions do.

> Would you even be able to make it work? How would you report errors
> for example?
>

It worked just great when you had no locks. All I had to do is handle
role changes in the driver, and to implement the necessary locks there.
Actually, before you introduced the locks, you had suggested that this
was the way to go, which is why I implemented it that way.

>> This becomes even more complicated in situations with parallel role
>> change requests both from the class code and from the partner.
>> In such cases the class code may have acquired the lock, and before it
>> calls the driver, a role change complete is reported. The call to
>> typec_set_XXX_role() will then get stuck. So I'll have to add another
>> flag before calling typec_set_data_role() and friends, and then reject
>> the call to dr_set and all other set requests with -EBUSY.
>> race conditions.
>
> Or we'll just export the port locks somehow, or provide separate API
> for handling them.
>
> ..typec_get_port_lock(..
> {
>          return &port->lock;
> }
>
> or
>
> typec_lock(..
> typec_test_and_lock(..
> typec_unlock(..
> ...
>
>> Thinking more ... ultimately, this means that I'll have to reject role
>> change requests from the class code whenever the state machine is busy,
>> because I never know if the state machine activity would result in a
>> call into the typec class code.
>
> Yes, why is that a problem? The other option would be that you queue
> the requests from users, and I'm pretty sure we want to avoid that. It
> would mean you have to consider a lot of conditions to avoid any
> races, and you would also have to make a lot of extra decisions. As an
> example, what do you do if two role change requests are in the queue
> for the same role? If the first requests is successful, you probable
> can just report success to the second request also, _maybe_! But if
> the first one fails, do you try again for the second request, and in
> that case do you wait for the result of the second request before
> reporting to the first, or do you just fail both of them? etc.
>

No, that is not correct. Again, all I had to do was to implement a lock
in the driver which waited until the state machine was idle, and _then_
check if the set command needed to do anything. No queuing necessary,
no race conditions.

Keep in mind there are no role _change_ requests, the requests are to set
a specific role - and I can do that once I have the state machine lock.
The second request would again be a request to set a specific role. It waits
for the state machine lock, and after it is acquired decides if a role
change is necessary or not.

Sure, I can switch to your method, and I guess I'll have to.
I just think that the resulting -EBUSY responses are unnecessary.

I will reiterate, though, that this all worked just fine with the previous
version of your code, where sysfs role change requests did not require
or implement a lock in the class code.

>> At the same time, the protocol driver
>> will have to reject incoming role change requests while a locally
>> requested role change request is pending, even if that internal
>> role change request has not yet resulted in a PD message being sent.
>
> Why?
>
Example:

Class code requests data role change, remote partner requests power role
change. Both are handled in parallel by the state machine. State machine
tries to report power role change to class code and gets stuck because the
class code holds a lock due to the pending data role change.

Therefore, a role change request from the partner now has to be rejected
while a (different) role change request from the class code is pending.

>> Is this really necessary ? It creates a lot of interdependencies.
>>
>> In summary, this means a substantial amount of code and testing, which
>> is going to take a while.
>>
>> You might want to mention all this in the API documentation.
>
> OK.
>
>>> I've also removed the supports_usb_power_delivery and id_header_vdo
>>> attributes from partners and cables. We really have to put all the USB
>>> PD specific attributes to its own group, and that group can't be in
>>> control of this class. We will simply not always have access to the
>>> kind of USB PD specific details you want to show, for example details
>>> that you get from discover identity command, even when USB PD is fully
>>> supported. For example on systems that use UCSI.
>>>
>>
>> I think we should have a single unified ABI, independent of the underlying
>> driver, that informs the user about the partner device. I really don't
>> quite understand why the class code should not be able to report what device
>> it is connected to (while at the same time being able to report the alternate
>> modes it supports).
>
> OK, let's plan this more then. Maybe we can make for example a layer
> that creates the groups for the PD specific details to the class?
>
> The problem is still that we can only provide results from for example
> request identity command reliably when the PD protocol layer is
> completely inside kernel, and that is not always the case. So we
> really need to group those details in its own group, and that group
> will basically indicate is the PD stack inside the kernel or not.
>
> We should not forget also that the userspace can never rely on those
> details because of the fact that they simply will not always be
> available.
>
On the other side, not being able to rely on a well defined ABI makes the
ABI much less useful.

Guenter