linux-kernel - RE: [PATCH 1/2] usb: dwc3: core: Introduce dwc3_device

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <D669B587C1E508469413CF713E34377E732C71C9@DBDE04.ent.ti.com>
Date:	Mon, 4 Apr 2016 20:35:38 +0000
From:	"Shankar, Abhishek" <a.shankar@...com>
To:	Felipe Balbi <balbi@...nel.org>, "Quadros, Roger" <rogerq@...com>,
	"John Youn" <John.Youn@...opsys.com>
CC:	"Nori, Sekhar" <nsekhar@...com>,
	"linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"B, Ravi" <ravibabu@...com>
Subject: RE: [PATCH 1/2] usb: dwc3: core: Introduce dwc3_device_reinit()

Roger
I am sorry but this mail chain is unreadable!! I am not able to filter out the >>>...
Please send me a clean version of the mail if possible...i will be unable to respond without that..

---Abhishek

-----Original Message-----
From: Felipe Balbi [mailto:balbi@...nel.org] 
Sent: Monday, April 04, 2016 3:11 AM
To: Quadros, Roger; John Youn
Cc: Nori, Sekhar; linux-usb@...r.kernel.org; linux-kernel@...r.kernel.org; Shankar, Abhishek; B, Ravi
Subject: Re: [PATCH 1/2] usb: dwc3: core: Introduce dwc3_device_reinit()


Hi,

Roger Quadros <rogerq@...com> writes:
>>>>>>>>>>>> We will need this function for a workaround.
>>>>>>>>>>>> The function issues a softreset only to the device 
>>>>>>>>>>>> controller and performs minimal re-initialization so that 
>>>>>>>>>>>> the device controller can be usable.
>>>>>>>>>>>>
>>>>>>>>>>>> As some code is similar to dwc3_core_init() take out common 
>>>>>>>>>>>> code into dwc3_get_gctl_quirks().
>>>>>>>>>>>>
>>>>>>>>>>>> We add a new member (prtcap_mode) to struct dwc3 to keep 
>>>>>>>>>>>> track of the current mode in the PRTCAPDIR register.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Roger Quadros <rogerq@...com>
>>>>>>>>>>>
>>>>>>>>>>> I must say, I don't like this at all :-p There's ONE known 
>>>>>>>>>>> silicon which needs this because of a poor silicon 
>>>>>>>>>>> integration which took an IP with a known erratum where it 
>>>>>>>>>>> can't be made to work on lower speeds and STILL was integrated without a superspeed PHY.
>>>>>>>>>>>
>>>>>>>>>>> There's a reason why I never tried to push this upstream 
>>>>>>>>>>> myself ;-)
>>>>>>>>>>>
>>>>>>>>>>> I'm really thinking we might be better off adding a quirk 
>>>>>>>>>>> flag to skip the metastability workaround and allow this ONE 
>>>>>>>>>>> silicon to set the controller to lower speed.
>>>>>>>>>>>
>>>>>>>>>>> John, can you check with your colleagues if we would ever 
>>>>>>>>>>> fall into
>>>>>>>>>>> STAR#9000525659 if we set maximum speed to high speed during 
>>>>>>>>>>> driver probe and never touch it again ? I would assume we 
>>>>>>>>>>> don't really fall into the metastability workaround, right ? 
>>>>>>>>>>> We're not doing any sort of PM for dwc3...
>>>>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Felipe,
>>>>>>>>
>>>>>>>> Do you mean to keep DCFG.speed to SS and set dwc->maximum_speed to HS?
>>>>>>>> I don't see an issue with that as long as we always ignore
>>>>>>>> dwc->maximum_speed when programming DCFG.speed for all affected
>>>>>>>> versions of the core. As long as the DCFG.speed = SS, you 
>>>>>>>> should not hit the STAR.
>>>>>>>
>>>>>>> I actually mean changing DCFG.speed during driver probe and 
>>>>>>> never touching it again. Would that still cause problems ?
>>>>>>>
>>>>>>
>>>>>> In that case I'm not sure. The engineer who would know is off 
>>>>>> until next week so I'll get back to you as soon as I can talk to 
>>>>>> him about it.
>>>>>
>>>>
>>>> So the engineers said that this issue can occur while set to HS and 
>>>> the run/stop bit is changed so it seems that won't work.
>>>
>>> Thanks John.
>>>
>>> Felipe,
>>>
>>> any suggestion how we can fix this upstream?
>> 
>> no idea, I don't have a lot of memory about this problem. I really 
>> don't remember the details about this, is there an openly available 
>> errata document which I could read ? /me goes search for it.
>> 
>> I found [1] which tells me, the following:
>> 
>> 
>> | i819        | A Device Control Bit Meta-Stability for USB3.0 Controller in USB2.0 Mode   |
>> |-------------+----------------------------------------------------------------------------|
>> | Criticality | Medium                                                                     |
>> |             |                                                                            |
>> | Descritiion | When USB3.0 controller core is programmed to be a USB 2.0-only device      |
>> |             | hardware meta-stability on USB_DCTL[31]RUNSTOP bit causing the core to     |
>> |             | attempt high speed as well as SuperSpeed connection or completely miss     |
>> |             | the attach request.                                                        |
>> |             |                                                                            |
>> | Workaround  | If the requirement is to always function in USB 2.0 mode, there is no      |
>> |             | workaround.                                                                |
>> |             | Otherwise, you can always program the USB controller core to be SuperSpeed |
>> |             | 3.0 capable (USB_DCFG[2:0]DEVSPD = 0x4)                                    |
>> |             |                                                                            |
>> | Revisions   | SR 1.1, 2.0                                                                |
>> | Impacted    |                                                                            |
>> |-------------+----------------------------------------------------------------------------|
>> 
>> So, TI's own documentation says that there is _no_ workaround. My
>
> We are working on updating that document. Apparently Synopsis has suggested this workaround.
> pasting verbatim
>
> "
> -          As last step of device configuration we set the RUNSTOP bit in DCTL.
>
> -          Once we set the RUNSTOP bit, we need to monitor GDBGLTSSM for 100 ms until one of the two below happens:.
>
> 	o   We see the GDBGLTSSM.LTDB_LINK_STATE changing from 4
>
> 	o   We receive the USB 2.0 reset interrupt.
>
> If none of above happens then we can stop monitoring it.
>
> -          If state change from 4 occurs issue a SoftReset thru DCTL.CSftRst and reconfigure Device. This time it is guaranteed that no metastability will occur so no need to do the 100ms timeout.
> "

I don't have this text anywhere so I don't know. Is this something TI came up with or Synopsys ? Unless I can see a document (preferrably from
Synopsys) stating this, I can't really accept $subject.

Another question is: if all it takes is an extra SoftReset, why don't we just reset it during probe() if max_speed < SUPER and we're running on rev < 2.20a ? BTW, which revision of the IP is on AM57x/DRA7x ?

>> question is, then: How are you sure that resetting the device 
>> actually solves the issue ? Did you really hit the metastability 
>> problem and noted that it works after a soft-reset ? How did you 
>> verify that
>
> I don't know if it solves the issue or not. It was suggested by 
> Synopsis to TI's silicon team.

now that's a bummer ;-)

> I never hit the metastability problem detection condition in my 
> overnight tests (i.e. LTDB_LINK_STATE != 4).

overnight is not enough. You need to keep this running for weeks.

> I have verified that things work after a soft-reset by faking that the 
> error happens.

but if you never hit the actual failure, you have verified that it works _without_ the workaround. I mean, if you can't be sure RUN/STOP went metastable, you can't really be sure you're working around the Erratum.

>> Run/Stop was in a metastable state, considering that Run/Stop signal 
>> is not visible outside the die ?
>
> LTDB_LINK_STATE != 4 within 100ms or RUNSTOP set is the condition to 
> detect that the issue occurred.

this doesn't prove anything. This just means that your 100ms timer expired. Unless you can verify that Run/Stop is in metastability, you cannot be sure this workaround works.

Did anybody run silicon simulation to verify this case ? It's really the only way to be sure.

>> It seems to me that resetting the IP is just as "dangerous" as 
>> setting the IP to High-speed in the first place. No ?
>
> The soft-reset is just a recovery mechanism if that error is ever hit.

but you don't know if that's a *proper* recovery mechanism because you never even *hit* the error.

> Putting the controller in reset state means it is in a known state. 
> Why do you say it would be dangerous?

Because you can't predict the systems' behavior. If the flip-flop didn't have time to settle into 0 or 1 state, you don't know what the combinatorial part of the IP will do with that bogus value. It's truly unpredictable. You also cannot know, for sure, that a SoftReset will be enough to bring that flip-flop out of metastability.

> The original workaround i.e. setting the High-speed instance to 
> Super-speed to avoid this errata is causing another side effect. i.e. 
> erratic interrupts happen and more than 2 seconds delay

this should have been an expected side-effect when you design a SuperSpeed controller without a SuperSpeed PHY and don't properly terminate inputs. What you have is a floating PIPE3 interface not properly terminated and capturing random noise (basically acting as a very poor antenna inside your die). Of course the IP will go bonkers and give you "erratic error" interrupts. It has no idea what the hell this "PHY" on the PIPE3 interface is doing.

> to enumerations. This problem is more serious and so we have to keep 
> the controller in High-speed and tackle the meta-stability condition 
> if it happens.

you have to tackle the meta-stability, sure, but we need guarantee that $subject IS indeed tackling that problem. Unless there's proof that this really solves the meta-stability issue, I won't take $subject. Sorry dude, but I don't want regressions elsewhere because of a badly validated patch.

Bottomline, it's not enough to _state_ that it solves the problem. You need to first *trigger* the issue without the workaround, then apply workaround and trigger it again. Then, and only then, you can be certain you're fixing the problem.

After that, we will look into how to make sure this has no impact to other users.

--
balbi