linux-kernel - Re: [PATCH v4 6/6] usb:cdns3 Fix for stuck packets in on-chip OUT buffer.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAL411-pC6VdaYyv4_rjanc3WgkJ+Ebcs0jUFims-7CUdv0pvpA@mail.gmail.com>
Date:   Mon, 11 Mar 2019 16:15:09 +0800
From:   Peter Chen <hzpeterchen@...il.com>
To:     Roger Quadros <rogerq@...com>
Cc:     Pawel Laszczak <pawell@...ence.com>,
        Felipe Balbi <felipe.balbi@...ux.intel.com>,
        "devicetree@...r.kernel.org" <devicetree@...r.kernel.org>,
        "gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
        "mark.rutland@....com" <mark.rutland@....com>,
        "linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>,
        "hdegoede@...hat.com" <hdegoede@...hat.com>,
        "heikki.krogerus@...ux.intel.com" <heikki.krogerus@...ux.intel.com>,
        "andy.shevchenko@...il.com" <andy.shevchenko@...il.com>,
        "robh+dt@...nel.org" <robh+dt@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "jbergsagel@...com" <jbergsagel@...com>,
        "nsekhar@...com" <nsekhar@...com>, "nm@...com" <nm@...com>,
        Suresh Punnoose <sureshp@...ence.com>,
        "peter.chen@....com" <peter.chen@....com>,
        Rahul Kumar <kurahul@...ence.com>
Subject: Re: [PATCH v4 6/6] usb:cdns3 Fix for stuck packets in on-chip OUT buffer.

>
> On 07/03/2019 09:06, Pawel Laszczak wrote:
> > Hi,
> >
> >> Hi,
> >>
> >> On 21/02/2019 09:14, Felipe Balbi wrote:
> >>>
> >>> Hi,
> >>>
> >>> (please break your emails at 80-columns)
> >>>
> >>> Pawel Laszczak <pawell@...ence.com> writes:
> >>>>>> One more thing. Workaround has implemented algorithm that decide for which
> >>>>>> endpoint it should be enabled.  e.g for composite device MSC+NCM+ACM it
> >>>>>> should work only for ACM OUT endpoint.
> >>>>>>
> >>>>>
> >>>>> If ACM driver didn't queue the request for ACM OUT endpoint, why does the
> >>>>> controller accept the data at all?
> >>>>>
> >>>>> I didn't understand why we need a workaround for this. It should be standard
> >>>>> behaviour to NAK data if function driver didn't request for all endpoints.
> >>>>
> >>>> Yes, I agree with you. Controller shouldn’t accept such packet. As I know this
> >>>> behavior will be fixed in RTL.
> >>>>
> >>>> But I assume that some older version of this controller are one the market,
> >>>> and driver should work correct with them.
> >>>>
> >>>> In the feature this workaround can be limited only to selected controllers.
> >>>>
> >>>> Even now I assume that it can be enabled/disabled by module parameter.
> >>>
> >>> no module parameters, please. Use revision detection in runtime.
> >>>
> >>
> >> This is about whether to enable or disable the workaround.
> >> By default we don't want this workaround to be enabled.
> >>
> >> I'm debating whether we should have this workaround at all or not.
> >>
> >> It has the following problems.
> >>
> >> 1) It ACKs packets even when gadget end is not ready to accept the transfers.
> >> 2) It stores these packets in a temporary buffer and then pushes them to the
> >> gadget driver whenever the gadget driver is ready to process the data.
> >> 3) Since the gadget driver can become ready at an indefinite time in the
> >> future, it poses 2 problems:
> >> a) It is sending stale data to the sink. (problematic at next protocol level?)
> >> b) If this temporary buffer runs out we still hit the lock up issue.
> >>
> >> I think the right solution is to make sure that the gadget driver is always
> >> reading all the enabled OUT endpoints *or* (keep the OUT endpoints disabled
> >> if gadget driver is not ready to process OUT transfers).
> >
> > If driver disable endpoint then it not answer for packets from host.
> > It will result that host reset the device, so I can't disable such endpoints.
> >
> > Other good solution is to change ACM driver in a way that it sends requests
> > to controller driver after enabling endpoint. The class driver could decide
>
> The ACM driver is doing exactly that. However, there is a large delay (depending
> on when user opens the ttyACM) between endpoint enable and request queue and
> that's the issue for this controller.
>
> The endpoint is enabled whenever the host sends a SET_INTERFACE
> [acm_set_alt()->gserial_connect()]
> but the first request is queued only when user opens the ttyACM
> [gs_open()->gs_start_io()->gs_start_rx()].
>
> I'm don't think this sequence can be changed.
> What happens if you defer gserial_connect() to be done at gs_open()?
>

The host controller receives error due to there is no NAK either ACK
for endpoint
which should have already enabled. The host may send bus reset due to
the response
timeout for specific endpoint.

This issue only affects multiple OUT use case, we run this issue after
configuring gadget
as ACM + NCM, NCM can't work (always responds NAKs for OUT) due to ACM endpoint
receives host's data but the user doesn't receive it, since all OUT
endpoints share the same FIFOs.

So, if we do not enable this workaround, the multiple OUT endpoints
use case may not
work well since one OUT endpoint may affect other OUT endpoints due to
it occupies the
common OUT FIFO.

If we enable this workaround, for your below question:

> >> It has the following problems.
> >>
> >> 1) It ACKs packets even when gadget end is not ready to accept the transfers.

This can't be controlled by software, HW does it automatically once we
enable endpoint.

> >> 2) It stores these packets in a temporary buffer and then pushes them to the
> >> gadget driver whenever the gadget driver is ready to process the data.
> >> 3) Since the gadget driver can become ready at an indefinite time in the
> >> future, it poses 2 problems:
> >> a) It is sending stale data to the sink. (problematic at next protocol level?)

Maybe. The protocol may be recovered after several talks, eg, the device finds
the stale data, and let the host re-send.

> >> b) If this temporary buffer runs out we still hit the lock up issue.
> >>

It can be improved, if the temporary buffer is full, we could discard
some oldest data.

So, if not enable this workaround, the USB function is affected due to
physical data can't
be received, else, we may receive the stale data for some specific use
cases, it only affect
its own OUT endpoint, it may be recovered by protocol layer.  I prefer
enabling it by default
if this workaround is well tested.

Peter