linux-kernel - Re: [PATCH v7 4/4] Bluetooth: btqca: inject command complete event during fw download

Message-ID: <20190102221510.GQ261387@google.com>
Date:   Wed, 2 Jan 2019 14:15:10 -0800
From:   Matthias Kaehlcke <mka@...omium.org>
To:     Balakrishna Godavarthi <bgodavar@...eaurora.org>
Cc:     Marcel Holtmann <marcel@...tmann.org>,
        Johan Hedberg <johan.hedberg@...il.com>,
        linux-kernel@...r.kernel.org, linux-bluetooth@...r.kernel.org,
        hemantg@...eaurora.org, linux-arm-msm@...r.kernel.org
Subject: Re: [PATCH v7 4/4] Bluetooth: btqca: inject command complete event
 during fw download

On Mon, Dec 31, 2018 at 11:34:46AM +0530, Balakrishna Godavarthi wrote:
> Hi Marcel,
> 
> On 2018-12-30 13:40, Marcel Holtmann wrote:
> > Hi Balakrishna,
> > 
> > > > > > Latest qualcomm chips are not sending a command complete event for
> > > > > > every firmware packet sent to chip. They only respond with a vendor
> > > > > > specific event for the last firmware packet. This optimization will
> > > > > > decrease the BT ON time. Due to this we are seeing timeout error
> > > > > > messages logged on the console during firmware download. Now we are
> > > > > > injecting a command complete event once we receive a vendor
> > > > > > specific event for the last RAM firmware packet.
> > > > > Signed-off-by: Balakrishna Godavarthi <bgodavar@...eaurora.org>
> > > > > ---
> > > > > > drivers/bluetooth/btqca.c | 39 ++++++++++++++++++++++++++++++++++++++-
> > > > > drivers/bluetooth/btqca.h |  3 +++
> > > > > 2 files changed, 41 insertions(+), 1 deletion(-)
> > > > > diff --git a/drivers/bluetooth/btqca.c b/drivers/bluetooth/btqca.c
> > > > > index ec9e03a6b778..0b533f65f652 100644
> > > > > --- a/drivers/bluetooth/btqca.c
> > > > > +++ b/drivers/bluetooth/btqca.c
> > > > > @@ -144,6 +144,7 @@ static void qca_tlv_check_data(struct
> > > > > rome_config *config,
> > > > > 		 * In case VSE is skipped, only the last segment is acked.
> > > > > 		 */
> > > > > 		config->dnld_mode = tlv_patch->download_mode;
> > > > > +		config->dnld_type = config->dnld_mode;
> > > > > 		BT_DBG("Total Length           : %d bytes",
> > > > > 		       le32_to_cpu(tlv_patch->total_size));
> > > > > @@ -264,6 +265,31 @@ static int qca_tlv_send_segment(struct
> > > > > hci_dev *hdev, int seg_size,
> > > > > 	return err;
> > > > > }
> > > > > +static int qca_inject_cmd_complete_event(struct hci_dev *hdev)
> > > > > +{
> > > > > +	struct hci_event_hdr *hdr;
> > > > > +	struct hci_ev_cmd_complete *evt;
> > > > > +	struct sk_buff *skb;
> > > > > +
> > > > > +	skb = bt_skb_alloc(sizeof(*hdr) + sizeof(*evt) + 1, GFP_KERNEL);
> > > > > +	if (!skb)
> > > > > +		return -ENOMEM;
> > > > > +
> > > > > +	hdr = skb_put(skb, sizeof(*hdr));
> > > > > +	hdr->evt = HCI_EV_CMD_COMPLETE;
> > > > > +	hdr->plen = sizeof(*evt) + 1;
> > > > > +
> > > > > +	evt = skb_put(skb, sizeof(*evt));
> > > > > +	evt->ncmd = 1;
> > > > > +	evt->opcode = HCI_OP_NOP;

After looking at it a bit more, I realize HCI_OP_NOP is not a good
value in this case:

static void hci_cmd_complete_evt(...)
{
  ...

  if (*opcode != HCI_OP_NOP)
    cancel_delayed_work(&hdev->cmd_timer);

  ...
}

https://elixir.bootlin.com/linux/v4.19/source/net/bluetooth/hci_event.c#L3351

Cancelling the command timeout is precisely what we want, but with
HCI_OP_NOP as the opcode the timer is never cancelled. I'm not sure
why the patch with HCI_OP_NOP makes the timeouts go away in most cases
(but not, e.g., when inserting an msleep(1000) after downloading the
NVM).

I suggest passing the opcode of the command to be completed instead.
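
To illustrate, here is a rough userspace sketch (not kernel code, and
not the actual patch) of the bytes such an injected event would carry
when the opcode is passed in. build_cmd_complete() and
cancels_cmd_timer() are hypothetical names. The constant values are
assumptions: 0x0e as the Command Complete event code, 0xfc00 as
EDL_PATCH_CMD_OPCODE (the opcode from the timeout message), and 0x00
as the success status.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values, mirrored here only for illustration. */
#define HCI_EV_CMD_COMPLETE   0x0e   /* HCI Command Complete event code */
#define HCI_OP_NOP            0x0000
#define EDL_PATCH_CMD_OPCODE  0xfc00 /* opcode seen in the timeout message */
#define QCA_HCI_CC_SUCCESS    0x00   /* assumed "success" status byte */

/* 2-byte event header + ncmd + 2-byte opcode + 1 status byte */
#define CC_EVT_LEN 6

/*
 * Hypothetical helper: serialize a Command Complete event for 'opcode'
 * into 'buf'. Returns the number of bytes written, or 0 if 'buf' is
 * too small.
 */
static size_t build_cmd_complete(uint8_t *buf, size_t len, uint16_t opcode)
{
	assert(buf != NULL);
	if (len < CC_EVT_LEN)
		return 0;

	buf[0] = HCI_EV_CMD_COMPLETE;      /* hdr->evt */
	buf[1] = CC_EVT_LEN - 2;           /* hdr->plen: bytes after header */
	buf[2] = 1;                        /* evt->ncmd */
	buf[3] = (uint8_t)(opcode & 0xff); /* evt->opcode, little endian */
	buf[4] = (uint8_t)(opcode >> 8);
	buf[5] = QCA_HCI_CC_SUCCESS;       /* status */

	return CC_EVT_LEN;
}

/* Models the opcode check in hci_cmd_complete_evt() quoted above. */
static int cancels_cmd_timer(uint16_t opcode)
{
	return opcode != HCI_OP_NOP;
}
```

With this model, an event built for EDL_PATCH_CMD_OPCODE would cancel
the command timer, while one built for HCI_OP_NOP would not.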

> > > > > +
> > > > > +	skb_put_u8(skb, QCA_HCI_CC_SUCCESS);
> > > > > +
> > > > > +	hci_skb_pkt_type(skb) = HCI_EVENT_PKT;
> > > > > +
> > > > > +	return hci_recv_frame(hdev, skb);
> > > > > +}
> > > > > +
> > > > > static int qca_download_firmware(struct hci_dev *hdev,
> > > > > 				  struct rome_config *config)
> > > > > {
> > > > > @@ -297,11 +323,22 @@ static int
> > > > > qca_download_firmware(struct hci_dev *hdev,
> > > > > 		ret = qca_tlv_send_segment(hdev, segsize, segment,
> > > > > 					    config->dnld_mode);
> > > > > 		if (ret)
> > > > > -			break;
> > > > > +			goto out;
> > > > > 		segment += segsize;
> > > > > 	}
> > > > > +	/* Latest qualcomm chipsets are not sending a command
> > > > > complete event
> > > > > +	 * for every fw packet sent. They only respond with a
> > > > > vendor specific
> > > > > +	 * event for the last packet. This optimization in the chip will
> > > > > +	 * decrease the BT in initialization time. Here we will
> > > > > inject a command
> > > > > +	 * complete event to avoid a command timeout error message.
> > > > > +	 */
> > > > > +	if ((config->dnld_type == ROME_SKIP_EVT_VSE_CC ||
> > > > > +	    config->dnld_type == ROME_SKIP_EVT_VSE))
> > > > > +		return qca_inject_cmd_complete_event(hdev);
> > > > > +
> > > > have you actually considered using __hci_cmd_send in that case. It is
> > > > allowed for vendor OGF to use that command. I see you actually do use
> > > > it and now I am failing to understand what this is for.
> > > [Bala]: thanks for reviewing the change.
> > > 
> > > > __hci_cmd_send() can be used only to send the command to the chip.
> > > > It will not wait for the response to the command sent.
> > > > 
> > > > As you know, the chip responds to every vendor command with a
> > > > vendor specific event and a command complete event.
> > > > But in our case the chip will respond with a vendor specific event
> > > > only, so we are injecting a command complete event.
> > 
> > and __hci_cmd_sync_ev is also not working for you? However since you
> > are not waiting for the vendor event anyway and just injecting
> > cmd_complete, I wonder what’s the difference in just using
> > __hci_cmd_send and not bothering to wait or inject at all. I am
> > failing to see where this injection makes a difference.
> > 
> > For me it is a big difference if we are injecting one event like in
> > the case of Intel compared to injecting one for every command. It will
> > show a wrong picture in btmon and that is a bad idea.
> > 
> > Regards
> > 
> > Marcel
> 
> [Bala]: Here is the use case: whenever we download the fw packets (i.e. the
> RAM image), for every command sent from the host (i.e. each fw packet) the
> chip will respond with a vendor specific event and a command complete
> event.
> 
> This takes more time to set up the BT device. So we came up with a
> solution where flags in the fw file (i.e. the RAM image header) indicate
> whether to wait for an event after every packet, or to send all packets and
> wait for the events for the last packet only.
> 
> Currently we handle both cases in the code, i.e. wait for an event
> for every packet or wait for an event for the last packet.
> 
> But in the second case, i.e. waiting for an event for the last packet sent,
> we only receive a vendor specific event from the chip, which holds the
> status of the fw download.
> 
> Since __hci_cmd_sync_ev() requires a command complete event, we are
> injecting it after the vendor specific event is received for the last packet.
> 
> This helps to overcome the 0xfc00 timeout error logging on the console.

Some more details:

The timeout error is actually from reading the 'SoC version', which
uses the same command code as the firmware download
(EDL_PATCH_CMD_OPCODE). Without reading the 'SoC version' it would be
from the command to write the first firmware segment.

If the download of a firmware binary takes >= 2s (HCI_CMD_TIMEOUT) the
timeout would still occur. If necessary this could be mitigated by
injecting some command complete events during the firmware download,
though I expect Marcel wouldn't be overly happy with that, since it
would affect btmon even more.

Regards

Matthias