lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <53897A95.6030904@ahsoftware.de>
Date:	Sat, 31 May 2014 08:45:41 +0200
From:	Alexander Holler <holler@...oftware.de>
To:	Marcel Holtmann <marcel@...tmann.org>
CC:	LKML <linux-kernel@...r.kernel.org>,
	linux-bluetooth <linux-bluetooth@...r.kernel.org>,
	"Gustavo F. Padovan" <gustavo@...ovan.org>,
	Johan Hedberg <johan.hedberg@...il.com>
Subject: Re: [PATCH 1/2] bluetooth: don't include local processing of HCI
 commands in the command timeout

Am 31.05.2014 07:28, schrieb Marcel Holtmann:
> Hi Alexander,
> 
>> I assume the timeout for processing HCI commands was originally intended to
>> detect hung bluetooth devices and should not include the time needed locally
>> to handle the response to an HCI command. That is important because the time
>> needed locally (by the kernel or even userland) to process responses to HCI
>> commands varies a lot between systems and HCI commands. That's even more true
>> since many actions to HCI command responses are handled inside works which
>> might be delayed quiet some time, depending on the actual system load.
>>
>> So stop the timeout as soon as a response to an HCI command was received.
>>
>> This fixes various problems which resulted in HCI command timeouts and an
>> afterwards non-working bluetooth stack, especially on slower systems like
>> some ARM devices.
>>
>> Drawback is that in-kernel problems like deadlocks aren't detected by HCI
>> command timeouts anymore, but such problems should be detected and handled
>> by other means and not by a timeout where it is hard to specify a value
>> reasonable for all possible systems (-configurations, -loads).
>>
>> Furthermore, if the timeout includes local processing of HCI command
>> responses, in-kernel errors like hung tasks might be masked by the
>> timeout, because the hung task would be killed by the timeout before
>> the hung task would be detected (by other means).
>>
>> Signed-off-by: Alexander Holler <holler@...oftware.de>
>> ---
>> net/bluetooth/hci_event.c | 12 ++++++------
>> 1 file changed, 6 insertions(+), 6 deletions(-)
>>
>> diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
>> index 15010a2..94c2dc0 100644
>> --- a/net/bluetooth/hci_event.c
>> +++ b/net/bluetooth/hci_event.c
>> @@ -2338,6 +2338,9 @@ static void hci_cmd_complete_evt(struct hci_dev *hdev, struct sk_buff *skb)
>>
>> 	opcode = __le16_to_cpu(ev->opcode);
>>
>> +	if (opcode != HCI_OP_NOP)
>> +		del_timer(&hdev->cmd_timer);
>> +
> 
> so I actually wonder if we should move away from timer and move to a delayed work item to handle the timeout and if that would actually fix this issue.

The problem is that I have absolutely no clue where these timeouts do
come from. They appear for different commands and almost always only at
boot. If the machine did come up without hci command timeouts, there
never was one afterwards. So I digged in the dark and the above patch
was one of the results. But I still had to increase the command timeout.
And I can't do much testing as I use this box. I just experience this
problem almost always when I boot it (to do kernel updates) and since a
very long time (more than a year I think).

Regards,

Alexander Holler
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ