linux-kernel - Re: [PATCH 15/24] thunderbolt: Rework control channel to be more reliable

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170525132546.GI16244@kroah.com>
Date:   Thu, 25 May 2017 15:25:46 +0200
From:   Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To:     Mika Westerberg <mika.westerberg@...ux.intel.com>
Cc:     Andreas Noever <andreas.noever@...il.com>,
        Michael Jamet <michael.jamet@...el.com>,
        Yehezkel Bernat <yehezkel.bernat@...el.com>,
        Lukas Wunner <lukas@...ner.de>,
        Amir Levy <amir.jer.levy@...el.com>,
        Andy Lutomirski <luto@...nel.org>, Mario.Limonciello@...l.com,
        Jared.Dominguez@...l.com,
        Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 15/24] thunderbolt: Rework control channel to be more
 reliable

On Thu, May 18, 2017 at 05:39:05PM +0300, Mika Westerberg wrote:
> If a request times out the response might arrive right after the request
> is failed. This response is pushed to the kfifo and next request will
> read it instead. Since it most likely will not pass our validation
> checks in parse_header() the next request will fail as well, and
> response to that request will be pushed to the kfifo, ad infinitum.
> 
> We end up in a situation where all requests fail and no devices can be
> added anymore until the driver is unloaded and reloaded again.
> 
> To overcome this, rework the control channel so that we will have a
> queue of outstanding requests. Each request will be handled in turn and
> the response is validated against what is expected. Unexpected packets
> (for example responses for requests that have been timed out) are
> dropped. This model is copied from Greybus implementation with small
> changes here and there to get it cope with Thunderbolt control packets.
> 
> In addition the configuration packets support sequence number which the
> switch is supposed to copy from the request to response. We use this to
> drop responses that are already timed out. Taking advantage of the
> sequence number, we automatically retry configuration read/write 4 times
> before giving up.
> 
> Also timeout is not a programming error so there is no need to trigger a
> scary backtrace (WARN), instead we just log a warning.  After all
> Thunderbolt devices are hot-pluggable by definition which means user can
> unplug a device any time and that is totally acceptable.
> 
> With this change there is no need to take the global domain lock when
> sending configuration packets anymore. This is useful when we add
> support for cross-domain (XDomain) communication later on.
> 
> Signed-off-by: Mika Westerberg <mika.westerberg@...ux.intel.com>
> Reviewed-by: Yehezkel Bernat <yehezkel.bernat@...el.com>
> Reviewed-by: Michael Jamet <michael.jamet@...el.com>
> ---
>  drivers/thunderbolt/ctl.c | 471 +++++++++++++++++++++++++++++++++++++++-------
>  drivers/thunderbolt/ctl.h |  65 +++++++
>  drivers/thunderbolt/tb.h  |   2 +-
>  3 files changed, 467 insertions(+), 71 deletions(-)
> 
> diff --git a/drivers/thunderbolt/ctl.c b/drivers/thunderbolt/ctl.c
> index b3ee755fdb37..2b1255cbf3c9 100644
> --- a/drivers/thunderbolt/ctl.c
> +++ b/drivers/thunderbolt/ctl.c
> @@ -5,22 +5,17 @@
>   */
>  
>  #include <linux/crc32.h>
> +#include <linux/delay.h>
>  #include <linux/slab.h>
>  #include <linux/pci.h>
>  #include <linux/dmapool.h>
>  #include <linux/workqueue.h>
> -#include <linux/kfifo.h>
>  
>  #include "ctl.h"
>  
>  
> -struct ctl_pkg {
> -	struct tb_ctl *ctl;
> -	void *buffer;
> -	struct ring_frame frame;
> -};
> -
> -#define TB_CTL_RX_PKG_COUNT 10
> +#define TB_CTL_RX_PKG_COUNT	10
> +#define TB_CTL_RETRIES		4
>  
>  /**
>   * struct tb_cfg - thunderbolt control channel
> @@ -32,8 +27,9 @@ struct tb_ctl {
>  
>  	struct dma_pool *frame_pool;
>  	struct ctl_pkg *rx_packets[TB_CTL_RX_PKG_COUNT];
> -	DECLARE_KFIFO(response_fifo, struct ctl_pkg*, 16);
> -	struct completion response_ready;
> +	struct mutex request_lock;
> +	struct list_head request_queue;
> +	bool running;
>  
>  	event_cb callback;
>  	void *callback_data;
> @@ -55,10 +51,115 @@ struct tb_ctl {
>  #define tb_ctl_dbg(ctl, format, arg...) \
>  	dev_dbg(&(ctl)->nhi->pdev->dev, format, ## arg)
>  
> +static DECLARE_WAIT_QUEUE_HEAD(tb_cfg_request_cancel_queue);
> +
> +/**
> + * tb_cfg_request_alloc() - Allocates a new config request
> + *
> + * This is refcounted object so when you are done with this, call
> + * tb_cfg_request_put() to it.
> + */
> +struct tb_cfg_request *tb_cfg_request_alloc(void)
> +{
> +	struct tb_cfg_request *req;
> +
> +	req = kzalloc(sizeof(*req), GFP_KERNEL);
> +	if (!req)
> +		return NULL;
> +
> +	kref_init(&req->kref);
> +
> +	return req;
> +}
> +
> +/**
> + * tb_cfg_request_get() - Increase refcount of a request
> + * @req: Request whose refcount is increased
> + */
> +void tb_cfg_request_get(struct tb_cfg_request *req)
> +{
> +	kref_get(&req->kref);
> +}
> +
> +static void tb_cfg_request_destroy(struct kref *kref)
> +{
> +	struct tb_cfg_request *req = container_of(kref, typeof(*req), kref);
> +
> +	kfree(req);
> +}
> +
> +/**
> + * tb_cfg_request_put() - Decrease refcount and possibly release the request
> + * @req: Request whose refcount is decreased
> + *
> + * Call this function when you are done with the request. When refcount
> + * goes to %0 the object is released.
> + */
> +void tb_cfg_request_put(struct tb_cfg_request *req)
> +{
> +	kref_put(&req->kref, tb_cfg_request_destroy);
> +}

What prevents this call from being called twice on the same object from
different threads at the same time?  You still need a lock somewhere to
protect yourself from that, am I just missing where that lock is?

thanks,

greg k-h