lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20090526225146.2faeeb05.akpm@linux-foundation.org>
Date:	Tue, 26 May 2009 22:51:46 -0700
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	Fenghua Yu <fenghua.yu@...el.com>
Cc:	"'David Woodhouse'" <dwmw2@...radead.org>,
	"'Ingo Molnar'" <mingo@...e.hu>,
	"'LKML'" <linux-kernel@...r.kernel.org>,
	"'IOMMU'" <iommu@...ts.linux-foundation.org>
Subject: Re: [PATCH] Time out for possible dead loops during queued
 invalidation wait

On Wed, 20 May 2009 10:42:59 -0700 Fenghua Yu <fenghua.yu@...el.com> wrote:
>
> Subject: [PATCH] Time out for possible dead loops during queued invalidation wait

nits:

Please ensure that each patch title identifies the subsystem which is
being altered.  Because someone who reads this title has no clue what
part of the kernel is affected unless they dive in and read the actual
patch.  Suitable title prefixes for this one would be "dmar: " or
"drivers/pci/dmar.c: ".

The usual term for a timeout is "timeout", not "time out".

The term "dead loop" is unclear.  The reader might think that it refers
to a never-executed loop, as in "dead code".  A better term is
"infinite loop".

So I ended up with the title

	"drivers/pci/dmar.c: timeout for possible infinite loops during queued invalidation wait"

Welcome to my life :(

> Two loops in qi_submit_sync() do not have time out. If hardware can not finish
> the queued invalidation for some reason, the loops could end up in dead loops
> without any hint for what is going on. I add time out for the loops and report
> warning when time out happens.
>  
> Signed-off-by: Fenghua Yu <fenghua.yu@...el.com>

ok...

>  dmar.c |   12 ++++++++++++
>  1 files changed, 12 insertions(+)
> 
> diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c
> index fa3a113..95baacd 100644
> --- a/drivers/pci/dmar.c
> +++ b/drivers/pci/dmar.c
> @@ -637,6 +637,7 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
>  	struct qi_desc *hw, wait_desc;
>  	int wait_index, index;
>  	unsigned long flags;
> +	cycles_t start_time;

It seems strange to me that the driver chose to implement a ten second
timeout using such a high resolution thing as cycles_t.  Why not use
plain old jiffies?  The advantages are:

- jiffies can be read very efficiently

- there's more kernel support for manipulating jiffies-based values.

For example,

>  	if (!qi)
>  		return 0;
> @@ -644,8 +645,13 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
>  	hw = qi->desc;
>  
>  	spin_lock_irqsave(&qi->q_lock, flags);
> +	start_time = get_cycles();
>  	while (qi->free_cnt < 3) {
>  		spin_unlock_irqrestore(&qi->q_lock, flags);
> +		if (DMAR_OPERATION_TIMEOUT < (get_cycles() - start_time)) {

if we were using jiffies, this nasty thing could just use time_after().

But that's all outside the scope of your patch.


I'd find it more readable if the above were coded as

	if (get_cycles() - start_time >= DMAR_OPERATION_TIMEOUT)

but maybe that's just me.

> +			WARN(1, "No space in invalidation queue.\n");
> +			return -ENOSPC;

ENOSPC means "your disk filled up".  I think it makes no sense to use
that error code in this context, even though it kinda sounds the same.

> +		}
>  		cpu_relax();
>  		spin_lock_irqsave(&qi->q_lock, flags);
>  	}
> @@ -675,6 +681,7 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
>  	 */
>  	writel(qi->free_head << 4, iommu->reg + DMAR_IQT_REG);
>  
> +	start_time = get_cycles();
>  	while (qi->desc_status[wait_index] != QI_DONE) {
>  		/*
>  		 * We will leave the interrupts disabled, to prevent interrupt
> @@ -687,6 +694,11 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
>  		if (rc)
>  			goto out;
>  
> +		if (DMAR_OPERATION_TIMEOUT < (get_cycles() - start_time)) {
> +			WARN(1, "Queued invalidation can not complete.\n");
> +			goto out;

As `rc' now contains zero, this will cause the function to incorrectly
return the "success" code, even though it is known that it did not
succeed.

> +		}
> +
>  		spin_unlock(&qi->q_lock);
>  		cpu_relax();
>  		spin_lock(&qi->q_lock);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ