linux-kernel - RE: [PATCH] Time out for possible dead loops during queued invalidation wait

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <A6AD88C3F2289247BE726C37303E1EB8AF9331CE@orsmsx505.amr.corp.intel.com>
Date:	Wed, 27 May 2009 15:40:35 -0700
From:	"Yu, Fenghua" <fenghua.yu@...el.com>
To:	'Andrew Morton' <akpm@...ux-foundation.org>
CC:	'David Woodhouse' <dwmw2@...radead.org>,
	'Ingo Molnar' <mingo@...e.hu>,
	'LKML' <linux-kernel@...r.kernel.org>,
	'IOMMU' <iommu@...ts.linux-foundation.org>
Subject: RE: [PATCH] Time out for possible dead loops during queued
 invalidation wait

>>
>> Subject: [PATCH] Time out for possible dead loops during queued
>invalidation wait
>
>nits:
>
>Please ensure that each patch title identifies the subsystem which is
>being altered.  Because someone who reads this title has no clue what
>part of the kernel is affected unless they dive in and read the actual
>patch.  Suitable title prefixes for this one would be "dmar: " or
>"drivers/pci/dmar.c: ".
>
>The usual term for a timeout is "timeout", not "time out".
>
>The term "dead loop" is unclear.  The reader might think that it refers
>to a never-executed loop, as in "dead code".  A better term is
>"infinite loop".
>
>So I ended up with the title
>
>	"drivers/pci/dmar.c: timeout for possible infinite loops during
>queued invalidation wait"
>
>Welcome to my life :(
>
>> Two loops in qi_submit_sync() do not have time out. If hardware can not
>finish
>> the queued invalidation for some reason, the loops could end up in dead
>loops
>> without any hint for what is going on. I add time out for the loops and
>report
>> warning when time out happens.
>>
>> Signed-off-by: Fenghua Yu <fenghua.yu@...el.com>
>
>ok...
>
>>  dmar.c |   12 ++++++++++++
>>  1 files changed, 12 insertions(+)
>>
>> diff --git a/drivers/pci/dmar.c b/drivers/pci/dmar.c
>> index fa3a113..95baacd 100644
>> --- a/drivers/pci/dmar.c
>> +++ b/drivers/pci/dmar.c
>> @@ -637,6 +637,7 @@ int qi_submit_sync(struct qi_desc *desc, struct
>intel_iommu *iommu)
>>  	struct qi_desc *hw, wait_desc;
>>  	int wait_index, index;
>>  	unsigned long flags;
>> +	cycles_t start_time;

>It seems strange to me that the driver chose to implement a ten second
>timeout using such a high resolution thing as cycles_t.  Why not use
>plain old jiffies?  The advantages are:
>
>- jiffies can be read very efficiently
>
>- there's more kernel support for manipulating jiffies-based values.
>
>For example,

I'll use time_after() and jiffies in the updated patch. The get_cycles() approach is being used in other places in iommu code. I'll change them too.

>
>>  	if (!qi)
>>  		return 0;
>> @@ -644,8 +645,13 @@ int qi_submit_sync(struct qi_desc *desc, struct
>intel_iommu *iommu)
>>  	hw = qi->desc;
>>
>>  	spin_lock_irqsave(&qi->q_lock, flags);
>> +	start_time = get_cycles();
>>  	while (qi->free_cnt < 3) {
>>  		spin_unlock_irqrestore(&qi->q_lock, flags);
>> +		if (DMAR_OPERATION_TIMEOUT < (get_cycles() - start_time)) {
>
>if we were using jiffies, this nasty thing could just use time_after().
>
>But that's all outside the scope of your patch.
>
>
>I'd find it more readable if the above were coded as
>
>	if (get_cycles() - start_time >= DMAR_OPERATION_TIMEOUT)
>
>but maybe that's just me.
>
>> +			WARN(1, "No space in invalidation queue.\n");
>> +			return -ENOSPC;
>
>ENOSPC means "your disk filled up".  I think it makes no sense to use
>that error code in this context, even though it kinda sounds the same.
>

Which error code is better? Is EAGAIN ok?

>> +		}
>>  		cpu_relax();
>>  		spin_lock_irqsave(&qi->q_lock, flags);
>>  	}
>> @@ -675,6 +681,7 @@ int qi_submit_sync(struct qi_desc *desc, struct
>intel_iommu *iommu)
>>  	 */
>>  	writel(qi->free_head << 4, iommu->reg + DMAR_IQT_REG);
>>
>> +	start_time = get_cycles();
>>  	while (qi->desc_status[wait_index] != QI_DONE) {
>>  		/*
>>  		 * We will leave the interrupts disabled, to prevent interrupt
>> @@ -687,6 +694,11 @@ int qi_submit_sync(struct qi_desc *desc, struct
>intel_iommu *iommu)
>>  		if (rc)
>>  			goto out;
>>
>> +		if (DMAR_OPERATION_TIMEOUT < (get_cycles() - start_time)) {
>> +			WARN(1, "Queued invalidation can not complete.\n");
>> +			goto out;
>
>As `rc' now contains zero, this will cause the function to incorrectly
>return the "success" code, even though it is known that it did not
>succeed.
>
>> +		}
>> +
>>  		spin_unlock(&qi->q_lock);
>>  		cpu_relax();
>>  		spin_lock(&qi->q_lock);

I'll change this code.

Thanks.

-Fenghua
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/