linux-kernel - Re: [patch 1/2] Intel IOMMU Suspend/Resume Support for DMAR

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090325201228.GA26204@elte.hu>
Date:	Wed, 25 Mar 2009 21:12:28 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	fenghua.yu@...el.com
Cc:	dwmw2@...radead.org, amluto@...il.com, kyle@...hat.com,
	mgross@...ux.intel.com, linux-kernel@...r.kernel.org,
	iommu@...ts.linux-foundation.org
Subject: Re: [patch 1/2] Intel IOMMU Suspend/Resume Support for DMAR


* fenghua.yu@...el.com <fenghua.yu@...el.com> wrote:

> The attached patch implements the suspend and resume feature for 
> DMAR. It hooks to kernel suspend and resume interface.  When 
> suspend happens, it saves necessary hardware registers. When 
> resume happens it restores the registers and restarts DMAR.

The structure looks pretty clean. I suspect David will go over the 
lowlevel details - i only have a few nitpicks:

>  #include <linux/intel-iommu.h>
> +#include <linux/sysdev.h>
> +#include <asm/i8259.h>

why is that needed?

>  #include <asm/cacheflush.h>
>  #include <asm/iommu.h>
>  #include "pci.h"
> @@ -2563,6 +2565,161 @@ static void __init init_no_remapping_devices(void)
>  	}
>  }
>  
> +#ifdef CONFIG_PM
> +static int init_iommu_hw(void)
> +{
> +	struct dmar_drhd_unit *drhd;
> +	struct intel_iommu *iommu;
> +
> +	for_each_drhd_unit(drhd) {
> +		if (drhd->ignored)
> +			continue;
> +
> +		iommu = drhd->iommu;
> +
> +		if (iommu->qi)
> +			dmar_reenable_qi(iommu);
> +	}
> +
> +	for_each_drhd_unit(drhd) {
> +		if (drhd->ignored)
> +			continue;

i suggested the following cleanliness detail on the previous 
submission a month or two ago, and it's still valid: why isnt the 
drhd->ignored skipping integrated into for_each_drhd_unit() ?

Perhaps named: for_each_active_drhd_unit().

Also, in practice, we iterate over active iommus, so this whole 
sequence:

> +	for_each_drhd_unit(drhd) {
> +		if (drhd->ignored)
> +			continue;
> +
> +		iommu = drhd->iommu;

Could perhaps be replaced with for_each_active_iommu(iommu).

> +
> +		iommu_flush_write_buffer(iommu);
> +
> +		iommu_set_root_entry(iommu);
> +
> +		iommu->flush.flush_context(iommu, 0, 0, 0, DMA_CCMD_GLOBAL_INVL,
> +					   0);
> +		iommu->flush.flush_iotlb(iommu, 0, 0, 0, DMA_TLB_GLOBAL_FLUSH,
> +					 0);

just put those '0);' lines to the end of the call and ignore 
checkpatch.

> +		iommu_disable_protect_mem_regions(iommu);
> +		iommu_enable_translation(iommu);
> +	}
> +
> +	return 0;
> +}
> +
> +static void iommu_flush_all(void)
> +{
> +	struct dmar_drhd_unit *drhd;
> +	struct intel_iommu *iommu;
> +
> +	for_each_drhd_unit(drhd) {
> +		if (drhd->ignored)
> +			continue;
> +
> +		iommu = drhd->iommu;
> +		iommu->flush.flush_context(iommu, 0, 0, 0, DMA_CCMD_GLOBAL_INVL,
> +					   0);
> +		iommu->flush.flush_iotlb(iommu, 0, 0, 0, DMA_TLB_GLOBAL_FLUSH,
> +					 0);

ditto.

> +	}
> +}
> +
> +static u32 iommu_state[MAX_IOMMUS][MAX_IOMMU_REGS];

hm, this goes up to MAX_IOMMUS which you define to 32.

But dmar_drhd_unit registration in dmar_register_drhd_unit() is 
open-ended and has no such limit.

While i'm sure there's a limit in the spec, there's the chance for 
there to be more than 32 units enumerated (in future hardware) - and 
this will break silently.

It would be far cleaner to embedd the IOMMU state in the IOMMU 
driver data structure directly. That gets rid of the limit and it 
also makes this array allocated dynamically.

> +
> +static int iommu_suspend(struct sys_device *dev, pm_message_t state)
> +{
> +	struct dmar_drhd_unit *drhd;
> +	struct intel_iommu *iommu;
> +	unsigned long flag;
> +	int    i = 0;

(the tab before the 'i' looks a bit weird.)

> +
> +	iommu_flush_all();
> +
> +	for_each_drhd_unit(drhd) {
> +		if (drhd->ignored)
> +			continue;
> +
> +		iommu = drhd->iommu;
> +
> +		iommu_disable_translation(iommu);
> +
> +		spin_lock_irqsave(&iommu->register_lock, flag);
> +
> +		iommu_state[i][DMAR_FECTL_REG] =
> +			(u32) readl(iommu->reg + DMAR_FECTL_REG);
> +		iommu_state[i][DMAR_FEDATA_REG] =
> +			(u32) readl(iommu->reg + DMAR_FEDATA_REG);
> +		iommu_state[i][DMAR_FEADDR_REG] =
> +			(u32) readl(iommu->reg + DMAR_FEADDR_REG);
> +		iommu_state[i][DMAR_FEUADDR_REG] =
> +			(u32) readl(iommu->reg + DMAR_FEUADDR_REG);

no need for the cast - readl returns u32.

> +
> +		spin_unlock_irqrestore(&iommu->register_lock, flag);
> +		i++;
> +	}
> +	return 0;
> +}
> +
> +static int iommu_resume(struct sys_device *dev)
> +{
> +	struct dmar_drhd_unit *drhd;
> +	struct intel_iommu *iommu;
> +	unsigned long flag;
> +	int i = 0;
> +
> +	if (init_iommu_hw())
> +		panic("IOMMU setup failed, DMAR can not start!\n");

Please dont panic boxes ... insert a WARN() and return.

> +
> +	for_each_drhd_unit(drhd) {
> +		if (drhd->ignored)
> +			continue;
> +
> +		iommu = drhd->iommu;
> +
> +		spin_lock_irqsave(&iommu->register_lock, flag);
> +
> +		writel((u32) iommu_state[i][DMAR_FECTL_REG],
> +			iommu->reg + DMAR_FECTL_REG);
> +		writel((u32) iommu_state[i][DMAR_FEDATA_REG],
> +			iommu->reg + DMAR_FEDATA_REG);
> +		writel((u32) iommu_state[i][DMAR_FEADDR_REG],
> +			iommu->reg + DMAR_FEADDR_REG);
> +		writel((u32) iommu_state[i][DMAR_FEUADDR_REG],
> +			iommu->reg + DMAR_FEUADDR_REG);

there's no need for the (u32) case. iommu_state[] has u32 types, and 
writel takes u32.

> +
> +		spin_unlock_irqrestore(&iommu->register_lock, flag);
> +		i++;
> +	}
> +	return 0;
> +}
> +
> +static struct sysdev_class iommu_sysclass = {
> +	.name		= "iommu",
> +	.resume		= iommu_resume,
> +	.suspend	= iommu_suspend,
> +};
> +
> +static struct sys_device device_iommu = {
> +	.cls	= &iommu_sysclass,
> +};
> +
> +static int __init init_iommu_sysfs(void)
> +{
> +	int error;
> +
> +	error = sysdev_class_register(&iommu_sysclass);
> +	if (error)
> +		return error;
> +
> +	error = sysdev_register(&device_iommu);
> +	if (error)
> +		sysdev_class_unregister(&iommu_sysclass);
> +
> +	return error;
> +}
> +
> +#else
> +static init __init init_iommu_sysfs(void)
> +{
> +	return 0;
> +}
> +#endif	/* CONFIG_PM */
> +
>  int __init intel_iommu_init(void)
>  {
>  	int ret = 0;
> @@ -2598,6 +2755,7 @@ int __init intel_iommu_init(void)
>  	init_timer(&unmap_timer);
>  	force_iommu = 1;
>  	dma_ops = &intel_dma_ops;
> +	init_iommu_sysfs();
>  
>  	register_iommu(&intel_iommu_ops);
>  
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 1d6c71d..5ec836b 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -205,6 +205,9 @@ static inline void dmar_writeq(void __iomem *addr, u64 val)
>  /* low 64 bit */
>  #define dma_frcd_page_addr(d) (d & (((u64)-1) << PAGE_SHIFT))
>  
> +#define MAX_IOMMUS      32
> +#define MAX_IOMMU_REGS  0xc0

MAX_IOMMU_REGS ... would be nice to add some reference here where 
that limit comes from. Specification / documentation coordinates.

That will give those people who ever extend the hw side (and it will 
happen) a chance to notice this limit in the Linux kernel.

Also, is there a chance that MAX_IOMMU_REGS can be probed from the 
hardware directly? If that's possible and reliable then it would be 
nicer to make this limit dynamic. The registers are saved/restored 
straight and in a linear fashion - no reason to not make that 
extensible if we have a chance for it.

Thanks,

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/