Date:	Thu, 5 Jun 2008 11:34:56 -0700
From:	"Grant Grundler" <grundler@...gle.com>
To:	"FUJITA Tomonori" <fujita.tomonori@....ntt.co.jp>
Cc:	linux-kernel@...r.kernel.org, mgross@...ux.intel.com,
	linux-scsi@...r.kernel.org
Subject: Re: Intel IOMMU (and IOMMU for Virtualization) performances

On Thu, Jun 5, 2008 at 7:49 AM, FUJITA Tomonori
<fujita.tomonori@....ntt.co.jp> wrote:
...
>> You can easily emulate SSD drives by doing sequential 4K reads
>> from a normal SATA HD. That should result in ~7-8K IOPS since the disk
>> will recognize the sequential stream and read ahead. SAS/SCSI/FC will
>> probably work the same way with different IOP rates.
>
> Yeah, probably right. I thought that 10GbE gives the IOMMU more of a
> workload than SSD does and tried to emulate something like that.
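
A minimal userspace sketch of that sequential-4K-read load (my own sketch,
not from the thread or the patch; /dev/sdX is a placeholder and the IOPS
accounting is deliberately crude):

/* Sequential 4K O_DIRECT reads from a raw disk, counting completions
 * to approximate the ~7-8K IOPS figure mentioned above. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	const char *dev = argc > 1 ? argv[1] : "/dev/sdX";
	struct timespec start, now;
	long ios = 0;
	void *buf;
	int fd;

	fd = open(dev, O_RDONLY | O_DIRECT);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (posix_memalign(&buf, 4096, 4096))
		return 1;

	clock_gettime(CLOCK_MONOTONIC, &start);
	do {
		if (read(fd, buf, 4096) != 4096)
			break;		/* end of device or error */
		ios++;
		clock_gettime(CLOCK_MONOTONIC, &now);
	} while (now.tv_sec - start.tv_sec < 10);

	printf("%ld reads in ~10s => ~%ld IOPS\n", ios, ios / 10);
	close(fd);
	return 0;
}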

10GbE might exercise a different code path. NICs typically use map_single
and storage devices typically use map_sg.  But they both exercise the same
underlying resource management code since it's the same IOMMU they poke at.
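
For concreteness, a sketch of the two entry points (not code from either
patch, and using today's two-argument dma_mapping_error(); dev, buf, len,
sg and nents stand in for whatever the driver already has):

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

/* NIC-style: map one linear buffer (e.g. a packet) per operation. */
static int nic_style_map(struct device *dev, void *buf, size_t len,
			 dma_addr_t *handle)
{
	*handle = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
	return dma_mapping_error(dev, *handle) ? -ENOMEM : 0;
}

/* Storage-style: map a whole scatterlist per request. */
static int storage_style_map(struct device *dev, struct scatterlist *sg,
			     int nents)
{
	int count = dma_map_sg(dev, sg, nents, DMA_FROM_DEVICE);

	/*
	 * Either call ends up in the same IOMMU backend (intel_map_single()
	 * / intel_map_sg() for VT-d) and allocates from the same IOVA
	 * space, so both paths contend on the same resource-management
	 * and IOTLB-flush code.
	 */
	return count ? count : -ENOMEM;
}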

...
>> Sorry, I didn't see a replacement for the deferred_flush_tables.
>> Mark Gross and I agree this substantially helps with unmap performance.
>> See http://lkml.org/lkml/2008/3/3/373
>
> Yeah, I can add the nice trick that parisc sba_iommu uses. I'll try
> that next time.
>
> But it probably gives the bitmap method less of a gain than it gives
> the RB tree, since clearing the bitmap takes less time than changing
> the tree.
>
> The deferred_flush_tables also batches TLB flushing. The patch flushes
> the TLB only when it reaches the end of the bitmap (a trick that some
> IOMMUs, like SPARC's, already use).

The batching of the TLB flushes is the key thing. I was being paranoid
by not marking the resource free until after the TLB was flushed. If we
know the allocation is going to be circular through the bitmap, flushing
the TLB once per iteration through the bitmap should be sufficient since
we can guarantee the IO Pdir resource won't get re-used until a full
cycle through the bitmap has been completed.
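
In other words, something along these lines (a toy model with made-up
names; locking and multi-page allocations are omitted, and
flush_iotlb_all() is a hypothetical hook):

#include <linux/bitops.h>

struct iova_bitmap {
	unsigned long	*map;	/* one bit per IO page */
	unsigned long	size;	/* number of IO pages covered */
	unsigned long	next;	/* circular search hint */
};

static unsigned long alloc_iova_page(struct iova_bitmap *b)
{
	unsigned long bit;

	bit = find_next_zero_bit(b->map, b->size, b->next);
	if (bit >= b->size) {
		/*
		 * Wrapped around: entries freed since the last wrap may
		 * now be handed out again, so this is the one point where
		 * the IOTLB must be flushed before anything is re-used.
		 */
		flush_iotlb_all();	/* hypothetical hook */
		bit = find_next_zero_bit(b->map, b->size, 0);
		if (bit >= b->size)
			return -1UL;	/* space exhausted */
	}
	__set_bit(bit, b->map);
	b->next = bit + 1;
	return bit;
}

static void free_iova_page(struct iova_bitmap *b, unsigned long bit)
{
	/*
	 * Just clear the bit; deferring the IOTLB flush to the wrap-around
	 * in alloc_iova_page() is what makes unmap cheap.
	 */
	__clear_bit(bit, b->map);
}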

I expect this will work for parisc too and I can test that. Funny that it
didn't "click" with me when I originally wrote the parisc code. DaveM had
even told me the SPARC code was only flushing the IOTLB once per iteration.

...
> Agreed. VT-d can handle a DMA virtual address space larger than 32
> bits, but that means we need more memory for the bitmap. I think that
> the majority of systems don't need a DMA virtual address space larger
> than 32 bits. Making it a kernel parameter is a reasonable approach,
> I think.

Agreed. It needs a reasonable default and a way to change it at runtime
for odd cases.
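
For scale (my arithmetic, not from the patch): with 4K pages, a 32-bit
IOVA space takes 2^32 / 2^12 = 2^20 bits, i.e. 128KB of bitmap per DMA
address space, while a 40-bit space would already take 32MB; hence the
appeal of a small default plus a parameter for the odd cases.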

...
>> "32-PAGE_SHIFT_4K" expression is used in several places but I didn't see
>> an explanation of why 32. Can you add one someplace?
>
> OK, I'll do that next time. Most of them are about the 4GB virtual
> address space that the patch uses.

Thanks! The comment should then explain why 4GB is "reasonable" (vs.
1GB, for example).
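
(If I'm reading it right and PAGE_SHIFT_4K is 12, then 32 - PAGE_SHIFT_4K
= 20, so the expression describes 2^20 4K pages, i.e. the 4GB of DMA
virtual address space mentioned above.)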

...
> Thanks a lot! I didn't expect this patch to be reviewed. I really
> appreciate it.

very welcome,
grant
