Message-ID: <20081001071956.GA27826@amd.com>
Date: Wed, 1 Oct 2008 09:19:56 +0200
From: Joerg Roedel <joerg.roedel@....com>
To: Muli Ben-Yehuda <muli@...ibm.com>
CC: FUJITA Tomonori <fujita.tomonori@....ntt.co.jp>, joro@...tes.org,
amit.shah@...hat.com, linux-kernel@...r.kernel.org,
kvm@...r.kernel.org, iommu@...ts.linux-foundation.org,
dwmw2@...radead.org, mingo@...hat.com
Subject: Re: [PATCH 9/9] x86/iommu: use dma_ops_list in get_dma_ops
On Tue, Sep 30, 2008 at 10:44:01PM +0300, Muli Ben-Yehuda wrote:
> On Mon, Sep 29, 2008 at 03:33:11PM +0200, Joerg Roedel wrote:
>
> > > Nobody cares about the performance of dma_alloc_coherent. Only the
> > > performance of map_single/map_sg matters.
> > >
> > > I'm not sure how expensive the hypercalls are, but are they more
> > > expensive than bounce buffering, which copies lots of data for
> > > every I/O?
> >
> > I don't think that we can avoid bounce buffering into the guests at
> > all (with and without my idea of a paravirtualized IOMMU) when we
> > want to handle dma_masks and requests that cross guest physical
> > pages properly.
>
> It might be possible to have a per-device slow or fast path, where the
> fast path is for devices which have no DMA limitations (high-end
> devices generally don't) and the slow path is for devices which do.
This solves the problem with the DMA masks. But what happens to requests
that cross guest page boundaries?
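
To make that concrete, here is a minimal sketch of such a per-device
fast/slow decision. All names and the userspace framing are invented for
illustration; this is not code from the patch set:

/* Hypothetical sketch: deciding per device whether the fast path
 * (direct mapping) or the slow path (bounce buffer / hypercall) is
 * needed.  demo_device and can_use_fast_path() are made-up names. */
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

#define GUEST_PAGE_SIZE   4096ULL
#define DMA_BIT_MASK(n)   (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

struct demo_device {
	uint64_t dma_mask;	/* highest bus address the device can reach */
};

/* The fast path is only safe if the device can reach the buffer and
 * the buffer does not cross a guest page boundary (guest-contiguous
 * pages are not necessarily host-contiguous). */
static bool can_use_fast_path(const struct demo_device *dev,
			      uint64_t guest_phys, size_t len)
{
	uint64_t end = guest_phys + len - 1;

	if (end > dev->dma_mask)
		return false;	/* needs bouncing below the mask */
	if ((guest_phys / GUEST_PAGE_SIZE) != (end / GUEST_PAGE_SIZE))
		return false;	/* crosses a guest page boundary */
	return true;
}

int main(void)
{
	struct demo_device dev = { .dma_mask = DMA_BIT_MASK(32) };

	/* One page-local request below 4GB takes the fast path; a
	 * request crossing a guest page does not. */
	return (can_use_fast_path(&dev, 0x1000, 512) &&
		!can_use_fast_path(&dev, 0x1f00, 512)) ? 0 : 1;
}

The point is only that the slow path (bounce buffering or a hypercall)
is needed whenever the device cannot reach the buffer or the request
spans more than one guest page.
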
> > With mapping/unmapping through hypercalls we add the world-switch
> > overhead to the copy-overhead. We can't avoid this when we have no
> > hardware support at all. But already with older IOMMUs like Calgary
> > and GART we can at least avoid the world-switch. And since, for
> > example, every 64 bit capable AMD processor has a GART we can make
> > use of it.
>
> It should be possible to reduce the number and overhead of hypercalls
> to the point where their cost is immaterial. I think that's
> fundamentally a better approach.
Ok, we can queue map_sg mappings together and batch them into a single
hypercall. But I remember a paper from you in which you wrote that most
mappings cover only one area. Are there other ways to optimize this? I
must say that reducing the number of hypercalls was an important
consideration behind my idea. If there are better ways, I am all ears.
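
For illustration, batching map_sg into one hypercall could look roughly
like the sketch below. The hypercall number, the request layout,
pv_iommu_hypercall() and map_sg_batched() are all invented for this
example and are not an existing interface; the stub only simulates the
world switch:

/* Hypothetical sketch of mapping a whole scatter/gather list with a
 * single hypercall instead of one hypercall per segment. */
#include <stdint.h>
#include <stdio.h>

#define PV_IOMMU_MAP_SG_BATCH	1	/* made-up hypercall number */
#define MAX_BATCH		64

struct pv_iommu_map_entry {
	uint64_t guest_phys;	/* guest-physical address of the segment */
	uint32_t len;		/* segment length in bytes */
	uint64_t dma_addr;	/* filled in by the host: bus address */
};

struct pv_iommu_map_req {
	uint32_t nr_entries;
	struct pv_iommu_map_entry entries[MAX_BATCH];
};

/* Stub standing in for the real world switch (a guest would issue
 * vmcall/vmmcall here); it just hands out sequential bus addresses. */
static int pv_iommu_hypercall(unsigned int nr, struct pv_iommu_map_req *req)
{
	static uint64_t next = 0x100000;
	uint32_t i;

	(void)nr;
	for (i = 0; i < req->nr_entries; i++) {
		req->entries[i].dma_addr = next;
		next += req->entries[i].len;
	}
	return 0;
}

/* Map all segments with one world switch instead of one per segment. */
static int map_sg_batched(struct pv_iommu_map_req *req,
			  const uint64_t *seg_phys, const uint32_t *seg_len,
			  unsigned int nents)
{
	unsigned int i;

	if (nents > MAX_BATCH)
		return -1;	/* a real implementation would split the list */

	for (i = 0; i < nents; i++) {
		req->entries[i].guest_phys = seg_phys[i];
		req->entries[i].len        = seg_len[i];
	}
	req->nr_entries = nents;

	return pv_iommu_hypercall(PV_IOMMU_MAP_SG_BATCH, req);
}

int main(void)
{
	uint64_t phys[2] = { 0x2000, 0x7000 };
	uint32_t len[2]  = { 4096, 512 };
	struct pv_iommu_map_req req;

	if (map_sg_batched(&req, phys, len, 2) == 0)
		printf("segment 0 mapped at bus address 0x%llx\n",
		       (unsigned long long)req.entries[0].dma_addr);
	return 0;
}

This only saves world switches when a list actually has several
segments, which is exactly why the single-mapping case you describe
still needs a different answer.
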
Joerg
--
| AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System | Register Court Dresden: HRA 4896
Research | General Partner authorized to represent:
Center | AMD Saxony LLC (Wilmington, Delaware, US)
| General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy