[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALiNf29_PmLJTVLksSHp3NFAaL52idqehSMOtatJ=jaM2Muq1g@mail.gmail.com>
Date: Tue, 12 Jan 2021 15:48:16 +0800
From: Claire Chang <tientzu@...omium.org>
To: Florian Fainelli <f.fainelli@...il.com>
Cc: Rob Herring <robh+dt@...nel.org>, mpe@...erman.id.au,
benh@...nel.crashing.org, paulus@...ba.org,
"list@....net:IOMMU DRIVERS" <iommu@...ts.linux-foundation.org>,
Joerg Roedel <joro@...tes.org>, will@...nel.org,
Frank Rowand <frowand.list@...il.com>,
Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
boris.ostrovsky@...cle.com, jgross@...e.com,
sstabellini@...nel.org, Christoph Hellwig <hch@....de>,
Marek Szyprowski <m.szyprowski@...sung.com>,
Robin Murphy <robin.murphy@....com>, grant.likely@....com,
xypron.glpk@....de, Thierry Reding <treding@...dia.com>,
mingo@...nel.org, bauerman@...ux.ibm.com, peterz@...radead.org,
Greg KH <gregkh@...uxfoundation.org>,
Saravana Kannan <saravanak@...gle.com>,
rafael.j.wysocki@...el.com, heikki.krogerus@...ux.intel.com,
Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
rdunlap@...radead.org, dan.j.williams@...el.com,
Bartosz Golaszewski <bgolaszewski@...libre.com>,
linux-devicetree <devicetree@...r.kernel.org>,
lkml <linux-kernel@...r.kernel.org>,
linuxppc-dev@...ts.ozlabs.org, xen-devel@...ts.xenproject.org,
Tomasz Figa <tfiga@...omium.org>,
Nicolas Boichat <drinkcat@...omium.org>,
Jim Quinlan <james.quinlan@...adcom.com>
Subject: Re: [RFC PATCH v3 0/6] Restricted DMA
On Fri, Jan 8, 2021 at 1:59 AM Florian Fainelli <f.fainelli@...il.com> wrote:
>
> On 1/7/21 9:42 AM, Claire Chang wrote:
>
> >> Can you explain how ATF gets involved and to what extent it does help,
> >> besides enforcing a secure region from the ARM CPU's perpsective? Does
> >> the PCIe root complex not have an IOMMU but can somehow be denied access
> >> to a region that is marked NS=0 in the ARM CPU's MMU? If so, that is
> >> still some sort of basic protection that the HW enforces, right?
> >
> > We need the ATF support for memory MPU (memory protection unit).
> > Restricted DMA (with reserved-memory in dts) makes sure the predefined memory
> > region is for PCIe DMA only, but we still need MPU to locks down PCIe access to
> > that specific regions.
>
> OK so you do have a protection unit of some sort to enforce which region
> in DRAM the PCIE bridge is allowed to access, that makes sense,
> otherwise the restricted DMA region would only be a hint but nothing you
> can really enforce. This is almost entirely analogous to our systems then.
Here is the example of setting the MPU:
https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132
>
> There may be some value in standardizing on an ARM SMCCC call then since
> you already support two different SoC vendors.
>
> >
> >>
> >> On Broadcom STB SoCs we have had something similar for a while however
> >> and while we don't have an IOMMU for the PCIe bridge, we do have a a
> >> basic protection mechanism whereby we can configure a region in DRAM to
> >> be PCIe read/write and CPU read/write which then gets used as the PCIe
> >> inbound region for the PCIe EP. By default the PCIe bridge is not
> >> allowed access to DRAM so we must call into a security agent to allow
> >> the PCIe bridge to access the designated DRAM region.
> >>
> >> We have done this using a private CMA area region assigned via Device
> >> Tree, assigned with a and requiring the PCIe EP driver to use
> >> dma_alloc_from_contiguous() in order to allocate from this device
> >> private CMA area. The only drawback with that approach is that it
> >> requires knowing how much memory you need up front for buffers and DMA
> >> descriptors that the PCIe EP will need to process. The problem is that
> >> it requires driver modifications and that does not scale over the number
> >> of PCIe EP drivers, some we absolutely do not control, but there is no
> >> need to bounce buffer. Your approach scales better across PCIe EP
> >> drivers however it does require bounce buffering which could be a
> >> performance hit.
> >
> > Only the streaming DMA (map/unmap) needs bounce buffering.
>
> True, and typically only on transmit since you don't really control
> where the sk_buff are allocated from, right? On RX since you need to
> hand buffer addresses to the WLAN chip prior to DMA, you can allocate
> them from a pool that already falls within the restricted DMA region, right?
>
Right, but applying bounce buffering to RX will make it more secure.
The device won't be able to modify the content after unmap. Just like what
iommu_unmap does.
> > I also added alloc/free support in this series
> > (https://lore.kernel.org/patchwork/patch/1360995/), so dma_direct_alloc() will
> > try to allocate memory from the predefined memory region.
> >
> > As for the performance hit, it should be similar to the default swiotlb.
> > Here are my experiment results. Both SoCs lack IOMMU for PCIe.
> >
> > PCIe wifi vht80 throughput -
> >
> > MTK SoC tcp_tx tcp_rx udp_tx udp_rx
> > w/o Restricted DMA 244.1 134.66 312.56 350.79
> > w/ Restricted DMA 246.95 136.59 363.21 351.99
> >
> > Rockchip SoC tcp_tx tcp_rx udp_tx udp_rx
> > w/o Restricted DMA 237.87 133.86 288.28 361.88
> > w/ Restricted DMA 256.01 130.95 292.28 353.19
>
> How come you get better throughput with restricted DMA? Is it because
> doing DMA to/from a contiguous region allows for better grouping of
> transactions from the DRAM controller's perspective somehow?
I'm not sure, but actually, enabling the default swiotlb for wifi also helps the
throughput a little bit for me.
>
> >
> > The CPU usage doesn't increase too much either.
> > Although I didn't measure the CPU usage very precisely, it's ~3% with a single
> > big core (Cortex-A72) and ~5% with a single small core (Cortex-A53).
> >
> > Thanks!
> >
> >>
> >> Thanks!
> >> --
> >> Florian
>
>
> --
> Florian
Powered by blists - more mailing lists