lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1432264468.5092.92.camel@mhfsdcap03>
Date:	Fri, 22 May 2015 11:14:28 +0800
From:	Yong Wu <yong.wu@...iatek.com>
To:	Robin Murphy <robin.murphy@....com>
CC:	Matthias Brugger <matthias.bgg@...il.com>,
	Will Deacon <Will.Deacon@....com>,
	Daniel Kurtz <djkurtz@...gle.com>,
	Tomasz Figa <tfiga@...gle.com>,
	Lucas Stach <l.stach@...gutronix.de>,
	Mark Rutland <Mark.Rutland@....com>,
	Catalin Marinas <Catalin.Marinas@....com>,
	"linux-mediatek@...ts.infradead.org" 
	<linux-mediatek@...ts.infradead.org>,
	Sasha Hauer <kernel@...gutronix.de>,
	"srv_heupstream@...iatek.com" <srv_heupstream@...iatek.com>,
	"devicetree@...r.kernel.org" <devicetree@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-arm-kernel@...ts.infradead.org" 
	<linux-arm-kernel@...ts.infradead.org>,
	"iommu@...ts.linux-foundation.org" <iommu@...ts.linux-foundation.org>,
	"pebolle@...cali.nl" <pebolle@...cali.nl>,
	"arnd@...db.de" <arnd@...db.de>,
	"mitchelh@...eaurora.org" <mitchelh@...eaurora.org>,
	"k.zhang@...iatek.com" <k.zhang@...iatek.com>,
	"youhua.li@...iatek.com" <youhua.li@...iatek.com>,
	Joerg Roedel <joro@...tes.org>,
	Rob Herring <robh+dt@...nel.org>
Subject: Re: [PATCH v2 3/6] iommu: add ARM short descriptor page table
 allocator.

Hi Robin,
    Thanks very much for so detail suggestion.
    please also help check my comment, the others i will change in next
time.
On Fri, 2015-05-15 at 16:30 +0100, Robin Murphy wrote:
> Oops, seems I'm rather behind on things - I started this review on the 
> RFC, but I'll finish it here...
> 
> On 15/05/15 10:43, Yong Wu wrote:
> > This patch is for ARM Short Descriptor Format.It has 2-levels
> > pagetable and the allocator supports 4K/64K/1M/16M.
> >
> 
>  From the look of the code, this doesn't fully support partial unmaps 
> (i.e. splitting block entries), am I right? That's OK for DMA-API use, 
> since that doesn't permit partial unmaps anyway, but I'd say it's worth 
> making it clear that that's still a TODO in order for short-descriptor 
> mappings to fully support arbitrary raw IOMMU API usage.
Yes. I don't add split right now due to I check that
iommu_map/iommu_unmap make sure iova|pa be aligned.

I will add split for fully support in next version. Thanks.
> 
> > Signed-off-by: Yong Wu <yong.wu@...iatek.com>
> > ---
> >   drivers/iommu/Kconfig                |   7 +
> >   drivers/iommu/Makefile               |   1 +
> >   drivers/iommu/io-pgtable-arm-short.c | 490 +++++++++++++++++++++++++++++++++++
> >   drivers/iommu/io-pgtable.c           |   4 +
> >   drivers/iommu/io-pgtable.h           |   6 +
> >   5 files changed, 508 insertions(+)
> >   create mode 100644 drivers/iommu/io-pgtable-arm-short.c
> >
> > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> > index 1ae4e54..3d2eac6 100644
> > --- a/drivers/iommu/Kconfig
> > +++ b/drivers/iommu/Kconfig
> > @@ -39,6 +39,13 @@ config IOMMU_IO_PGTABLE_LPAE_SELFTEST
> >
> >            If unsure, say N here.
> >
> > +config IOMMU_IO_PGTABLE_SHORT
> > +       bool "ARMv7/v8 Short Descriptor Format"
> > +       select IOMMU_IO_PGTABLE
> > +       help
> > +         Enable support for the ARM Short descriptor pagetable format.
> > +         It has 2-levels pagetable and The allocator supports 4K/64K/1M/16M.
> > +
> >   endmenu
> >
> >   config IOMMU_IOVA
> > diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> > index 080ffab..815b3c8 100644
> > --- a/drivers/iommu/Makefile
> > +++ b/drivers/iommu/Makefile
> > @@ -3,6 +3,7 @@ obj-$(CONFIG_IOMMU_API) += iommu-traces.o
> >   obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
> >   obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
> >   obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
> > +obj-$(CONFIG_IOMMU_IO_PGTABLE_SHORT) += io-pgtable-arm-short.o
> >   obj-$(CONFIG_IOMMU_IOVA) += iova.o
> >   obj-$(CONFIG_OF_IOMMU) += of_iommu.o
> >   obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o msm_iommu_dev.o
> > diff --git a/drivers/iommu/io-pgtable-arm-short.c b/drivers/iommu/io-pgtable-arm-short.c
> > new file mode 100644
> > index 0000000..cc286ce5
> > --- /dev/null
> > +++ b/drivers/iommu/io-pgtable-arm-short.c
> > @@ -0,0 +1,490 @@
> > +/*
> > + * Copyright (c) 2014-2015 MediaTek Inc.
> > + * Author: Yong Wu <yong.wu@...iatek.com>
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + */
> > +#define pr_fmt(fmt)    "arm-short-desc io-pgtable: "fmt
> > +
> > +#include <linux/err.h>
> > +#include <linux/mm.h>
> > +#include <linux/iommu.h>
> > +#include <linux/errno.h>
> 
> Alphabetically-sorted includes, please. Also, this list doesn't look 
> particularly correct - e.g. I don't think you're actually using anything 
> from mm.h, but you are relying on stuff from kernel.h, slab.h, gfp.h, 
> etc. being pulled in indirectly.
> 
> > +
> > +#include "io-pgtable.h"
> > +
> > +typedef u32 arm_short_iopte;
> > +
> > +struct arm_short_io_pgtable {
> > +       struct io_pgtable       iop;
> > +       struct kmem_cache       *ptekmem;
> > +       size_t                  pgd_size;
> > +       void                    *pgd;
> > +};
> > +
> > +#define io_pgtable_short_to_data(x)                            \
> > +       container_of((x), struct arm_short_io_pgtable, iop)
> > +
> > +#define io_pgtable_ops_to_pgtable(x)                           \
> > +       container_of((x), struct io_pgtable, ops)
> 
> This macro may as well be factored out into io-pgtable.h before 
> duplication spreads any further. I don't see any reason for it not to 
> live alongside the definition of struct io_pgtable, anyway.
Thanks. I will move it into io-pgtable.h.
Then I think this also should be deleted in io-pgtable-arm.c.
> 
> > +
> > +#define io_pgtable_short_ops_to_data(x)                                \
> > +       io_pgtable_short_to_data(io_pgtable_ops_to_pgtable(x))
> > +
> > +#define ARM_SHORT_MAX_ADDR_BITS                        32
> > +
> > +#define ARM_SHORT_PGDIR_SHIFT                  20
> > +#define ARM_SHORT_PAGE_SHIFT                   12
> > +#define ARM_SHORT_PTRS_PER_PTE                 256
> > +#define ARM_SHORT_BYTES_PER_PTE                        1024
> > +
> > +/* 1 level pagetable */
> > +#define ARM_SHORT_F_PGD_TYPE_PAGE              (0x1)
> > +#define ARM_SHORT_F_PGD_TYPE_PAGE_MSK          (0x3)
> > +#define ARM_SHORT_F_PGD_TYPE_SECTION           (0x2)
> > +#define ARM_SHORT_F_PGD_TYPE_SUPERSECTION      (0x2 | (1 << 18))
> > +#define ARM_SHORT_F_PGD_TYPE_SECTION_MSK       (0x3 | (1 << 18))
> > +#define ARM_SHORT_F_PGD_TYPE_IS_PAGE(pgd)      (((pgd) & 0x3) == 1)
> 
> This confused me on first glance looking at the places it's used, 
> because it's not actually referring to a thing which is a page. Maybe 
> ..._IS_TABLE would be a better name?
Yes. It is better. From the spec, it is "page table".
Then How about "ARM_SHORT_F_PGD_TYPE_IS_PAGETABLE"?
It is a little long, but there are only 2 lines use it and Both are not
over 80 character even though "_IS_PAGETABLE".
> 
> > +#define ARM_SHORT_F_PGD_TYPE_IS_SECTION(pgd)           \
> > +       (((pgd) & ARM_SHORT_F_PGD_TYPE_SECTION_MSK)     \
> > +               == ARM_SHORT_F_PGD_TYPE_SECTION)
> > +#define ARM_SHORT_F_PGD_TYPE_IS_SUPERSECTION(pgd)      \
> > +       (((pgd) & ARM_SHORT_F_PGD_TYPE_SECTION_MSK)     \
> > +               == ARM_SHORT_F_PGD_TYPE_SUPERSECTION)
> > +
> > +#define ARM_SHORT_F_PGD_B_BIT                  BIT(2)
> > +#define ARM_SHORT_F_PGD_C_BIT                  BIT(3)
> > +#define ARM_SHORT_F_PGD_IMPLE_BIT              BIT(9)
> > +#define ARM_SHORT_F_PGD_S_BIT                  BIT(16)
> > +#define ARM_SHORT_F_PGD_NG_BIT                 BIT(17)
> > +#define ARM_SHORT_F_PGD_NS_BIT_PAGE            BIT(3)
> > +#define ARM_SHORT_F_PGD_NS_BIT_SECTION         BIT(19)
> > +
> > +#define ARM_SHORT_F_PGD_PA_PAGETABLE_MSK       0xfffffc00
> > +#define ARM_SHORT_F_PGD_PA_SECTION_MSK         0xfff00000
> > +#define ARM_SHORT_F_PGD_PA_SUPERSECTION_MSK    0xff000000
> > +
> > +/* 2 level pagetable */
> > +#define ARM_SHORT_F_PTE_TYPE_GET(val)          ((val) & 0x3)
> > +#define ARM_SHORT_F_PTE_TYPE_LARGE             BIT(0)
> > +#define ARM_SHORT_F_PTE_TYPE_SMALL             BIT(1)
> > +#define ARM_SHORT_F_PTE_B_BIT                  BIT(2)
> > +#define ARM_SHORT_F_PTE_C_BIT                  BIT(3)
> > +#define ARM_SHORT_F_PTE_IMPLE_BIT              BIT(9)
> > +#define ARM_SHORT_F_PTE_S_BIT                  BIT(10)
> > +#define ARM_SHORT_F_PTE_PA_LARGE_MSK            0xffff0000
> > +#define ARM_SHORT_F_PTE_PA_SMALL_MSK            0xfffff000
> > +
> > +#define ARM_SHORT_PGD_IDX(a)                   ((a) >> ARM_SHORT_PGDIR_SHIFT)
> > +#define ARM_SHORT_PTE_IDX(a)                   \
> > +       (((a) >> ARM_SHORT_PAGE_SHIFT) & 0xff)
> > +#define ARM_SHORT_GET_PTE_VA(pgd)              \
> > +       (phys_to_virt((unsigned long)pgd & ARM_SHORT_F_PGD_PA_PAGETABLE_MSK))
> > +
> > +static arm_short_iopte *
> > +arm_short_get_pte_in_pgd(arm_short_iopte curpgd, unsigned int iova)
> > +{
> > +       arm_short_iopte *pte;
> > +
> > +       pte = ARM_SHORT_GET_PTE_VA(curpgd);
> > +       pte += ARM_SHORT_PTE_IDX(iova);
> > +       return pte;
> > +}
> > +
> > +static arm_short_iopte *
> > +arm_short_supersection_start(arm_short_iopte *pgd)
> > +{
> > +       return (arm_short_iopte *)(round_down((unsigned long)pgd, (16 * 4)));
> > +}
> > +
> > +static int _arm_short_check_free_pte(struct arm_short_io_pgtable *data,
> > +                                    arm_short_iopte *pgd)
> 
> Given that this is only returning success/failure, it should probably be 
> bool rather than int.
Thanks.
> 
> > +{
> > +       arm_short_iopte *pte;
> > +       int i;
> > +
> > +       pte = ARM_SHORT_GET_PTE_VA(*pgd);
> > +
> > +       for (i = 0; i < ARM_SHORT_PTRS_PER_PTE; i++) {
> > +               if (pte[i] != 0)
> > +                       return 1;
> > +       }
> > +
> > +       /* Free PTE */
> > +       kmem_cache_free(data->ptekmem, pte);
> > +       *pgd = 0;
> > +
> > +       return 0;
> > +}
> > +
> > +static phys_addr_t arm_short_iova_to_phys(struct io_pgtable_ops *ops,
> > +                                         unsigned long iova)
> > +{
> > +       struct arm_short_io_pgtable *data = io_pgtable_short_ops_to_data(ops);
> > +       arm_short_iopte *pte, *pgd = data->pgd;
> > +       phys_addr_t pa = 0;
> > +
> > +       pgd += ARM_SHORT_PGD_IDX(iova);
> > +
> > +       if (ARM_SHORT_F_PGD_TYPE_IS_PAGE(*pgd)) {
> > +               u8 pte_type;
> > +
> > +               pte = arm_short_get_pte_in_pgd(*pgd, iova);
> > +               pte_type = ARM_SHORT_F_PTE_TYPE_GET(*pte);
> > +
> > +               if (pte_type == ARM_SHORT_F_PTE_TYPE_LARGE) {
> > +                       pa = (*pte) & ARM_SHORT_F_PTE_PA_LARGE_MSK;
> > +                       pa |= iova & (~ARM_SHORT_F_PTE_PA_LARGE_MSK);
> > +               } else if (pte_type == ARM_SHORT_F_PTE_TYPE_SMALL) {
> > +                       pa = (*pte) & ARM_SHORT_F_PTE_PA_SMALL_MSK;
> > +                       pa |= iova & (~ARM_SHORT_F_PTE_PA_SMALL_MSK);
> > +               }
> > +       } else {
> > +               if (ARM_SHORT_F_PGD_TYPE_IS_SECTION(*pgd)) {
> > +                       pa = (*pgd) & ARM_SHORT_F_PGD_PA_SECTION_MSK;
> > +                       pa |= iova & (~ARM_SHORT_F_PGD_PA_SECTION_MSK);
> > +               } else if (ARM_SHORT_F_PGD_TYPE_IS_SUPERSECTION(*pgd)) {
> > +                       pa = (*pgd) & ARM_SHORT_F_PGD_PA_SUPERSECTION_MSK;
> > +                       pa |= iova & (~ARM_SHORT_F_PGD_PA_SUPERSECTION_MSK);
> > +               }
> > +       }
> > +
> > +       return pa;
> > +}
> > +
> > +static int arm_short_unmap(struct io_pgtable_ops *ops, unsigned long iova,
> > +                          size_t size)
> > +{
> > +       struct arm_short_io_pgtable *data = io_pgtable_short_ops_to_data(ops);
> > +       arm_short_iopte *pgd;
> > +       unsigned long iova_start = iova;
> > +       unsigned long long end_plus_1 = iova + size;
> 
> Since everything's at page granularity, working with IOVA PFNs rather 
> than raw addresses might be more convenient, and also sidesteps the 
> 32-bit overflow problem. On 64-bit platforms, we're wasting a whole 95 
> bits of a long long here ;)
About IOVA PFNs, if add it, we have to include "iova.h". then it may add
a new relationship with other module. I am not sure it is ok.
in pg-iotable-arm.c, I also don't see it. so I don't prepare to add iova
pfn, is it ok?

iova here always is 32bit, and iova+size may over 32bit. so I use "long
long" here. "long long" always is 64bit in 32bit&64bit platform?
"long" may be error while 32bit platform. 

I will add split and try to delete this in next version.
> 
> > +       const struct iommu_gather_ops *tlb = data->iop.cfg.tlb;
> > +       void *cookie = data->iop.cookie;
> > +       int ret;
> > +
> > +       do {
> > +               pgd = (arm_short_iopte *)data->pgd + ARM_SHORT_PGD_IDX(iova);
> > +
> > +               if (ARM_SHORT_F_PGD_TYPE_IS_PAGE(*pgd)) {
> > +                       arm_short_iopte *pte;
> > +                       unsigned int pte_offset;
> > +                       unsigned int num_to_clean;
> > +
> > +                       pte_offset = ARM_SHORT_PTE_IDX(iova);
> > +                       num_to_clean =
> > +                           min((unsigned int)((end_plus_1 - iova) / PAGE_SIZE),
> 
> Shouldn't this be page size for the IOMMU, not the CPU? I'm a bit slow 
> today, but this looks like it might go wrong when PAGE_SIZE > 4K.
Thanks. Then I will add a define like this:
#define IOMMU_PAGE_SIZE_4K  SZ_4K
And I will put it into io-pgtable.h,  the io-pgtable-arm.c may also need
it.
> 
> > +                               (ARM_SHORT_PTRS_PER_PTE - pte_offset));
> > +
> > +                       pte = arm_short_get_pte_in_pgd(*pgd, iova);
> > +
> > +                       memset(pte, 0, num_to_clean * sizeof(arm_short_iopte));
> > +
> > +                       ret = _arm_short_check_free_pte(data, pgd);
> > +                       if (ret == 1)/* pte is not freed, need to flush pte */
> > +                               tlb->flush_pgtable(
> > +                                       pte,
> > +                                       num_to_clean * sizeof(arm_short_iopte),
> > +                                       cookie);
> > +                       else
> > +                               tlb->flush_pgtable(pgd, sizeof(arm_short_iopte),
> > +                                                  cookie);
> > +
> > +                       iova += num_to_clean << PAGE_SHIFT;
> > +               } else if (ARM_SHORT_F_PGD_TYPE_IS_SECTION(*pgd)) {
> > +                       *pgd = 0;
> > +
> > +                       tlb->flush_pgtable(pgd, sizeof(arm_short_iopte),
> > +                                          cookie);
> > +                       iova += SZ_1M;
> > +               } else if (ARM_SHORT_F_PGD_TYPE_IS_SUPERSECTION(*pgd)) {
> > +                       arm_short_iopte *start;
> > +
> > +                       start = arm_short_supersection_start(pgd);
> > +                       if (unlikely(start != pgd))
> > +                               pr_warn("%s:suppersection start isn't aligned.iova=0x%lx,pgd=0x%x\n",
> 
> Nit: typo in "supersection" here.
> 
> > +                                       __func__, iova, *pgd);
> > +
> > +                       memset(start, 0, 16 * sizeof(arm_short_iopte));
> > +
> > +                       tlb->flush_pgtable(start, 16 * sizeof(arm_short_iopte),
> > +                                          cookie);
> > +
> > +                       iova = (iova + SZ_16M) & (~(SZ_16M - 1));
> 
> iova = ALIGN(iova + SZ_16M, SZ_16M);
> 
> Except that being unaligned in the first place is an error condition, so 
> it would make more sense to just have "iova += SZ_16M;" here, and put 
> "iova = ALIGN(iova, SZ_16M) after the warning in the error path above.
> 
> Since you don't handle splitting block mappings, and you also seem to be 
> missing an equivalent warning for unaligned second-level large pages, it 
> might be better to simply return an error if the requested size and 
> alignment don't exactly match the type of entry found, rather than let 
> the page tables get into an unexpectedly inconsistent state.
OK. I will add split and add the align checking of start_iova for each
case..
> 
> > +               } else {
> > +                       break;
> > +               }
> > +       } while (iova < end_plus_1 && iova);
> > +
> > +       tlb->tlb_add_flush(iova_start, size, true, cookie);
> > +
> > +       return 0;
> > +}
> > +
> > +static arm_short_iopte __arm_short_pte_port(unsigned int prot, bool large)
> 
> I assume _port is a typo of _prot
> 
> > +{
> > +       arm_short_iopte pteprot;
> > +
> > +       pteprot = ARM_SHORT_F_PTE_S_BIT;
> > +
> > +       pteprot |= large ? ARM_SHORT_F_PTE_TYPE_LARGE :
> > +                               ARM_SHORT_F_PTE_TYPE_SMALL;
> > +
> > +       if (prot & IOMMU_CACHE)
> > +               pteprot |=  ARM_SHORT_F_PTE_B_BIT | ARM_SHORT_F_PTE_C_BIT;
> > +
> > +       return pteprot;
> > +}
> > +
> > +static arm_short_iopte __arm_short_pgd_port(int prot, bool super)
> 
> Ditto
> 
> > +{
> > +       arm_short_iopte pgdprot;
> > +
> > +       pgdprot = ARM_SHORT_F_PGD_S_BIT;
> > +       pgdprot |= super ? ARM_SHORT_F_PGD_TYPE_SUPERSECTION :
> > +                               ARM_SHORT_F_PGD_TYPE_SECTION;
> > +       if (prot & IOMMU_CACHE)
> > +               pgdprot |= ARM_SHORT_F_PGD_C_BIT | ARM_SHORT_F_PGD_B_BIT;
> > +
> > +       return pgdprot;
> > +}
> > +
> > +static int _arm_short_map_page(struct arm_short_io_pgtable *data,
> > +                              unsigned int iova, phys_addr_t pa,
> > +                              unsigned int prot, bool largepage)
> > +{
> > +       arm_short_iopte *pgd = data->pgd;
> > +       arm_short_iopte *pte;
> > +       arm_short_iopte pgdprot, pteprot;
> > +       arm_short_iopte mask = largepage ? ARM_SHORT_F_PTE_PA_LARGE_MSK :
> > +                                               ARM_SHORT_F_PTE_PA_SMALL_MSK;
> > +       int i, ptenum = largepage ? 16 : 1;
> > +       bool ptenew = false;
> > +       void *pte_new_va;
> > +       void *cookie = data->iop.cookie;
> > +
> > +       if ((iova | pa) & (~mask)) {
> > +               pr_err("IOVA|PA Not Aligned(iova=0x%x pa=0x%pa type=%s)\n",
> > +                      iova, &pa, largepage ? "large page" : "small page");
> 
> Nit: you may as well just have largepage ?  "large" : "small" here and 
> "...type=%s page..." in the format string.
> 
> > +               return -EINVAL;
> > +       }
> > +
> > +       pgdprot = ARM_SHORT_F_PGD_TYPE_PAGE;
> > +       if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_NS)
> > +               pgdprot |= ARM_SHORT_F_PGD_NS_BIT_PAGE;
> > +
> > +       pgd += ARM_SHORT_PGD_IDX(iova);
> > +
> > +       if (!(*pgd)) {
> > +               pte_new_va = kmem_cache_zalloc(data->ptekmem, GFP_KERNEL);
> > +               if (unlikely(!pte_new_va)) {
> > +                       pr_err("Failed to alloc pte\n");
> 
> The allocator should already print the details of a failure, so I don't 
> think this adds much.
> 
> > +                       return -ENOMEM;
> > +               }
> > +
> > +               /* Check pte alignment -- must 1K align */
> > +               if (unlikely((unsigned long)pte_new_va &
> > +                            (ARM_SHORT_BYTES_PER_PTE - 1))) {
> > +                       pr_err("The new pte is not aligned! (va=0x%p)\n",
> > +                              pte_new_va);
> > +                       kmem_cache_free(data->ptekmem, (void *)pte_new_va);
> > +                       return -ENOMEM;
> > +               }
> 
> Can this ever actually happen? Besides, if kmem_cache_alloc isn't 
> honouring the alignment specified in kmem_cache_create, I think the 
> kernel might have bigger problems anyway...
Thanks. I will delete it.
> 
> > +               ptenew = true;
> > +               *pgd = virt_to_phys(pte_new_va) | pgdprot;
> > +               kmemleak_ignore(pte_new_va);
> > +               data->iop.cfg.tlb->flush_pgtable(pgd, sizeof(arm_short_iopte),
> > +                                                cookie);
> > +       } else {
> > +               /* Someone else may have allocated for this pgd */
> > +               if (((*pgd) & (~ARM_SHORT_F_PGD_PA_PAGETABLE_MSK)) != pgdprot) {
> > +                       pr_err("The prot of old pgd is not Right!iova=0x%x pgd=0x%x pgprot=0x%x\n",
> > +                              iova, (*pgd), pgdprot);
> > +                       return -EEXIST;
> > +               }
> > +       }
> > +
> > +       pteprot = (arm_short_iopte)pa;
> > +       pteprot |= __arm_short_pte_port(prot, largepage);
> > +
[snip]
> > +static int arm_short_map(struct io_pgtable_ops *ops, unsigned long iova,
> > +                        phys_addr_t paddr, size_t size, int prot)
> > +{
> > +       struct arm_short_io_pgtable *data = io_pgtable_short_ops_to_data(ops);
> > +       const struct iommu_gather_ops *tlb = data->iop.cfg.tlb;
> > +       int ret;
> > +
> > +       if (!(prot & (IOMMU_READ | IOMMU_WRITE)))
> > +               return -EINVAL;
> > +
> > +       if (size == SZ_4K) {/* most case */
> > +               ret = _arm_short_map_page(data, iova, paddr, prot, false);
> > +       } else if (size == SZ_64K) {
> > +               ret = _arm_short_map_page(data, iova, paddr, prot, true);
> > +       } else if (size == SZ_1M) {
> > +               ret = _arm_short_map_section(data, iova, paddr, prot, false);
> > +       } else if (size == SZ_16M) {
> > +               ret = _arm_short_map_section(data, iova, paddr, prot, true);
> > +       } else {
> > +               ret = -EINVAL;
> > +       }
> 
> I think this might be nicer as a switch statement. You could perhaps 
> move the add_flush beforehand (since it's unconditional here anyway) and 
> get rid of ret altogether.
move the add_flush before map?
if we change the pagetable firstly, then add_flush. is it better?
> 
> Alternatively, given that map_page and map_section are so similar, maybe 
> it's worth sorting out the pgprot and pgd/pte pointer for the 
> page/section distinction here, then just passing those to a single 
> function which maps compound/non-compound leaf entries.
Ok. I will try to merge them in one function like _arm_short_map.
> 
> > +       tlb->tlb_add_flush(iova, size, true, data->iop.cookie);
> > +       return ret;
> > +}
> > +
> > +static struct io_pgtable *
> > +arm_short_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
> > +{
> > +       struct arm_short_io_pgtable *data;
> > +
> > +       if (cfg->ias != 32)
> > +               return NULL;
> > +
> > +       if (cfg->oas > ARM_SHORT_MAX_ADDR_BITS)
> > +               return NULL;
> > +
> > +       cfg->pgsize_bitmap &= SZ_4K | SZ_64K | SZ_1M | SZ_16M;
> > +
> > +       data = kzalloc(sizeof(*data), GFP_KERNEL);
> > +       if (!data)
> > +               return NULL;
> > +
> > +       data->pgd_size = SZ_16K;
> > +
> > +       data->pgd = alloc_pages_exact(data->pgd_size, GFP_KERNEL | __GFP_ZERO);
> 
> I think this needs __GFP_DMA too, to ensure the pgd lies below the 4GB 
> boundary on arm64/LPAE systems with memory above that.
Thanks.
> > @@ -62,6 +63,11 @@ struct io_pgtable_cfg {
> >                          u64     vttbr;
> >                          u64     vtcr;
> >                  } arm_lpae_s2_cfg;
> > +
> > +               struct {
> > +                       u64     ttbr[2];
> > +                       u64     tcr;
> 
> The ARM ARM defines these all as 32-bit registers for the 
> short-descriptor format, so I think u32 would be more appropriate - 
> better to let the compiler truncate things and warn about it, than have 
> the hardware do it silently at runtime ;)
I will change both of them to u32.
> 
> > +               } arm_short_cfg;
> >          };
> >   };
> >
> > --
> > 1.8.1.1.dirty
> >
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ