linux-kernel - Fwd: [PATCH v7 1/2] x86: Add Isolated Memory Regions for Quark X1000

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHp75VecMbiY3-KxNa0OzkUfLtH0TtPGik-eO-KkyQjHDcG3kQ@mail.gmail.com>
Date:	Fri, 30 Jan 2015 14:06:05 +0200
From:	Andy Shevchenko <andy.shevchenko@...il.com>
To:	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	"x86@...nel.org" <x86@...nel.org>,
	"dvhart@...radead.org" <dvhart@...radead.org>,
	andy.shevchenko@...il.com,
	"Ong, Boon Leong" <boon.leong.ong@...el.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Fwd: [PATCH v7 1/2] x86: Add Isolated Memory Regions for Quark X1000

---------- Forwarded message ----------
From: Andy Shevchenko <andy.shevchenko@...il.com>
Date: Fri, Jan 30, 2015 at 2:03 PM
Subject: Re: [PATCH v7 1/2] x86: Add Isolated Memory Regions for Quark X1000
To: Bryan O'Donoghue <pure.logic@...us-software.ie>


On Fri, Jan 30, 2015 at 5:05 AM, Bryan O'Donoghue
<pure.logic@...us-software.ie> wrote:
> Intel's Quark X1000 SoC contains a set of registers called Isolated Memory
> Regions. IMRs are accessed over the IOSF mailbox interface. IMRs are areas
> carved out of memory that define read/write access rights to the various
> system agents within the Quark system. For a given agent in the system it is
> possible to specify if that agent may read or write an area of memory
> defined by an IMR with a granularity of 1 KiB.
>
> Quark_SecureBootPRM_330234_001.pdf section 4.5 details the concept of IMRs
> quark-x1000-datasheet.pdf section 12.7.4 details the implementation of IMRs
> in silicon.
>
> eSRAM flush, CPU Snoop write-only, CPU SMM Mode, CPU non-SMM mode, RMU and
> PCIe Virtual Channels (VC0 and VC1) can have individual read/write access
> masks applied to them for a given memory region in Quark X1000. This
> enables IMRs to treat each memory transaction type listed above on an
> individual basis and to filter appropriately based on the IMR access mask
> for the memory region. Quark supports eight IMRs.
>
> Since all of the DMA capable SoC components in the X1000 are mapped to VC0
> it is possible to define sections of memory as invalid for DMA write
> operations originating from Ethernet, USB, SD and any other DMA capable
> south-cluster component on VC0. Similarly it is possible to mark kernel
> memory as non-SMM mode read/write only or to mark BIOS runtime memory as SMM
> mode accessible only depending on the particular memory footprint on a given
> system.
>
> On an IMR violation Quark SoC X1000 systems are configured to reset the
> system, so ensuring that the IMR memory map is consistent with the EFI
> provided memory map is critical to ensure no IMR violations reset the
> system.
>
> The API for accessing IMRs is based on MTRR code but doesn't provide a /proc
> or /sys interface to manipulate IMRs. Defining the size and extent of IMRs
> is exclusively the domain of in-kernel code.
>
> Quark firmware sets up a series of locked IMRs around pieces of memory that
> firmware owns such as ACPI runtime data. During boot a series of unlocked
> IMRs are placed around items in memory to guarantee no DMA modification of
> those items can take place. Grub also places an unlocked IMR around the
> kernel boot params data structure and compressed kernel image. It is
> necessary for the kernel to tear down all unlocked IMRs in order to ensure
> that the kernel's view of memory passed via the EFI memory map is consistent
> with the IMR memory map. Without tearing down all unlocked IMRs on boot
> transitory IMRs such as those used to protect the compressed kernel image
> will cause IMR violations and system reboots.
>
> The IMR init code tears down all unlocked IMRs and sets a protective IMR
> around the kernel .text and .rodata as one contiguous block. This sanitizes
> the IMR memory map with respect to the EFI memory map and protects the
> read-only portions of the kernel from unwarranted DMA access.
>

Few comments and after addressing them

Reviewed-by: Andy Shevchenko <andy.shevchenko@...il.com>

> Signed-off-by: Bryan O'Donoghue <pure.logic@...us-software.ie>
> ---
>  arch/x86/Kconfig.debug                       |  13 +
>  arch/x86/include/asm/imr.h                   |  60 +++
>  arch/x86/platform/intel-quark/Makefile       |   2 +
>  arch/x86/platform/intel-quark/imr.c          | 667 +++++++++++++++++++++++++++
>  arch/x86/platform/intel-quark/imr_selftest.c | 124 +++++
>  drivers/platform/x86/Kconfig                 |  25 +
>  6 files changed, 891 insertions(+)
>  create mode 100644 arch/x86/include/asm/imr.h
>  create mode 100644 arch/x86/platform/intel-quark/Makefile
>  create mode 100644 arch/x86/platform/intel-quark/imr.c
>  create mode 100644 arch/x86/platform/intel-quark/imr_selftest.c
>
> diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
> index 61bd2ad..20028da 100644
> --- a/arch/x86/Kconfig.debug
> +++ b/arch/x86/Kconfig.debug
> @@ -313,6 +313,19 @@ config DEBUG_NMI_SELFTEST
>
>           If unsure, say N.
>
> +config DEBUG_IMR_SELFTEST
> +       bool "Isolated Memory Region self test"
> +       default n
> +       depends on INTEL_IMR
> +       ---help---
> +         This option enables automated sanity testing of the IMR code.
> +         Some simple tests are run to verify IMR bounds checking, alignment
> +         and overlapping. This option is really only useful if you are
> +         debugging an IMR memory map or are modifying the IMR code and want to
> +         test your changes.
> +
> +         If unsure say N here.
> +
>  config X86_DEBUG_STATIC_CPU_HAS
>         bool "Debug alternatives"
>         depends on DEBUG_KERNEL
> diff --git a/arch/x86/include/asm/imr.h b/arch/x86/include/asm/imr.h
> new file mode 100644
> index 0000000..cd2ce40
> --- /dev/null
> +++ b/arch/x86/include/asm/imr.h
> @@ -0,0 +1,60 @@
> +/*
> + * imr.h: Isolated Memory Region API
> + *
> + * Copyright(c) 2013 Intel Corporation.
> + * Copyright(c) 2015 Bryan O'Donoghue <pure.logic@...us-software.ie>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; version 2
> + * of the License.
> + */
> +#ifndef _IMR_H
> +#define _IMR_H
> +
> +#include <linux/types.h>
> +
> +/*
> + * IMR agent access mask bits
> + * See section 12.7.4.7 from quark-x1000-datasheet.pdf for register
> + * definitions.
> + */
> +#define IMR_ESRAM_FLUSH                BIT(31)
> +#define IMR_CPU_SNOOP          BIT(30)         /* Applicable only to write */
> +#define IMR_RMU                        BIT(29)
> +#define IMR_VC1_SAI_ID3                BIT(15)
> +#define IMR_VC1_SAI_ID2                BIT(14)
> +#define IMR_VC1_SAI_ID1                BIT(13)
> +#define IMR_VC1_SAI_ID0                BIT(12)
> +#define IMR_VC0_SAI_ID3                BIT(11)
> +#define IMR_VC0_SAI_ID2                BIT(10)
> +#define IMR_VC0_SAI_ID1                BIT(9)
> +#define IMR_VC0_SAI_ID0                BIT(8)
> +#define IMR_CPU_0              BIT(1)          /* SMM mode */
> +#define IMR_CPU                        BIT(0)          /* Non SMM mode */
> +#define IMR_ACCESS_NONE                0
> +
> +/*
> + * Read/Write access-all bits here include some reserved bits
> + * These are the values firmware uses and are accepted by hardware.
> + * The kernel defines read/write access-all in the same way as firmware
> + * in order to have a consistent and crisp definition across firmware,
> + * bootloader and kernel.
> + */
> +#define IMR_READ_ACCESS_ALL    0xBFFFFFFF
> +#define IMR_WRITE_ACCESS_ALL   0xFFFFFFFF
> +
> +/* Number of IMRs provided by Quark X1000 SoC */
> +#define QUARK_X1000_IMR_MAX    0x08
> +#define QUARK_X1000_IMR_REGBASE 0x40
> +
> +/* IMR alignment bits - only bits 31:10 are checked for IMR validity */
> +#define IMR_ALIGN              0x400
> +#define IMR_MASK               (IMR_ALIGN - 1)
> +
> +int imr_add_range(phys_addr_t base, size_t size,
> +                 unsigned int rmask, unsigned int wmask, bool lock);
> +
> +int imr_remove_range(phys_addr_t base, size_t size);
> +
> +#endif /* _IMR_H */
> diff --git a/arch/x86/platform/intel-quark/Makefile b/arch/x86/platform/intel-quark/Makefile
> new file mode 100644
> index 0000000..9cc57ed
> --- /dev/null
> +++ b/arch/x86/platform/intel-quark/Makefile
> @@ -0,0 +1,2 @@
> +obj-$(CONFIG_INTEL_IMR) += imr.o
> +obj-$(CONFIG_DEBUG_IMR_SELFTEST) += imr_selftest.o
> diff --git a/arch/x86/platform/intel-quark/imr.c b/arch/x86/platform/intel-quark/imr.c
> new file mode 100644
> index 0000000..8319c30
> --- /dev/null
> +++ b/arch/x86/platform/intel-quark/imr.c
> @@ -0,0 +1,667 @@
> +/**
> + * imr.c
> + *
> + * Copyright(c) 2013 Intel Corporation.
> + * Copyright(c) 2015 Bryan O'Donoghue <pure.logic@...us-software.ie>
> + *
> + * IMR registers define an isolated region of memory that can
> + * be masked to prohibit certain system agents from accessing memory.
> + * When a device behind a masked port performs an access - snooped or
> + * not, an IMR may optionally prevent that transaction from changing
> + * the state of memory or from getting correct data in response to the
> + * operation.
> + *
> + * Write data will be dropped and reads will return 0xFFFFFFFF, the
> + * system will reset and system BIOS will print out an error message to
> + * inform the user that an IMR has been violated.
> + *
> + * This code is based on the Linux MTRR code and reference code from
> + * Intel's Quark BSP EFI, Linux and grub code.
> + *
> + * See quark-x1000-datasheet.pdf for register definitions.
> + * http://www.intel.com/content/dam/www/public/us/en/documents/datasheets/quark-x1000-datasheet.pdf
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include <asm-generic/sections.h>
> +#include <asm/cpu_device_id.h>
> +#include <asm/imr.h>
> +#include <asm/iosf_mbi.h>
> +#include <linux/debugfs.h>
> +#include <linux/init.h>
> +#include <linux/mm.h>
> +#include <linux/module.h>
> +#include <linux/types.h>
> +
> +struct imr_device {
> +       struct dentry   *file;
> +       bool            init;
> +       struct mutex    lock;
> +       int             max_imr;
> +       int             reg_base;
> +};
> +
> +static struct imr_device imr_dev;
> +
> +/*
> + * IMR read/write mask control registers.
> + * See quark-x1000-datasheet.pdf sections 12.7.4.5 and 12.7.4.6 for
> + * bit definitions.
> + *
> + * addr_hi
> + * 31          Lock bit
> + * 30:24       Reserved
> + * 23:2                1 KiB aligned lo address
> + * 1:0         Reserved
> + *
> + * addr_hi
> + * 31:24       Reserved
> + * 23:2                1 KiB aligned hi address
> + * 1:0         Reserved
> + */
> +#define IMR_LOCK       BIT(31)
> +
> +struct imr_regs {
> +       u32 addr_lo;
> +       u32 addr_hi;
> +       u32 rmask;
> +       u32 wmask;
> +};
> +
> +#define IMR_NUM_REGS   (sizeof(struct imr_regs)/sizeof(u32))
> +#define IMR_SHIFT      8
> +#define imr_to_phys(x) ((x) << IMR_SHIFT)
> +#define phys_to_imr(x) ((x) >> IMR_SHIFT)
> +
> +/**
> + * imr_is_enabled - true if an IMR is enabled false otherwise.
> + *
> + * Determines if an IMR is enabled based on address range and read/write
> + * mask. An IMR set with an address range set to zero and a read/write
> + * access mask set to all is considered to be disabled. An IMR in any
> + * other state - for example set to zero but without read/write access
> + * all is considered to be enabled. This definition of disabled is how
> + * firmware switches off an IMR and is maintained in kernel for
> + * consistency.
> + *
> + * @imr:       pointer to IMR descriptor.
> + * @return:    true if IMR enabled false if disabled.
> + */
> +static inline int imr_is_enabled(struct imr_regs *imr)
> +{
> +       return !(imr->rmask == IMR_READ_ACCESS_ALL &&
> +                imr->wmask == IMR_WRITE_ACCESS_ALL &&
> +                imr_to_phys(imr->addr_lo) == 0 &&
> +                imr_to_phys(imr->addr_hi) == 0);
> +}
> +
> +/**
> + * imr_read - read an IMR at a given index.
> + *
> + * Requires caller to hold imr mutex.
> + *
> + * @idev:      pointer to imr_device structure.
> + * @imr_id:    IMR entry to read.
> + * @imr:       IMR structure representing address and access masks.
> + * @return:    0 on success or error code passed from mbi_iosf on failure.
> + */
> +static int imr_read(struct imr_device *idev, u32 imr_id, struct imr_regs *imr)
> +{
> +       u32 reg = imr_id * IMR_NUM_REGS + idev->reg_base;
> +       int ret;
> +
> +       ret = iosf_mbi_read(QRK_MBI_UNIT_MM, QRK_MBI_MM_READ,
> +                               reg++, &imr->addr_lo);
> +       if (ret)
> +               return ret;
> +
> +       ret = iosf_mbi_read(QRK_MBI_UNIT_MM, QRK_MBI_MM_READ,
> +                               reg++, &imr->addr_hi);
> +       if (ret)
> +               return ret;
> +
> +       ret = iosf_mbi_read(QRK_MBI_UNIT_MM, QRK_MBI_MM_READ,
> +                               reg++, &imr->rmask);
> +       if (ret)
> +               return ret;
> +
> +       ret = iosf_mbi_read(QRK_MBI_UNIT_MM, QRK_MBI_MM_READ,
> +                               reg++, &imr->wmask);
> +       if (ret)
> +               return ret;
> +
> +       return 0;
> +}
> +
> +/**
> + * imr_write - write an IMR at a given index.
> + *
> + * Requires caller to hold imr mutex.
> + * Note lock bits need to be written independently of address bits.
> + *
> + * @idev:      pointer to imr_device structure.
> + * @imr_id:    IMR entry to write.
> + * @imr:       IMR structure representing address and access masks.
> + * @lock:      indicates if the IMR lock bit should be applied.
> + * @return:    0 on success or error code passed from mbi_iosf on failure.
> + */
> +static int imr_write(struct imr_device *idev, u32 imr_id,
> +                    struct imr_regs *imr, bool lock)
> +{
> +       unsigned long flags;
> +       u32 reg = imr_id * IMR_NUM_REGS + idev->reg_base;
> +       int ret;
> +
> +       local_irq_save(flags);
> +
> +       ret = iosf_mbi_write(QRK_MBI_UNIT_MM, QRK_MBI_MM_WRITE, reg++,
> +                               imr->addr_lo);
> +       if (ret)
> +               goto failed;
> +
> +       ret = iosf_mbi_write(QRK_MBI_UNIT_MM, QRK_MBI_MM_WRITE,
> +                               reg++, imr->addr_hi);
> +       if (ret)
> +               goto failed;
> +
> +       ret = iosf_mbi_write(QRK_MBI_UNIT_MM, QRK_MBI_MM_WRITE,
> +                               reg++, imr->rmask);
> +       if (ret)
> +               goto failed;
> +
> +       ret = iosf_mbi_write(QRK_MBI_UNIT_MM, QRK_MBI_MM_WRITE,
> +                               reg++, imr->wmask);
> +       if (ret)
> +               goto failed;
> +
> +       /* Lock bit must be set separately to addr_lo address bits. */
> +       if (lock) {
> +               imr->addr_lo |= IMR_LOCK;
> +               ret = iosf_mbi_write(QRK_MBI_UNIT_MM, QRK_MBI_MM_WRITE,
> +                                       reg - IMR_NUM_REGS, imr->addr_lo);
> +               if (ret)
> +                       goto failed;
> +       }
> +
> +       local_irq_restore(flags);
> +       return 0;
> +failed:
> +       /*
> +        * If writing to the IOSF failed then we're in an unknown state,
> +        * likely a very bad state. An IMR in an invalid state will almost
> +        * certainly lead to a memory access violation.
> +        */
> +       local_irq_restore(flags);
> +       WARN(ret, "IOSF-MBI write fail range 0x%08x-0x%08x unreliable\n",
> +            imr_to_phys(imr->addr_lo), imr_to_phys(imr->addr_hi) + IMR_MASK);
> +
> +       return ret;
> +}
> +
> +/**
> + * imr_dbgfs_state_show - print state of IMR registers.
> + *
> + * @s:         pointer to seq_file for output.
> + * @unused:    unused parameter.
> + * @return:    0 on success or error code passed from mbi_iosf on failure.
> + */
> +static int imr_dbgfs_state_show(struct seq_file *s, void *unused)
> +{
> +       phys_addr_t base;
> +       phys_addr_t end;
> +       int i;
> +       struct imr_device *idev = s->private;
> +       struct imr_regs imr;
> +       size_t size;
> +       int ret = -ENODEV;
> +
> +       mutex_lock(&idev->lock);
> +
> +       for (i = 0; i < idev->max_imr; i++) {
> +
> +               ret = imr_read(idev, i, &imr);
> +               if (ret)
> +                       break;
> +
> +               /*
> +                * Remember to add IMR_ALIGN bytes to size to indicate the
> +                * inherent IMR_ALIGN size bytes contained in the masked away
> +                * lower ten bits.
> +                */
> +               if (imr_is_enabled(&imr)) {
> +                       base = imr_to_phys(imr.addr_lo);
> +                       end = imr_to_phys(imr.addr_hi) + IMR_MASK;
> +               } else {
> +                       base = 0;
> +                       end = 0;
> +               }
> +               size = end - base;
> +               seq_printf(s, "imr%02i: base=%pa, end=%pa, size=0x%08zx "
> +                          "rmask=0x%08x, wmask=0x%08x, %s, %s\n", i,
> +                          &base, &end, size, imr.rmask, imr.wmask,
> +                          imr_is_enabled(&imr) ? "enabled " : "disabled",
> +                          imr.addr_lo & IMR_LOCK ? "locked" : "unlocked");
> +       }
> +
> +       mutex_unlock(&idev->lock);
> +

No need to have an empty line here.

> +       return ret;
> +}
> +
> +/**
> + * imr_state_open - debugfs open callback.
> + *
> + * @inode:     pointer to struct inode.
> + * @file:      pointer to struct file.
> + * @return:    result of single open.
> + */
> +static int imr_state_open(struct inode *inode, struct file *file)
> +{
> +       return single_open(file, imr_dbgfs_state_show, inode->i_private);
> +}
> +
> +static const struct file_operations imr_state_ops = {
> +       .open           = imr_state_open,
> +       .read           = seq_read,
> +       .llseek         = seq_lseek,
> +       .release        = single_release,
> +};
> +
> +/**
> + * imr_debugfs_register - register debugfs hooks.
> + *
> + * @idev:      pointer to imr_device structure.
> + * @return:    0 on success - errno on failure.
> + */
> +static int imr_debugfs_register(struct imr_device *idev)
> +{
> +       idev->file = debugfs_create_file("imr_state", S_IFREG | S_IRUGO, NULL,
> +                                        idev, &imr_state_ops);
> +       if (!idev->file)
> +               return -ENOMEM;

When CONFIG_DEBUGFS=n, you will get error pointer here, which is not NULL.

So, the proper check is
if (IS_ERR())
 return PTR_ERR();
if (!file)
 return -ENOMEM;

> +
> +       return 0;
> +}
> +
> +/**
> + * imr_debugfs_unregister - unregister debugfs hooks.
> + *
> + * @idev:      pointer to imr_device structure.
> + * @return:
> + */
> +static void imr_debugfs_unregister(struct imr_device *idev)
> +{
> +       debugfs_remove(idev->file);
> +}
> +
> +/**
> + * imr_check_params - check passed address range IMR alignment and non-zero size
> + *
> + * @base:      base address of intended IMR.
> + * @size:      size of intended IMR.
> + * @return:    zero on valid range -EINVAL on unaligned base/size.
> + */
> +static int imr_check_params(phys_addr_t base, size_t size)
> +{
> +       if ((base & IMR_MASK) || (size & IMR_MASK)) {
> +               pr_warn("base %pa size 0x%08zx must align to 1KiB\n",
> +                       &base, size);

I guess pr_err() since below you return an error to upper level.

> +               return -EINVAL;
> +       }
> +
> +       if (size == 0)
> +               return -ENOMEM;
> +
> +       return 0;
> +}
> +
> +/**
> + * imr_raw_size - account for the IMR_ALIGN bytes that addr_hi appends.
> + *
> + * IMR addr_hi has a built in offset of plus IMR_ALIGN (0x400) bytes from the
> + * value in the register. We need to subtract IMR_ALIGN bytes from input sizes
> + * as a result.
> + *
> + * @size:      input size bytes.
> + * @return:    reduced size.
> + */
> +static inline size_t imr_raw_size(size_t size)
> +{
> +       return size - IMR_ALIGN;
> +}
> +
> +/**
> + * imr_address_overlap - detects an address overlap.
> + *
> + * @addr:      address to check against an existing IMR.
> + * @imr:       imr being checked.
> + * @return:    true for overlap false for no overlap.
> + */
> +static inline int imr_address_overlap(phys_addr_t addr, struct imr_regs *imr)
> +{
> +       return addr >= imr_to_phys(imr->addr_lo) && addr <= imr_to_phys(imr->addr_hi);
> +}
> +
> +/**
> + * imr_add_range - add an Isolated Memory Region.
> + *
> + * @base:      physical base address of region aligned to 1KiB.
> + * @size:      physical size of region in bytes must be aligned to 1KiB.
> + * @read_mask: read access mask.
> + * @write_mask:        write access mask.
> + * @lock:      indicates whether or not to permanently lock this region.
> + * @return:    zero on success or negative value indicating error.
> + */
> +int imr_add_range(phys_addr_t base, size_t size,
> +                 unsigned int rmask, unsigned int wmask, bool lock)
> +{
> +       phys_addr_t end;
> +       unsigned int i;
> +       struct imr_device *idev = &imr_dev;
> +       struct imr_regs imr;
> +       size_t raw_size;
> +       int reg;
> +       int ret;
> +
> +       if (WARN_ONCE(idev->init == false, "driver not initialized"))
> +               return -ENODEV;
> +
> +       ret = imr_check_params(base, size);
> +       if (ret)
> +               return ret;
> +
> +       /* Tweak the size value. */
> +       raw_size = imr_raw_size(size);
> +       end = base + raw_size;
> +
> +       /*
> +        * Check for reserved IMR value common to firmware, kernel and grub
> +        * indicating a disabled IMR.
> +        */
> +       imr.addr_lo = phys_to_imr(base);
> +       imr.addr_hi = phys_to_imr(end);
> +       imr.rmask = rmask;
> +       imr.wmask = wmask;
> +       if (!imr_is_enabled(&imr))
> +               return -EINVAL;

-ENOTSUPP ?

> +
> +       mutex_lock(&idev->lock);
> +
> +       /*
> +        * Find a free IMR while checking for an existing overlapping range.
> +        * Note there's no restriction in silicon to prevent IMR overlaps.
> +        * For the sake of simplicity and ease in defining/debugging an IMR
> +        * memory map we exclude IMR overlaps.
> +        */
> +       reg = -1;
> +       for (i = 0; i < idev->max_imr; i++) {
> +               ret = imr_read(idev, i, &imr);
> +               if (ret)
> +                       goto failed;
> +
> +               /* Find overlap @ base or end of requested range. */
> +               if (imr_is_enabled(&imr)) {



> +                       if (imr_address_overlap(base, &imr)) {
> +                               ret = -EINVAL;
> +                               goto failed;
> +                       }

> +                       if (imr_address_overlap(end, &imr)) {
> +                               ret = -EINVAL;
> +                               goto failed;
> +                       }

ret = -EINVAL;
if ()
 goto failed;
if ()
 goto failed;


> +               } else {
> +                       reg = i;

Do we go always through all IMRs and choose the last one?
If no, break is missed here.

> +               }
> +       }
> +
> +       /* Error out if we have no free IMR entries. */
> +       if (reg == -1) {
> +               ret = -ENODEV;

-ENOMEM ? Like you said there is no *free* IMR.

> +               goto failed;
> +       }
> +
> +       pr_debug("add %d phys %pa-%pa size %zx mask 0x%08x wmask 0x%08x\n",
> +                reg, &base, &end, raw_size, rmask, wmask);
> +
> +       /* Enable IMR at specified range and access mask. */
> +       imr.addr_lo = phys_to_imr(base);
> +       imr.addr_hi = phys_to_imr(end);
> +       imr.rmask = rmask;
> +       imr.wmask = wmask;
> +
> +       ret = imr_write(idev, reg, &imr, lock);
> +       if (ret < 0) {
> +               /*
> +                * In the highly unlikely event iosf_mbi_write failed
> +                * attempt to rollback the IMR setup skipping the trapping
> +                * of further IOSF write failures.
> +                */
> +               imr.addr_lo = 0;
> +               imr.addr_hi = 0;
> +               imr.rmask = IMR_READ_ACCESS_ALL;
> +               imr.wmask = IMR_WRITE_ACCESS_ALL;
> +               imr_write(idev, reg, &imr, false);
> +       }
> +failed:
> +       mutex_unlock(&idev->lock);
> +       return ret;
> +}
> +EXPORT_SYMBOL_GPL(imr_add_range);
> +
> +/**
> + * __imr_remove_range - delete an Isolated Memory Region.
> + *
> + * This function allows you to delete an IMR by its index specified by reg or
> + * by address range specified by base and size respectively. If you specify an
> + * index on its own the base and size parameters are ignored.
> + * imr_remove_range(0, size, base); delete IMR at index 0 base/size ignored.
> + * imr_remove_range(-1, base, size); delete IMR from base to base+size.

(size, base) or (base, size) ?

> + *
> + * @reg:       imr index to remove.
> + * @base:      physical base address of region aligned to 1 KiB.
> + * @size:      physical size of region in bytes aligned to 1 KiB.
> + * @return:    -EINVAL on invalid range or out or range id
> + *             -ENODEV if reg is valid but no IMR exists or is locked
> + *             0 on success.
> + */
> +static int __imr_remove_range(int reg, phys_addr_t base, size_t size)
> +{
> +       phys_addr_t end;
> +       bool found = false;
> +       unsigned int i;
> +       struct imr_device *idev = &imr_dev;
> +       struct imr_regs imr;
> +       size_t raw_size;
> +       int ret = 0;
> +
> +       if (WARN_ONCE(idev->init == false, "driver not initialized"))
> +               return -ENODEV;
> +
> +       ret = imr_check_params(base, size);
> +       if (ret == -EINVAL || (ret == -ENOMEM && reg == -1))

reg base size (0 correct, 1 wrong):

0 0 0 — which should be used? what is the priority?
0 x 1 — index
0 1 x — index
1 0 0 — address
1 0 1 — an error
1 1 1 — an error

Thus, could it be simpler? Like
if (reg < 0 && ret) ?

> +               return ret;
> +
> +       /* Tweak the size value. */
> +       raw_size = imr_raw_size(size);
> +       end = base + raw_size;
> +
> +       mutex_lock(&idev->lock);
> +
> +       if (reg >= 0) {
> +               /* If a specific IMR is given try to use it. */
> +               ret = imr_read(idev, reg, &imr);
> +               if (ret)
> +                       goto failed;
> +
> +               if (!imr_is_enabled(&imr) || imr.addr_lo & IMR_LOCK) {
> +                       ret = -ENODEV;

-ENOTSUPP ?

> +                       goto failed;
> +               }
> +               found = true;
> +       } else {
> +               /* Search for match based on address range. */
> +               for (i = 0; i < idev->max_imr; i++) {
> +                       ret = imr_read(idev, i, &imr);
> +                       if (ret)
> +                               goto failed;
> +
> +                       if (!imr_is_enabled(&imr) || imr.addr_lo & IMR_LOCK)
> +                               continue;
> +
> +                       if ((imr_to_phys(imr.addr_lo) == base) &&
> +                           (imr_to_phys(imr.addr_hi) == end)) {
> +                               found = true;
> +                               reg = i;
> +                               break;
> +                       }
> +               }
> +       }
> +
> +       if (!found) {
> +               ret = -ENODEV;
> +               goto failed;
> +       }
> +
> +       pr_debug("remove %d phys %pa-%pa size %zx\n", reg, &base, &end, raw_size);
> +
> +       /* Tear down the IMR. */
> +       imr.addr_lo = 0;
> +       imr.addr_hi = 0;
> +       imr.rmask = IMR_READ_ACCESS_ALL;
> +       imr.wmask = IMR_WRITE_ACCESS_ALL;
> +
> +       ret = imr_write(idev, reg, &imr, false);
> +
> +failed:
> +       mutex_unlock(&idev->lock);
> +       return ret;
> +}
> +
> +/**
> + * imr_remove_range - delete an Isolated Memory Region by address
> + *
> + * This function allows you to delete an IMR by an address range specified
> + * by base and size respectively.
> + * imr_remove_range(base, size); delete IMR from base to base+size.
> + *
> + * @base:      physical base address of region aligned to 1 KiB.
> + * @size:      physical size of region in bytes aligned to 1 KiB.
> + * @return:    -EINVAL on invalid range or out or range id
> + *             -ENODEV if reg is valid but no IMR exists or is locked
> + *             0 on success.
> + */
> +int imr_remove_range(phys_addr_t base, size_t size)
> +{
> +       return __imr_remove_range(-1, base, size);
> +}
> +EXPORT_SYMBOL_GPL(imr_remove_range);
> +
> +/**
> + * imr_clear - delete an Isolated Memory Region by index
> + *
> + * This function allows you to delete an IMR by an address range specified
> + * by the index of the IMR. Useful for initial sanitization of the IMR
> + * address map.
> + * imr_ge(base, size); delete IMR from base to base+size.
> + *
> + * @reg:       imr index to remove.
> + * @return:    -EINVAL on invalid range or out or range id
> + *             -ENODEV if reg is valid but no IMR exists or is locked
> + *             0 on success.
> + */
> +static inline int imr_clear(int reg)
> +{
> +       return __imr_remove_range(reg, 0, 0);
> +}
> +
> +/**
> + * imr_fixup_memmap - Tear down IMRs used during bootup.
> + *
> + * BIOS and Grub both setup IMRs around compressed kernel, initrd memory
> + * that need to be removed before the kernel hands out one of the IMR
> + * encased addresses to a downstream DMA agent such as the SD or Ethernet.
> + * IMRs on Galileo are setup to immediately reset the system on violation.
> + * As a result if you're running a root filesystem from SD - you'll need
> + * the boot-time IMRs torn down or you'll find seemingly random resets when
> + * using your filesystem.
> + *
> + * @idev:      pointer to imr_device structure.
> + * @return:
> + */
> +static void __init imr_fixup_memmap(struct imr_device *idev)
> +{
> +       phys_addr_t base = virt_to_phys(&_text);
> +       size_t size = virt_to_phys(&__end_rodata) - base;
> +       int i;
> +       int ret;
> +
> +       /* Tear down all existing unlocked IMRs. */
> +       for (i = 0; i < idev->max_imr; i++)
> +               imr_clear(i);
> +
> +       /*
> +        * Setup a locked IMR around the physical extent of the kernel
> +        * from the beginning of the .text secton to the end of the
> +        * .rodata section as one physically contiguous block.
> +        */
> +       ret = imr_add_range(base, size, IMR_CPU, IMR_CPU, true);
> +       if (ret < 0) {
> +               pr_err("unable to setup IMR for kernel: (%p - %p)\n",
> +                       &_text, &__end_rodata);
> +       } else {
> +               pr_info("protecting kernel .text - .rodata: %zu KiB (%p - %p)\n",
> +                       size / 1024, &_text, &__end_rodata);
> +       }
> +
> +}
> +
> +static const struct x86_cpu_id imr_ids[] __initconst = {
> +       { X86_VENDOR_INTEL, 5, 9 },     /* Intel Quark SoC X1000. */
> +       {}
> +};
> +MODULE_DEVICE_TABLE(x86cpu, imr_ids);
> +
> +/**
> + * imr_init - entry point for IMR driver.
> + *
> + * return: -ENODEV for no IMR support 0 if good to go.
> + */
> +static int __init imr_init(void)
> +{
> +       struct imr_device *idev = &imr_dev;
> +       int ret;
> +
> +       if (!x86_match_cpu(imr_ids) || !iosf_mbi_available())
> +               return -ENODEV;
> +
> +       idev->max_imr = QUARK_X1000_IMR_MAX;
> +       idev->reg_base = QUARK_X1000_IMR_REGBASE;
> +       idev->init = true;
> +
> +       mutex_init(&idev->lock);
> +       ret = imr_debugfs_register(idev);
> +       if (ret != 0)

if (ret)

> +               pr_warn("debugfs register failed!\n");

Do we actually need this? Or move it to debug level.

> +       imr_fixup_memmap(idev);
> +       return 0;
> +}
> +
> +/**
> + * imr_exit - exit point for IMR code.
> + *
> + * Deregisters debugfs, leave IMR state as-is.
> + *
> + * return:
> + */
> +static void __exit imr_exit(void)
> +{
> +       imr_debugfs_unregister(&imr_dev);
> +}
> +
> +module_init(imr_init);
> +module_exit(imr_exit);
> +
> +MODULE_AUTHOR("Bryan O'Donoghue <pure.logic@...us-software.ie>");
> +MODULE_DESCRIPTION("Intel Isolated Memory Region driver");
> +MODULE_LICENSE("Dual BSD/GPL");
> diff --git a/arch/x86/platform/intel-quark/imr_selftest.c b/arch/x86/platform/intel-quark/imr_selftest.c
> new file mode 100644
> index 0000000..9af7f81
> --- /dev/null
> +++ b/arch/x86/platform/intel-quark/imr_selftest.c
> @@ -0,0 +1,124 @@
> +/**
> + * imr_selftest.c
> + *
> + * Copyright(c) 2013 Intel Corporation.
> + * Copyright(c) 2015 Bryan O'Donoghue <pure.logic@...us-software.ie>
> + *
> + * IMR self test. The purpose of this module is to run a set of tests on the
> + * IMR API to validate it's sanity. We check for overlapping, reserved
> + * addresses and setup/teardown sanity.
> + *
> + */
> +
> +#include <asm-generic/sections.h>
> +#include <asm/imr.h>
> +#include <linux/init.h>
> +#include <linux/mm.h>
> +#include <linux/module.h>
> +#include <linux/types.h>
> +
> +#define SELFTEST KBUILD_MODNAME ": self_test "
> +/**
> + * imr_self_test_result - Print result string for self test.
> + *
> + * @res:       result code - true if test passed false otherwise.
> + * @fmt:       format string.
> + * ...         variadic argument list.
> + */
> +static void __init imr_self_test_result(int res, const char *fmt, ...)
> +{
> +       va_list vlist;
> +
> +       va_start(vlist, fmt);
> +       if (res)
> +               printk(KERN_INFO SELFTEST "pass ");
> +       else
> +               printk(KERN_ERR SELFTEST "fail ");
> +       vprintk(fmt, vlist);

Here is the mix of kernel levels. What about to align them?

For example I doubt we need to distinguish messages by level:

pr_info();
vprintk(KERN_INFO fmt, …);

> +       va_end(vlist);
> +       WARN(res == 0, "test failed");
> +}
> +#undef SELFTEST
> +
> +/**
> + * imr_self_test
> + *
> + * Verify IMR self_test with some simple tests to verify overlap,
> + * zero sized allocations and 1 KiB sized areas.
> + *
> + */
> +static void __init imr_self_test(void)
> +{
> +       phys_addr_t base  = virt_to_phys(&_text);
> +       size_t size = virt_to_phys(&__end_rodata) - base;
> +       const char *fmt_over = "overlapped IMR @ (0x%08lx - 0x%08lx)\n";
> +       int ret;
> +
> +       /* Test zero zero. */
> +       ret = imr_add_range(0, 0, 0, 0, false);
> +       imr_self_test_result(ret < 0, "zero sized IMR\n");
> +
> +       /* Test exact overlap. */
> +       ret = imr_add_range(base, size, IMR_CPU, IMR_CPU, false);
> +       imr_self_test_result(ret < 0, fmt_over, __va(base), __va(base + size));
> +
> +       /* Test overlap with base inside of existing. */
> +       base += size - IMR_ALIGN;
> +       ret = imr_add_range(base, size, IMR_CPU, IMR_CPU, false);
> +       imr_self_test_result(ret < 0, fmt_over, __va(base), __va(base + size));
> +
> +       /* Test overlap with end inside of existing. */
> +       base -= size + IMR_ALIGN * 2;
> +       ret = imr_add_range(base, size, IMR_CPU, IMR_CPU, false);
> +       imr_self_test_result(ret < 0, fmt_over, __va(base), __va(base + size));
> +
> +       /* Test that a 1 KiB IMR @ zero with read/write all will bomb out. */
> +       ret = imr_add_range(0, IMR_ALIGN, IMR_READ_ACCESS_ALL,
> +                           IMR_WRITE_ACCESS_ALL, false);
> +       imr_self_test_result(ret < 0, "1KiB IMR @ 0x00000000 - access-all\n");
> +
> +       /* Test that a 1 KiB IMR @ zero with CPU only will work. */
> +       ret = imr_add_range(0, IMR_ALIGN, IMR_CPU, IMR_CPU, false);
> +       imr_self_test_result(ret >= 0, "1KiB IMR @ 0x00000000 - cpu-access\n");
> +       if (ret >= 0) {
> +               ret = imr_remove_range(0, IMR_ALIGN);
> +               imr_self_test_result(ret == 0, "teardown - cpu-access\n");
> +       }
> +
> +       /* Test 2 KiB works. */
> +       size = IMR_ALIGN * 2;
> +       ret = imr_add_range(0, size, IMR_READ_ACCESS_ALL,
> +                           IMR_WRITE_ACCESS_ALL, false);
> +       imr_self_test_result(ret >= 0, "2KiB IMR @ 0x00000000\n");
> +       if (ret >= 0) {
> +               ret = imr_remove_range(0, size);
> +               imr_self_test_result(ret == 0, "teardown 2KiB\n");
> +       }
> +}
> +
> +/**
> + * imr_self_test_init - entry point for IMR driver.
> + *
> + * return: -ENODEV for no IMR support 0 if good to go.
> + */
> +static int __init imr_self_test_init(void)
> +{
> +       imr_self_test();
> +       return 0;
> +}
> +
> +/**
> + * imr_self_test_exit - exit point for IMR code.
> + *
> + * return:
> + */
> +static void __exit imr_self_test_exit(void)
> +{
> +}
> +
> +module_init(imr_self_test_init);
> +module_exit(imr_self_test_exit);
> +
> +MODULE_AUTHOR("Bryan O'Donoghue <pure.logic@...us-software.ie>");
> +MODULE_DESCRIPTION("Intel Isolated Memory Region self-test driver");
> +MODULE_LICENSE("Dual BSD/GPL");
> diff --git a/drivers/platform/x86/Kconfig b/drivers/platform/x86/Kconfig
> index 638e7970..9752761 100644
> --- a/drivers/platform/x86/Kconfig
> +++ b/drivers/platform/x86/Kconfig
> @@ -735,6 +735,31 @@ config INTEL_IPS
>           functionality.  If in doubt, say Y here; it will only load on
>           supported platforms.
>
> +config INTEL_IMR
> +       bool "Intel Isolated Memory Region support"
> +       default n
> +       depends on X86_INTEL_QUARK && IOSF_MBI
> +       ---help---
> +         This option provides a means to manipulate Isolated Memory Regions.
> +         IMRs are a set of registers that define read and write access masks
> +         to prohibit certain system agents from accessing memory with 1 KiB
> +         granularity.
> +
> +         IMRs make it possible to control read/write access to an address
> +         by hardware agents inside the SoC. Read and write masks can be
> +         defined for:
> +               - eSRAM flush
> +               - Dirty CPU snoop (write only)
> +               - RMU access
> +               - PCI Virtual Channel 0/Virtual Channel 1
> +               - SMM mode
> +               - Non SMM mode
> +
> +         Quark contains a set of eight IMR registers and makes use of those
> +         registers during its bootup process.
> +
> +         If you are running on a Galileo/Quark say Y here.
> +
>  config IBM_RTL
>         tristate "Device driver to enable PRTL support"
>         depends on X86 && PCI
> --
> 1.9.1
>



--
With Best Regards,
Andy Shevchenko


-- 
With Best Regards,
Andy Shevchenko
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/