linux-kernel - Re: [PATCH 6/7] RISC-V: arch/riscv/kernel

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAOesGMjM097wqwf1jrq6TetF_8G_H9Dd0sjs+HeBUuyVxWP34w@mail.gmail.com>
Date:   Mon, 22 May 2017 19:11:35 -0700
From:   Olof Johansson <olof@...om.net>
To:     Palmer Dabbelt <palmer@...belt.com>
Cc:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Arnd Bergmann <arnd@...db.de>, albert@...ive.com
Subject: Re: [PATCH 6/7] RISC-V: arch/riscv/kernel

On Mon, May 22, 2017 at 5:41 PM, Palmer Dabbelt <palmer@...belt.com> wrote:
> ---
>  arch/riscv/kernel/Makefile         |  19 ++
>  arch/riscv/kernel/asm-offsets.c    | 113 ++++++++++
>  arch/riscv/kernel/cacheinfo.c      |  82 ++++++++
>  arch/riscv/kernel/cpu.c            |  81 ++++++++
>  arch/riscv/kernel/entry.S          | 414 +++++++++++++++++++++++++++++++++++++
>  arch/riscv/kernel/head.S           | 139 +++++++++++++
>  arch/riscv/kernel/irq.c            | 205 ++++++++++++++++++
>  arch/riscv/kernel/module.c         | 185 +++++++++++++++++
>  arch/riscv/kernel/pci.c            |  36 ++++
>  arch/riscv/kernel/plic.c           | 208 +++++++++++++++++++
>  arch/riscv/kernel/process.c        | 130 ++++++++++++
>  arch/riscv/kernel/ptrace.c         | 148 +++++++++++++
>  arch/riscv/kernel/reset.c          |  33 +++
>  arch/riscv/kernel/riscv_ksyms.c    |  16 ++
>  arch/riscv/kernel/sbi-con.c        | 214 +++++++++++++++++++
>  arch/riscv/kernel/setup.c          | 234 +++++++++++++++++++++
>  arch/riscv/kernel/signal.c         | 258 +++++++++++++++++++++++
>  arch/riscv/kernel/smp.c            | 107 ++++++++++
>  arch/riscv/kernel/smpboot.c        | 105 ++++++++++
>  arch/riscv/kernel/stacktrace.c     | 183 ++++++++++++++++
>  arch/riscv/kernel/sys_riscv.c      |  85 ++++++++
>  arch/riscv/kernel/syscall_table.c  |  26 +++
>  arch/riscv/kernel/time.c           | 116 +++++++++++
>  arch/riscv/kernel/traps.c          | 167 +++++++++++++++
>  arch/riscv/kernel/vdso.c           | 125 +++++++++++
>  arch/riscv/kernel/vdso/.gitignore  |   1 +
>  arch/riscv/kernel/vdso/Makefile    |  61 ++++++
>  arch/riscv/kernel/vdso/sigreturn.S |  25 +++
>  arch/riscv/kernel/vdso/vdso.S      |  28 +++
>  arch/riscv/kernel/vdso/vdso.lds.S  |  77 +++++++
>  arch/riscv/kernel/vmlinux.lds.S    |  93 +++++++++
>  31 files changed, 3714 insertions(+)
>  create mode 100644 arch/riscv/kernel/Makefile
>  create mode 100644 arch/riscv/kernel/asm-offsets.c
>  create mode 100644 arch/riscv/kernel/cacheinfo.c
>  create mode 100644 arch/riscv/kernel/cpu.c
>  create mode 100644 arch/riscv/kernel/entry.S
>  create mode 100644 arch/riscv/kernel/head.S
>  create mode 100644 arch/riscv/kernel/irq.c
>  create mode 100644 arch/riscv/kernel/module.c
>  create mode 100644 arch/riscv/kernel/pci.c
>  create mode 100644 arch/riscv/kernel/plic.c
>  create mode 100644 arch/riscv/kernel/process.c
>  create mode 100644 arch/riscv/kernel/ptrace.c
>  create mode 100644 arch/riscv/kernel/reset.c
>  create mode 100644 arch/riscv/kernel/riscv_ksyms.c
>  create mode 100644 arch/riscv/kernel/sbi-con.c
>  create mode 100644 arch/riscv/kernel/setup.c
>  create mode 100644 arch/riscv/kernel/signal.c
>  create mode 100644 arch/riscv/kernel/smp.c
>  create mode 100644 arch/riscv/kernel/smpboot.c
>  create mode 100644 arch/riscv/kernel/stacktrace.c
>  create mode 100644 arch/riscv/kernel/sys_riscv.c
>  create mode 100644 arch/riscv/kernel/syscall_table.c
>  create mode 100644 arch/riscv/kernel/time.c
>  create mode 100644 arch/riscv/kernel/traps.c
>  create mode 100644 arch/riscv/kernel/vdso.c
>  create mode 100644 arch/riscv/kernel/vdso/.gitignore
>  create mode 100644 arch/riscv/kernel/vdso/Makefile
>  create mode 100644 arch/riscv/kernel/vdso/sigreturn.S
>  create mode 100644 arch/riscv/kernel/vdso/vdso.S
>  create mode 100644 arch/riscv/kernel/vdso/vdso.lds.S
>  create mode 100644 arch/riscv/kernel/vmlinux.lds.S

What's missing from this patchset (ideally) is a good writeup under
DOcumentation/ on expectations of system state (and/or configuration)
upon entry of the kernel. For comparison, see the arm64 documentation
where they were quite specific in this.


This patch is also pushing size limits, and is getting unwieldy to
comment on. I'll point out a few things below with plenty of snipped
out lines.

>
> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> new file mode 100644
> index 000000000000..94ac2931c56a
> --- /dev/null
> +++ b/arch/riscv/kernel/Makefile
> @@ -0,0 +1,19 @@
> +#
> +# Makefile for the RISC-V Linux kernel
> +#
> +
> +extra-y := head.o vmlinux.lds
> +
> +obj-y  := cpu.o entry.o irq.o process.o ptrace.o reset.o setup.o \
> +          signal.o syscall_table.o sys_riscv.o time.o traps.o \
> +          riscv_ksyms.o stacktrace.o vdso.o cacheinfo.o vdso/
> +
> +CFLAGS_setup.o := -mcmodel=medany
> +
> +obj-$(CONFIG_SMP)              += smpboot.o smp.o
> +obj-$(CONFIG_SBI_CONSOLE)      += sbi-con.o
> +obj-$(CONFIG_PCI)              += pci.o
> +obj-$(CONFIG_MODULES)          += module.o
> +obj-$(CONFIG_PLIC)             += plic.o
> +
> +clean:
> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> new file mode 100644
> index 000000000000..ac2e0cfaf8a3
> --- /dev/null
> +++ b/arch/riscv/kernel/asm-offsets.c
> @@ -0,0 +1,113 @@
> +/*
> + * Copyright (C) 2012 Regents of the University of California
> + *
> + *   This program is free software; you can redistribute it and/or
> + *   modify it under the terms of the GNU General Public License
> + *   as published by the Free Software Foundation, version 2.
> + *
> + *   This program is distributed in the hope that it will be useful, but
> + *   WITHOUT ANY WARRANTY; without even the implied warranty of
> + *   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
> + *   NON INFRINGEMENT.  See the GNU General Public License for
> + *   more details.

Hmm, I haven't seen these terms used often, but they seem to exist
around the tree in a few places. arch/tile is littered with them.

I am not a lawyer, but I can't seem any reference to "good title" in
the GPLv2 text.

Rather than having to go through the process of figuring out if this
license header is acceptable or not, you might find it easier to just
go with something more established.

> diff --git a/arch/riscv/kernel/cacheinfo.c b/arch/riscv/kernel/cacheinfo.c
> new file mode 100644
> index 000000000000..a22ea8abbf3c
> --- /dev/null
> +++ b/arch/riscv/kernel/cacheinfo.c
> @@ -0,0 +1,82 @@
> +/*
> + * Copyright (C) 2017 SiFive
> + *
> + *   This program is free software; you can redistribute it and/or
> + *   modify it under the terms of the GNU General Public License
> + *   as published by the Free Software Foundation, version 2.
> + *
> + *   This program is distributed in the hope that it will be useful, but
> + *   WITHOUT ANY WARRANTY; without even the implied warranty of
> + *   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
> + *   NON INFRINGEMENT.  See the GNU General Public License for
> + *   more details.
> + */
> +
> +#include <linux/cacheinfo.h>
> +#include <linux/cpu.h>
> +#include <linux/of.h>
> +#include <linux/of_device.h>
> +
> +static void ci_leaf_init(struct cacheinfo *this_leaf,
> +                         struct device_node *node,
> +                         enum cache_type type, unsigned int level)
> +{
> +        this_leaf->of_node = node;
> +        this_leaf->level = level;
> +        this_leaf->type = type;
> +        this_leaf->physical_line_partition = 1; // not a sector cache
> +        this_leaf->attributes = CACHE_WRITE_BACK | CACHE_READ_ALLOCATE | CACHE_WRITE_ALLOCATE; // TODO: add to DTS
> +}
> +
> +static int __init_cache_level(unsigned int cpu)
> +{
> +       struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
> +       struct device_node *np = of_cpu_device_node_get(cpu);
> +       int levels = 0, leaves = 0, level;
> +
> +       if (of_property_read_bool(np, "cache-size")) ++leaves;
> +       if (of_property_read_bool(np, "i-cache-size")) ++leaves;
> +       if (of_property_read_bool(np, "d-cache-size")) ++leaves;
> +       if (leaves > 0) levels = 1;
> +
> +       while ((np = of_find_next_cache_node(np))) {
> +               if (!of_device_is_compatible(np, "cache")) break;
> +               if (of_property_read_u32(np, "cache-level", &level)) break;
> +               if (level <= levels) break;
> +               if (of_property_read_bool(np, "cache-size")) ++leaves;
> +               if (of_property_read_bool(np, "i-cache-size")) ++leaves;
> +               if (of_property_read_bool(np, "d-cache-size")) ++leaves;
> +               levels = level;
> +       }
> +
> +       this_cpu_ci->num_levels = levels;
> +       this_cpu_ci->num_leaves = leaves;
> +       return 0;
> +}
> +
> +static int __populate_cache_leaves(unsigned int cpu)
> +{
> +       struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
> +       struct cacheinfo *this_leaf = this_cpu_ci->info_list;
> +       struct device_node *np = of_cpu_device_node_get(cpu);
> +       int levels = 1, level = 1;
> +
> +       if (of_property_read_bool(np, "cache-size"))   ci_leaf_init(this_leaf++, np, CACHE_TYPE_UNIFIED, level);
> +       if (of_property_read_bool(np, "i-cache-size")) ci_leaf_init(this_leaf++, np, CACHE_TYPE_INST, level);
> +       if (of_property_read_bool(np, "d-cache-size")) ci_leaf_init(this_leaf++, np, CACHE_TYPE_DATA, level);

Please run checkpatch, kernel coding style doesn't use one-line ifs
(here nor elsewhere).

> +
> +       while ((np = of_find_next_cache_node(np))) {
> +               if (!of_device_is_compatible(np, "cache")) break;
> +               if (of_property_read_u32(np, "cache-level", &level)) break;
> +               if (level <= levels) break;
> +               if (of_property_read_bool(np, "cache-size"))   ci_leaf_init(this_leaf++, np, CACHE_TYPE_UNIFIED, level);
> +               if (of_property_read_bool(np, "i-cache-size")) ci_leaf_init(this_leaf++, np, CACHE_TYPE_INST, level);
> +               if (of_property_read_bool(np, "d-cache-size")) ci_leaf_init(this_leaf++, np, CACHE_TYPE_DATA, level);
> +               levels = level;
> +       }
> +
> +       return 0;
> +}
> +
> +DEFINE_SMP_CALL_CACHE_FUNCTION(init_cache_level)
> +DEFINE_SMP_CALL_CACHE_FUNCTION(populate_cache_leaves)
> diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
> new file mode 100644
> index 000000000000..9cbf53eb58be
> --- /dev/null
> +++ b/arch/riscv/kernel/cpu.c
> @@ -0,0 +1,81 @@
> +/*
> + * Copyright (C) 2012 Regents of the University of California
> + *
> + *   This program is free software; you can redistribute it and/or
> + *   modify it under the terms of the GNU General Public License
> + *   as published by the Free Software Foundation, version 2.
> + *
> + *   This program is distributed in the hope that it will be useful, but
> + *   WITHOUT ANY WARRANTY; without even the implied warranty of
> + *   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
> + *   NON INFRINGEMENT.  See the GNU General Public License for
> + *   more details.
> + */
> +
> +#include <linux/init.h>
> +#include <linux/seq_file.h>
> +#include <linux/of.h>
> +
> +/* Return -1 if not a valid hart */
> +int riscv_of_processor_hart(struct device_node *node)
> +{
> +       const char *isa, *status;
> +       u32 hart;
> +
> +       if (!of_device_is_compatible(node, "riscv")) return -1;
> +       if (of_property_read_u32(node, "reg", &hart) || hart >= NR_CPUS) return -1;
> +       if (of_property_read_string(node, "status", &status) || strcmp(status, "okay")) return -1;
> +       if (of_property_read_string(node, "riscv,isa", &isa) || isa[0] != 'r' || isa[1] != 'v') return -1;
> +
> +       return hart;
> +}

We usually prefer to see real -E<foo> returns instead of -1 in the kernel.

[...]

> diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
> new file mode 100644
> index 000000000000..52d574206d76
> --- /dev/null
> +++ b/arch/riscv/kernel/head.S
> @@ -0,0 +1,139 @@
> +/*
> + * Copyright (C) 2012 Regents of the University of California
> + *
> + *   This program is free software; you can redistribute it and/or
> + *   modify it under the terms of the GNU General Public License
> + *   as published by the Free Software Foundation, version 2.
> + *
> + *   This program is distributed in the hope that it will be useful, but
> + *   WITHOUT ANY WARRANTY; without even the implied warranty of
> + *   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
> + *   NON INFRINGEMENT.  See the GNU General Public License for
> + *   more details.
> + */
> +
> +#include <asm/thread_info.h>
> +#include <asm/asm-offsets.h>
> +#include <asm/asm.h>
> +#include <linux/init.h>
> +#include <linux/linkage.h>
> +#include <asm/thread_info.h>
> +#include <asm/page.h>
> +#include <asm/csr.h>
> +
> +__INIT
> +ENTRY(_start)
> +       /* Mask all interrupts */
> +       csrw sie, zero
> +
> +       /* Disable FPU to detect illegal usage of
> +          floating point in kernel space */
> +       li t0, SR_FS
> +       csrc sstatus, t0
> +
> +#ifndef CONFIG_RV_PUM
> +       /* Allow access to user memory */
> +       li t0, SR_SUM
> +       csrs sstatus, t0
> +#endif
> +
> +       /* Pick one hart to run the main boot sequence */
> +       la a3, hart_lottery
> +       li a2, 1
> +       amoadd.w a3, a2, (a3)
> +       bnez a3, .Lsecondary_start
> +
> +       /* Save hart ID and DTB physical address */
> +       mv s0, a0
> +       mv s1, a1
> +
> +       /* Initialize page tables and relocate to virtual addresses */
> +       la sp, init_thread_union + THREAD_SIZE
> +       call setup_vm
> +       call relocate
> +
> +       /* Restore C environment */
> +       la tp, init_thread_union
> +       li sp, THREAD_SIZE
> +       add sp, sp, tp
> +
> +       /* Start the kernel */
> +       mv a0, s0
> +       mv a1, s1
> +       call sbi_save
> +       tail start_kernel
> +
> +relocate:
> +       /* Relocate return address */
> +       li a1, PAGE_OFFSET
> +       la a0, _start
> +       sub a1, a1, a0
> +       add ra, ra, a1
> +
> +       /* Point stvec to virtual address of intruction after sptbr write */
> +       la a0, 1f
> +       add a0, a0, a1
> +       csrw stvec, a0
> +
> +       /* Compute sptbr for kernel page tables, but don't load it yet */
> +       la a2, swapper_pg_dir
> +       srl a2, a2, PAGE_SHIFT
> +       li a1, SPTBR_MODE
> +       or a2, a2, a1
> +
> +       /* Load trampoline page directory, which will cause us to trap to
> +          stvec if VA != PA, or simply fall through if VA == PA */
> +       la a0, trampoline_pg_dir
> +       srl a0, a0, PAGE_SHIFT
> +       or a0, a0, a1
> +       sfence.vma
> +       csrw sptbr, a0
> +1:
> +       /* Set trap vector to spin forever to help debug */
> +       la a0, .Lsecondary_park
> +       csrw stvec, a0
> +
> +       /* Load the global pointer */
> +       la gp, __global_pointer$
> +
> +       /* Switch to kernel page tables */
> +       csrw sptbr, a2
> +
> +       ret
> +
> +.Lsecondary_start:
> +#ifdef CONFIG_SMP
> +       li a1, CONFIG_NR_CPUS
> +       bgeu a0, a1, .Lsecondary_park
> +
> +       la a1, __cpu_up_stack_pointer
> +       slli a0, a0, LGREG
> +       add a0, a0, a1
> +
> +.Lwait_for_cpu_up:
> +       REG_L sp, (a0)
> +       beqz sp, .Lwait_for_cpu_up
> +       fence
> +
> +       /* Enable virtual memory and relocate to virtual address */
> +       call relocate
> +
> +       /* Initialize task_struct pointer */
> +       li tp, -THREAD_SIZE
> +       add tp, tp, sp
> +
> +       tail smp_callin
> +#endif
> +
> +.Lsecondary_park:
> +       /* We lack SMP support or have too many harts, so park this hart */
> +       wfi
> +       j .Lsecondary_park
> +END(_start)
> +
> +__PAGE_ALIGNED_BSS
> +       /* Empty zero page */
> +       .balign PAGE_SIZE
> +ENTRY(empty_zero_page)
> +       .fill (empty_zero_page + PAGE_SIZE) - ., 1, 0x00
> +END(empty_zero_page)
> diff --git a/arch/riscv/kernel/irq.c b/arch/riscv/kernel/irq.c
> new file mode 100644
> index 000000000000..b772bb9539cf
> --- /dev/null
> +++ b/arch/riscv/kernel/irq.c
> @@ -0,0 +1,205 @@
> +/*
> + * Copyright (C) 2012 Regents of the University of California
> + * Copyright (C) 2017 SiFive
> + *
> + *   This program is free software; you can redistribute it and/or
> + *   modify it under the terms of the GNU General Public License
> + *   as published by the Free Software Foundation, version 2.
> + *
> + *   This program is distributed in the hope that it will be useful, but
> + *   WITHOUT ANY WARRANTY; without even the implied warranty of
> + *   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
> + *   NON INFRINGEMENT.  See the GNU General Public License for
> + *   more details.
> + */
> +
> +#include <linux/irq.h>
> +#include <linux/irqchip.h>
> +#include <linux/irqdomain.h>
> +#include <linux/interrupt.h>
> +#include <linux/ftrace.h>
> +#include <linux/of.h>
> +#include <linux/seq_file.h>
> +
> +#include <asm/ptrace.h>
> +#include <asm/sbi.h>
> +#include <asm/smp.h>
> +
> +struct riscv_irq_data {
> +       struct irq_chip         chip;
> +       struct irq_domain       *domain;
> +       int                     hart;
> +       char                    name[20];
> +};
> +DEFINE_PER_CPU(struct riscv_irq_data, riscv_irq_data);
> +DEFINE_PER_CPU(atomic_long_t, riscv_early_sie);
> +
> +static void riscv_software_interrupt(void)
> +{
> +#ifdef CONFIG_SMP
> +       irqreturn_t ret;
> +
> +       ret = handle_ipi();
> +       if (ret != IRQ_NONE)
> +               return;
> +#endif
> +
> +       BUG();
> +}
> +
> +asmlinkage void __irq_entry do_IRQ(unsigned int cause, struct pt_regs *regs)
> +{
> +       struct pt_regs *old_regs = set_irq_regs(regs);
> +       irq_enter();
> +
> +       /* There are three classes of interrupt: timer, software, and
> +          external devices.  We dispatch between them here.  External
> +          device interrupts use the generic IRQ mechanisms. */
> +       switch (cause) {
> +               case INTERRUPT_CAUSE_TIMER:
> +                       riscv_timer_interrupt();
> +                       break;
> +               case INTERRUPT_CAUSE_SOFTWARE:
> +                       riscv_software_interrupt();
> +                       break;
> +               default: {
> +                       struct irq_domain *domain = per_cpu(riscv_irq_data, smp_processor_id()).domain;

Move this up to top of function and remove the { } wrap, please.

> +                       generic_handle_irq(irq_find_mapping(domain, cause));
> +                       break;
> +               }
> +       }
> +
> +       irq_exit();
> +       set_irq_regs(old_regs);
> +}
> +
> +static int riscv_irqdomain_map(struct irq_domain *d, unsigned int irq, irq_hw_number_t hwirq)
> +{
> +       struct riscv_irq_data *data = d->host_data;
> +
> +        irq_set_chip_and_handler(irq, &data->chip, handle_simple_irq);
> +        irq_set_chip_data(irq, data);
> +        irq_set_noprobe(irq);
> +
> +        return 0;
> +}
> +
> +static const struct irq_domain_ops riscv_irqdomain_ops = {
> +       .map    = riscv_irqdomain_map,
> +       .xlate  = irq_domain_xlate_onecell,
> +};
> +
> +static void riscv_irq_mask(struct irq_data *d)
> +{
> +       struct riscv_irq_data *data = irq_data_get_irq_chip_data(d);
> +       BUG_ON(smp_processor_id() != data->hart);
> +       csr_clear(sie, 1 << (long)d->hwirq);
> +}
> +
> +static void riscv_irq_unmask(struct irq_data *d)
> +{
> +       struct riscv_irq_data *data = irq_data_get_irq_chip_data(d);
> +       BUG_ON(smp_processor_id() != data->hart);
> +       csr_set(sie, 1 << (long)d->hwirq);
> +}
> +
> +static void riscv_irq_enable_helper(void *d)
> +{
> +       riscv_irq_unmask(d);
> +}
> +
> +static void riscv_irq_enable(struct irq_data *d)
> +{
> +       struct riscv_irq_data *data = irq_data_get_irq_chip_data(d);
> +       atomic_long_or((1 << (long)d->hwirq), &per_cpu(riscv_early_sie, data->hart));

This is a bit dense to get into without a few words of how it's
expected to work.

> +       if (data->hart == smp_processor_id()) {
> +               riscv_irq_unmask(d);
> +       } else if (cpu_online(data->hart)) {
> +               smp_call_function_single(data->hart, riscv_irq_enable_helper, d, true);
> +       }
> +}
> +
> +static void riscv_irq_disable_helper(void *d)
> +{
> +       riscv_irq_mask(d);
> +}
> +
> +static void riscv_irq_disable(struct irq_data *d)
> +{
> +       struct riscv_irq_data *data = irq_data_get_irq_chip_data(d);
> +       atomic_long_and(~(1 << (long)d->hwirq), &per_cpu(riscv_early_sie, data->hart));
> +       if (data->hart == smp_processor_id()) {
> +               riscv_irq_mask(d);
> +       } else if (cpu_online(data->hart)) {
> +               smp_call_function_single(data->hart, riscv_irq_disable_helper, d, true);
> +       }
> +}
> +
> +static void riscv_irq_mask_noop(struct irq_data *d) { }
> +
> +static void riscv_irq_unmask_noop(struct irq_data *d) { }
> +
> +static void riscv_irq_enable_noop(struct irq_data *d)
> +{
> +       struct device_node *data = irq_data_get_irq_chip_data(d);
> +       u32 hart;
> +
> +       if (!of_property_read_u32(data, "reg", &hart)) {
> +               printk("WARNING: enabled interrupt %d for missing hart %d (this interrupt has no handler)\n", (int)d->hwirq, hart);
> +       }
> +}
> +
> +static struct irq_chip riscv_noop_chip = {
> +       .name = "riscv,cpu-intc,noop",
> +       .irq_mask = riscv_irq_mask_noop,
> +       .irq_unmask = riscv_irq_unmask_noop,
> +       .irq_enable = riscv_irq_enable_noop,
> +};
> +
> +static int riscv_irqdomain_map_noop(struct irq_domain *d, unsigned int irq, irq_hw_number_t hwirq)
> +{
> +       struct device_node *data = d->host_data;
> +       irq_set_chip_and_handler(irq, &riscv_noop_chip, handle_simple_irq);
> +       irq_set_chip_data(irq, data);
> +       return 0;
> +}
> +
> +static const struct irq_domain_ops riscv_irqdomain_ops_noop = {
> +       .map    = riscv_irqdomain_map_noop,
> +       .xlate  = irq_domain_xlate_onecell,
> +};
> +
> +static int riscv_intc_init(struct device_node *node, struct device_node *parent)
> +{
> +       int hart;
> +
> +       if (parent) return 0; // should have no interrupt parent
> +
> +       if ((hart = riscv_of_processor_hart(node->parent)) >= 0) {

Common pattern in kernel is to detect error instead:
    hart = riscv_of_processor_hart(node->parent);
    if (hart < 0) {
        <from your else side here>
        return 0;
    }
    <body if your if statement here>

    return 0;

> +               struct riscv_irq_data *data = &per_cpu(riscv_irq_data, hart);
> +               snprintf(data->name, sizeof(data->name), "riscv,cpu_intc,%d", hart);
> +               data->hart = hart;
> +               data->chip.name = data->name;
> +               data->chip.irq_mask = riscv_irq_mask;
> +               data->chip.irq_unmask = riscv_irq_unmask;
> +               data->chip.irq_enable = riscv_irq_enable;
> +               data->chip.irq_disable = riscv_irq_disable;
> +               data->domain = irq_domain_add_linear(node, 8*sizeof(uintptr_t), &riscv_irqdomain_ops, data);
> +               WARN_ON(!data->domain);
> +               printk("%s: %d local interrupts mapped\n", data->name, 8*(int)sizeof(uintptr_t));
> +       } else {
> +               /* If a hart is disabled, create a no-op irq domain.
> +                * Devices may still have interrupts connected to those harts.
> +                * This is not wrong... unless they actually load a driver that needs it!
> +                */
> +               irq_domain_add_linear(node, 8*sizeof(uintptr_t), &riscv_irqdomain_ops_noop, node->parent);
> +       }
> +       return 0;
> +}
> +
> +IRQCHIP_DECLARE(riscv, "riscv,cpu-intc", riscv_intc_init);
> +
> +void __init init_IRQ(void)
> +{
> +       irqchip_init();
> +}

> diff --git a/arch/riscv/kernel/pci.c b/arch/riscv/kernel/pci.c
> new file mode 100644
> index 000000000000..4191a5ffdd67
> --- /dev/null
> +++ b/arch/riscv/kernel/pci.c
> @@ -0,0 +1,36 @@
> +/*
> + * Code borrowed from arch/arm64/kernel/pci.c

So, you should add recursive reference from there (i.e. powerpc).

But in the end, there's essentially no code in this file. :)


> + *
> + * Copyright (C) 2003 Anton Blanchard <anton@...ibm.com>, IBM
> + * Copyright (C) 2014 ARM Ltd.
> + * Copyright (C) 2017 SiFive
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * version 2 as published by the Free Software Foundation.
> + *
> + */
> +
> +#include <linux/init.h>
> +#include <linux/io.h>
> +#include <linux/kernel.h>
> +#include <linux/mm.h>
> +#include <linux/slab.h>
> +#include <linux/pci.h>
> +
> +/*
> + * Called after each bus is probed, but before its children are examined
> + */
> +void pcibios_fixup_bus(struct pci_bus *bus)
> +{
> +       /* nothing to do, expected to be removed in the future */
> +}
> +
> +/*
> + * We don't have to worry about legacy ISA devices, so nothing to do here
> + */
> +resource_size_t pcibios_align_resource(void *data, const struct resource *res,
> +                               resource_size_t size, resource_size_t align)
> +{
> +       return res->start;
> +}
> diff --git a/arch/riscv/kernel/plic.c b/arch/riscv/kernel/plic.c
> new file mode 100644
> index 000000000000..5b3d4241f4e2
> --- /dev/null
> +++ b/arch/riscv/kernel/plic.c
> @@ -0,0 +1,208 @@
> +/*
> + * Copyright (C) 2017 SiFive
> + *
> + *   This program is free software; you can redistribute it and/or
> + *   modify it under the terms of the GNU General Public License
> + *   as published by the Free Software Foundation, version 2.
> + *
> + *   This program is distributed in the hope that it will be useful, but
> + *   WITHOUT ANY WARRANTY; without even the implied warranty of
> + *   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
> + *   NON INFRINGEMENT.  See the GNU General Public License for
> + *   more details.
> + */
> +
> +#include <linux/interrupt.h>
> +#include <linux/io.h>
> +#include <linux/irq.h>
> +#include <linux/irqchip.h>
> +#include <linux/irqchip/chained_irq.h>
> +#include <linux/irqdomain.h>
> +#include <linux/module.h>
> +#include <linux/of.h>
> +#include <linux/of_address.h>
> +#include <linux/of_irq.h>
> +#include <linux/platform_device.h>
> +
> +#define MAX_DEVICES    1024 // 0 is reserved

Seems like an odd comment to have here (and should probably not go at
the end of the line)

> +#define MAX_CONTEXTS   15872
> +
> +#define PRIORITY_BASE  0
> +#define ENABLE_BASE    0x2000
> +#define ENABLE_SIZE    0x80
> +#define HART_BASE      0x200000
> +#define HART_SIZE      0x1000
> +
> +#define PLIC_HART_CONTEXT(data, i)     (struct plic_hart_context *)((char*)data->reg + HART_BASE + HART_SIZE*i)
> +#define PLIC_ENABLE_CONTEXT(data, i)   (struct plic_enable_context *)((char*)data->reg + ENABLE_BASE + ENABLE_SIZE*i)
> +#define PLIC_PRIORITY(data)            (struct plic_priority *)((char *)data->reg + PRIORITY_BASE)

Since you have typecasting and stuff here, small static inlines with
appropriate return types seems slightly tidier.

> +
> +struct plic_hart_context {
> +       volatile u32 threshold;
> +       volatile u32 claim;
> +};
> +
> +struct plic_enable_context {
> +       atomic_t mask[32]; // 32-bit * 32-entry
> +};
> +
> +struct plic_priority {
> +       volatile u32 prio[MAX_DEVICES];
> +};
> +
> +struct plic_data {
> +       struct irq_chip         chip;
> +       struct irq_domain       *domain;
> +       u32                     ndev;
> +       void __iomem            *reg;
> +       int                     handlers;
> +       struct plic_handler     *handler;
> +       char                    name[30];
> +};
> +
> +struct plic_handler {
> +       struct plic_hart_context        *context;
> +       struct plic_data                *data;
> +};
> +
> +static void plic_disable(struct plic_data *data, int i, int hwirq)
> +{
> +       struct plic_enable_context *enable = PLIC_ENABLE_CONTEXT(data, i);
> +       atomic_and(~(1 << (hwirq % 32)), &enable->mask[hwirq / 32]);
> +}
> +
> +static void plic_enable(struct plic_data *data, int i, int hwirq)
> +{
> +       struct plic_enable_context *enable = PLIC_ENABLE_CONTEXT(data, i);
> +       atomic_or((1 << (hwirq % 32)), &enable->mask[hwirq / 32]);
> +}
> +
> +// There is no need to mask/unmask PLIC interrupts
> +// They are "masked" by reading claim and "unmasked" when writing it back.
> +static void plic_irq_mask(struct irq_data *d) { }
> +static void plic_irq_unmask(struct irq_data *d) { }
> +
> +static void plic_irq_enable(struct irq_data *d)
> +{
> +       struct plic_data *data = irq_data_get_irq_chip_data(d);
> +       struct plic_priority *priority = PLIC_PRIORITY(data);
> +       int i;
> +       iowrite32(1, &priority->prio[d->hwirq]);
> +       for (i = 0; i < data->handlers; ++i)
> +               if (data->handler[i].context)
> +                       plic_enable(data, i, d->hwirq);
> +}
> +
> +static void plic_irq_disable(struct irq_data *d)
> +{
> +       struct plic_data *data = irq_data_get_irq_chip_data(d);
> +       struct plic_priority *priority = PLIC_PRIORITY(data);
> +       int i;
> +       iowrite32(0, &priority->prio[d->hwirq]);
> +       for (i = 0; i < data->handlers; ++i)
> +               if (data->handler[i].context)
> +                       plic_disable(data, i, d->hwirq);
> +}
> +
> +static int plic_irqdomain_map(struct irq_domain *d, unsigned int irq, irq_hw_number_t hwirq)
> +{
> +       struct plic_data *data = d->host_data;
> +
> +        irq_set_chip_and_handler(irq, &data->chip, handle_simple_irq);
> +        irq_set_chip_data(irq, data);
> +        irq_set_noprobe(irq);
> +
> +        return 0;
> +}
> +
> +static const struct irq_domain_ops plic_irqdomain_ops = {
> +       .map    = plic_irqdomain_map,
> +       .xlate  = irq_domain_xlate_onecell,
> +};
> +
> +static void plic_chained_handle_irq(struct irq_desc *desc)
> +{
> +        struct plic_handler *handler = irq_desc_get_handler_data(desc);
> +       struct irq_chip *chip = irq_desc_get_chip(desc);

Whitespace.

> +       struct irq_domain *domain = handler->data->domain;
> +       u32 what;
> +
> +       chained_irq_enter(chip, desc);
> +
> +       while ((what = ioread32(&handler->context->claim))) {
> +               int irq = irq_find_mapping(domain, what);
> +               if (irq > 0) {
> +                       generic_handle_irq(irq);
> +               } else {
> +                       handle_bad_irq(desc);
> +               }
> +               iowrite32(what, &handler->context->claim);
> +       }
> +
> +       chained_irq_exit(chip, desc);
> +}
> +
> +// TODO: add a /sys interface to set priority + per-hart enables for steering
> +
> +static int plic_init(struct device_node *node, struct device_node *parent)
> +{
> +       struct plic_data *data;
> +       struct resource resource;
> +       int i, ok = 0;
> +
> +       data = kzalloc(sizeof(*data), GFP_KERNEL);
> +       if (WARN_ON(!data)) return -ENOMEM;
> +
> +       data->reg = of_iomap(node, 0);
> +       if (WARN_ON(!data->reg)) return -EIO;
> +
> +       of_property_read_u32(node, "riscv,ndev", &data->ndev);
> +       if (WARN_ON(!data->ndev)) return -EINVAL;
> +
> +       data->handlers = of_irq_count(node);
> +       if (WARN_ON(!data->handlers)) return -EINVAL;
> +
> +       data->handler = kzalloc(sizeof(*data->handler)*data->handlers, GFP_KERNEL);
> +       if (WARN_ON(!data->handler)) return -ENOMEM;
> +
> +       data->domain = irq_domain_add_linear(node, data->ndev+1, &plic_irqdomain_ops, data);
> +       if (WARN_ON(!data->domain)) return -ENOMEM;
> +
> +       of_address_to_resource(node, 0, &resource);
> +       snprintf(data->name, sizeof(data->name), "riscv,plic0,%llx", resource.start);
> +       data->chip.name = data->name;
> +       data->chip.irq_mask = plic_irq_mask;
> +       data->chip.irq_unmask = plic_irq_unmask;
> +       data->chip.irq_enable = plic_irq_enable;
> +       data->chip.irq_disable = plic_irq_disable;
> +
> +       for (i = 0; i < data->handlers; ++i) {
> +               struct plic_handler *handler = &data->handler[i];
> +               struct of_phandle_args parent;
> +               int parent_irq, hwirq;
> +
> +               if (of_irq_parse_one(node, i, &parent)) continue;
> +               if (parent.args[0] == -1) continue; // skip context holes
> +
> +               // skip any contexts that lead to inactive harts
> +               if (of_device_is_compatible(parent.np, "riscv,cpu-intc") &&
> +                   parent.np->parent &&
> +                   riscv_of_processor_hart(parent.np->parent) < 0) continue;
> +
> +               parent_irq = irq_create_of_mapping(&parent);
> +               if (!parent_irq) continue;
> +
> +               handler->context = PLIC_HART_CONTEXT(data, i);
> +               handler->data = data;
> +               iowrite32(0, &handler->context->threshold); // hwirq prio must be > this to trigger an interrupt
> +               for (hwirq = 1; hwirq <= data->ndev; ++hwirq) plic_disable(data, i, hwirq);
> +               irq_set_chained_handler_and_data(parent_irq, plic_chained_handle_irq, handler);
> +               ++ok;
> +       }
> +
> +       printk("%s: mapped %d interrupts to %d/%d handlers\n", data->name, data->ndev, ok, data->handlers);
> +       WARN_ON(!ok);
> +       return 0;
> +}
> +
> +IRQCHIP_DECLARE(plic0, "riscv,plic0", plic_init);


[wrapping up review of this patch at this point to keep size down]


-Olof