Date:   Tue, 11 Sep 2018 11:36:32 -0700
From:   Atish Patra <atish.patra@....com>
To:     Mark Rutland <mark.rutland@....com>
Cc:     "palmer@...ive.com" <palmer@...ive.com>,
        "linux-riscv@...ts.infradead.org" <linux-riscv@...ts.infradead.org>,
        "hch@...radead.org" <hch@...radead.org>,
        "anup@...infault.org" <anup@...infault.org>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Damien Le Moal <Damien.LeMoal@....com>,
        "marc.zyngier@....com" <marc.zyngier@....com>,
        "jeremy.linton@....com" <jeremy.linton@....com>,
        "gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
        "jason@...edaemon.net" <jason@...edaemon.net>,
        "catalin.marinas@....com" <catalin.marinas@....com>,
        "dmitriy@...-tech.org" <dmitriy@...-tech.org>,
        "ard.biesheuvel@...aro.org" <ard.biesheuvel@...aro.org>
Subject: Re: [PATCH v3 12/12] RISC-V: Support cpu hotplug.

On 9/6/18 11:25 AM, Atish Patra wrote:
> On 9/6/18 3:21 AM, Mark Rutland wrote:
>> Hi,
>>
>> On Thu, Sep 06, 2018 at 01:05:35AM -0700, Atish Patra wrote:
>>> This patch enables support for cpu hotplug in RISC-V.
>>>
>>> In the absence of generic cpu stop functions, WFI is used
>>> to put the cpu in a low power state during offline. An IPI
>>> is sent to bring it out of WFI during online operation.
>>
>> AFAICT, this doesn't actually return the CPU to firmware, and the WFI and
>> return code are in-kernel. From experience on arm and arm64 I would *very*
>> strongly advise against this kind of pseudo-hotplug, as it causes many more
>> long-term problems than it solves.
>>
> I completely agree with you. The idea here was to have a working
> cpu-hotplug solution until we have a concrete implementation of cpu
> stop/start via firmware. Once we have that, we just need to replace the
> wakeup and wait_for_software_interrupt calls. The current implementation
> via WFI was never meant to be a long-term solution.
> 
> 
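
To make the "just replace two calls" point concrete, here is a minimal
user-space sketch, not kernel code and not part of the patch. All names in
it (struct hotplug_ops, wfi_ops, fw_ops, select_hotplug_ops and the stub
functions) are hypothetical. It only illustrates routing the park and wake
steps through one table, so a firmware-backed implementation could be
swapped in later without touching the callers.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical backend table: how a CPU is parked and how it is woken. */
struct hotplug_ops {
	const char *name;
	int (*cpu_park)(unsigned int cpu);	/* would run on the CPU going offline */
	int (*cpu_wake)(unsigned int cpu);	/* would run on the requesting CPU */
};

/* Stubs for the interim approach: park in WFI, wake with an IPI. */
static int wfi_park(unsigned int cpu)
{
	printf("cpu%u: parked in WFI inside the kernel\n", cpu);
	return 0;
}

static int wfi_wake(unsigned int cpu)
{
	printf("cpu%u: wakeup IPI sent\n", cpu);
	return 0;
}

/* Stubs for a future firmware-backed stop/start (assumed, not specified). */
static int fw_park(unsigned int cpu)
{
	printf("cpu%u: handed back to firmware\n", cpu);
	return 0;
}

static int fw_wake(unsigned int cpu)
{
	printf("cpu%u: firmware asked to start the cpu\n", cpu);
	return 0;
}

const struct hotplug_ops wfi_ops = { "wfi", wfi_park, wfi_wake };
const struct hotplug_ops fw_ops = { "firmware", fw_park, fw_wake };

/* Prefer the firmware backend when it exists, else fall back to WFI. */
const struct hotplug_ops *select_hotplug_ops(int fw_has_cpu_stop)
{
	return fw_has_cpu_stop ? &fw_ops : &wfi_ops;
}

/* Callers only see the table, never the backend behind it. */
int hotplug_offline(const struct hotplug_ops *ops, unsigned int cpu)
{
	assert(ops && ops->cpu_park);
	return ops->cpu_park(cpu);
}

int hotplug_online(const struct hotplug_ops *ops, unsigned int cpu)
{
	assert(ops && ops->cpu_wake);
	return ops->cpu_wake(cpu);
}
```

With this shape, hotplug_offline(select_hotplug_ops(0), 2) takes the WFI
path today, and the same call with select_hotplug_ops(1) would take the
firmware path once such an interface exists.
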

Any thoughts on this? Is this a good enough argument to keep the current
version of the hotplug feature and make the necessary changes once we have
proper cpu start/stop via firmware (which may take significant time)?

Regards,
Atish
>> For instance, this causes significant pain with kexec, since the prior kernel's
>> text must be kept around forever for the offline CPUs. Likewise with hibernate
>> suspend/resume.
>>
>> Depending on how your architecture works, there can also be a number of
>> problems with cache/TLB maintenance for secondary CPUs which are unexpectedly
>> online.
>>
>> I would suggest that someone should look at writing a FW standard for CPU
>> hotplug, akin to PSCI for arm/arm64. At minimum, you will want:
>>
> 
> I have already started working on it in parallel. As of now, I am
> planning to strengthen the SBI spec by making it more flexible/extensible
> with some of the features you mention below. But we can come up with a
> new spec altogether if that makes more sense.
> 
>> - a mechanism to determine the version of FW and/or features supported by the FW.
>>
>> - a mechanism to hotplug-in a specific CPU to a runtime-specified address,
>>     ideally with some unique token (e.g. so you can pass a pointer to its stack
>>     or whatever).
>>
>> - a mechanism to hotplug-out the calling CPU.
>>
>> - a mechanism to determine when a specific CPU has been hotplugged out. This is
>>     necessary for kexec and other things to be robust.
>>
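
As a rough illustration of the four mechanisms listed above, here is a small
user-space model in C. It is a sketch under stated assumptions, not a
proposed spec: every name (fw_get_version, fw_cpu_on, fw_cpu_off,
fw_cpu_is_off, FW_MAX_CPUS, the error codes) is hypothetical, and the shape
is only loosely modeled on PSCI's PSCI_VERSION, CPU_ON, CPU_OFF and
AFFINITY_INFO calls. Real firmware would never return from the "off" call;
the model takes the caller's id as a parameter so that it can be exercised
from ordinary code.

```c
#include <stdint.h>

#define FW_MAX_CPUS 4	/* hypothetical limit for this model */

enum fw_status { FW_OK = 0, FW_ERR_INVALID = -1, FW_ERR_ALREADY_ON = -2 };
enum fw_cpu_state { FW_CPU_OFF, FW_CPU_ON };

struct fw_cpu {
	enum fw_cpu_state state;
	uintptr_t entry;	/* runtime-specified start address */
	uintptr_t token;	/* opaque value for the new cpu, e.g. a stack pointer */
};

/* Only the boot cpu (index 0) starts out on; the rest default to off. */
static struct fw_cpu fw_cpus[FW_MAX_CPUS] = { [0] = { .state = FW_CPU_ON } };

/* 1. Version discovery: major in the upper 16 bits, minor in the lower. */
uint32_t fw_get_version(void)
{
	return (1u << 16) | 0u;
}

/* 2. Hotplug-in a specific cpu at a given address with a unique token. */
int fw_cpu_on(unsigned int cpu, uintptr_t entry, uintptr_t token)
{
	if (cpu >= FW_MAX_CPUS || entry == 0)
		return FW_ERR_INVALID;
	if (fw_cpus[cpu].state == FW_CPU_ON)
		return FW_ERR_ALREADY_ON;
	fw_cpus[cpu] = (struct fw_cpu){ FW_CPU_ON, entry, token };
	return FW_OK;
}

/* 3. Hotplug-out the calling cpu (modeled by passing its own id). */
int fw_cpu_off(unsigned int self)
{
	if (self >= FW_MAX_CPUS || fw_cpus[self].state != FW_CPU_ON)
		return FW_ERR_INVALID;
	fw_cpus[self].state = FW_CPU_OFF;
	return FW_OK;
}

/* 4. Query: 1 if the cpu is off, 0 if it is on, negative for a bad id. */
int fw_cpu_is_off(unsigned int cpu)
{
	if (cpu >= FW_MAX_CPUS)
		return FW_ERR_INVALID;
	return fw_cpus[cpu].state == FW_CPU_OFF;
}
```

The query in step 4 is the piece that lets something like kexec wait until
every secondary cpu has really left the old kernel before its text is
overwritten.
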
> 
> Thanks for the pointers. I had not looked into items 2 and 4 earlier. I will go
> over the PSCI docs again.
> 
> Regards,
> Atish
> 
>> Thanks,
>> Mark.
>>
>>>
>>> Tested both on QEMU and HighFive Unleashed board with
>>> 4 cpus. Test result follows.
>>>
>>> $ echo 0 > /sys/devices/system/cpu/cpu2/online
>>> [   31.828562] CPU2: shutdown
>>> $ cat /proc/cpuinfo
>>> hart    : 0
>>> isa     : rv64imafdc
>>> mmu     : sv39
>>> uarch   : sifive,rocket0
>>>
>>> hart    : 1
>>> isa     : rv64imafdc
>>> mmu     : sv39
>>> uarch   : sifive,rocket0
>>>
>>> hart    : 3
>>> isa     : rv64imafdc
>>> mmu     : sv39
>>> uarch   : sifive,rocket0
>>>
>>> $ echo 0 > /sys/devices/system/cpu/cpu3/online
>>> [   52.968495] CPU3: shutdown
>>> $ cat /proc/cpuinfo
>>> hart    : 0
>>> isa     : rv64imafdc
>>> mmu     : sv39
>>> uarch   : sifive,rocket0
>>>
>>> hart    : 2
>>> isa     : rv64imafdc
>>> mmu     : sv39
>>> uarch   : sifive,rocket0
>>>
>>> $ echo 1 > /sys/devices/system/cpu/cpu3/online
>>> [   64.298250] CPU3: online
>>> $ cat /proc/cpuinfo
>>> hart    : 0
>>> isa     : rv64imafdc
>>> mmu     : sv39
>>> uarch   : sifive,rocket0
>>>
>>> hart    : 1
>>> isa     : rv64imafdc
>>> mmu     : sv39
>>> uarch   : sifive,rocket0
>>>
>>> hart    : 3
>>> isa     : rv64imafdc
>>> mmu     : sv39
>>> uarch   : sifive,rocket0
>>>
>>> Signed-off-by: Atish Patra <atish.patra@....com>
>>> ---
>>>    arch/riscv/Kconfig              | 12 ++++++-
>>>    arch/riscv/include/asm/irq.h    |  1 +
>>>    arch/riscv/include/asm/smp.h    | 28 ++++++++++++++++
>>>    arch/riscv/kernel/Makefile      |  1 +
>>>    arch/riscv/kernel/cpu-hotplug.c | 72 +++++++++++++++++++++++++++++++++++++++++
>>>    arch/riscv/kernel/head.S        | 13 ++++++++
>>>    arch/riscv/kernel/irq.c         | 24 ++++++++++++++
>>>    arch/riscv/kernel/setup.c       | 17 +++++++++-
>>>    arch/riscv/kernel/smp.c         |  8 +----
>>>    arch/riscv/kernel/smpboot.c     |  7 ++--
>>>    arch/riscv/kernel/traps.c       |  6 ++--
>>>    11 files changed, 175 insertions(+), 14 deletions(-)
>>>    create mode 100644 arch/riscv/kernel/cpu-hotplug.c
>>>
>>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
>>> index a3449802..ea2e617e 100644
>>> --- a/arch/riscv/Kconfig
>>> +++ b/arch/riscv/Kconfig
>>> @@ -21,7 +21,6 @@ config RISCV
>>>    	select COMMON_CLK
>>>    	select DMA_DIRECT_OPS
>>>    	select GENERIC_CLOCKEVENTS
>>> -	select GENERIC_CPU_DEVICES
>>>    	select GENERIC_IRQ_SHOW
>>>    	select GENERIC_PCI_IOMAP
>>>    	select GENERIC_STRNCPY_FROM_USER
>>> @@ -167,6 +166,17 @@ config SMP
>>>    
>>>    	  If you don't know what to do here, say N.
>>>    
>>> +config HOTPLUG_CPU
>>> +	bool "Support for hot-pluggable CPUs"
>>> +	depends on SMP
>>> +	select GENERIC_IRQ_MIGRATION
>>> +	help
>>> +
>>> +	  Say Y here to experiment with turning CPUs off and on.  CPUs
>>> +	  can be controlled through /sys/devices/system/cpu.
>>> +
>>> +	  Say N if you want to disable CPU hotplug.
>>> +
>>>    config NR_CPUS
>>>    	int "Maximum number of CPUs (2-32)"
>>>    	range 2 32
>>> diff --git a/arch/riscv/include/asm/irq.h b/arch/riscv/include/asm/irq.h
>>> index 996b6fbe..a873a72d 100644
>>> --- a/arch/riscv/include/asm/irq.h
>>> +++ b/arch/riscv/include/asm/irq.h
>>> @@ -19,6 +19,7 @@
>>>    
>>>    void riscv_timer_interrupt(void);
>>>    void riscv_software_interrupt(void);
>>> +void wait_for_software_interrupt(void);
>>>    
>>>    #include <asm-generic/irq.h>
>>>    
>>> diff --git a/arch/riscv/include/asm/smp.h b/arch/riscv/include/asm/smp.h
>>> index fce312ce..8145b865 100644
>>> --- a/arch/riscv/include/asm/smp.h
>>> +++ b/arch/riscv/include/asm/smp.h
>>> @@ -25,8 +25,29 @@
>>>    extern unsigned long __cpuid_to_hardid_map[NR_CPUS];
>>>    #define cpuid_to_hardid_map(cpu)    __cpuid_to_hardid_map[cpu]
>>>    
>>> +#if defined CONFIG_SMP && defined CONFIG_HOTPLUG_CPU
>>> +void arch_send_call_wakeup_ipi(int cpu);
>>> +bool can_hotplug_cpu(void);
>>> +#else
>>> +static inline bool can_hotplug_cpu(void)
>>> +{
>>> +	return false;
>>> +}
>>> +static inline void arch_send_call_wakeup_ipi(int cpu) { }
>>> +#endif
>>> +
>>>    #ifdef CONFIG_SMP
>>>    
>>> +enum ipi_message_type {
>>> +	IPI_RESCHEDULE,
>>> +	IPI_CALL_FUNC,
>>> +	IPI_CALL_WAKEUP,
>>> +	IPI_MAX
>>> +};
>>> +
>>> +void send_ipi_message(const struct cpumask *to_whom,
>>> +		      enum ipi_message_type operation);
>>> +
>>>    /* SMP initialization hook for setup_arch */
>>>    void __init setup_smp(void);
>>>    
>>> @@ -45,6 +66,13 @@ void riscv_cpuid_to_hartid_mask(const struct cpumask *in, struct cpumask *out);
>>>     */
>>>    #define raw_smp_processor_id() (current_thread_info()->cpu)
>>>    
>>> +#ifdef CONFIG_HOTPLUG_CPU
>>> +int __cpu_disable(void);
>>> +void __cpu_die(unsigned int cpu);
>>> +void cpu_play_dead(void);
>>> +void boot_sec_cpu(void);
>>> +#endif /* CONFIG_HOTPLUG_CPU */
>>> +
>>>    #else
>>>    
>>>    static inline int riscv_hartid_to_cpuid(int hartid)
>>> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
>>> index e1274fc0..62043204 100644
>>> --- a/arch/riscv/kernel/Makefile
>>> +++ b/arch/riscv/kernel/Makefile
>>> @@ -35,6 +35,7 @@ obj-$(CONFIG_SMP)		+= smpboot.o
>>>    obj-$(CONFIG_SMP)		+= smp.o
>>>    obj-$(CONFIG_MODULES)		+= module.o
>>>    obj-$(CONFIG_MODULE_SECTIONS)	+= module-sections.o
>>> +obj-$(CONFIG_HOTPLUG_CPU)	+= cpu-hotplug.o
>>>    
>>>    obj-$(CONFIG_FUNCTION_TRACER)	+= mcount.o ftrace.o
>>>    obj-$(CONFIG_DYNAMIC_FTRACE)	+= mcount-dyn.o
>>> diff --git a/arch/riscv/kernel/cpu-hotplug.c b/arch/riscv/kernel/cpu-hotplug.c
>>> new file mode 100644
>>> index 00000000..7a152972
>>> --- /dev/null
>>> +++ b/arch/riscv/kernel/cpu-hotplug.c
>>> @@ -0,0 +1,72 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +/*
>>> + * Copyright (C) 2018 Western Digital Corporation or its affiliates.
>>> + */
>>> +
>>> +#include <linux/kernel.h>
>>> +#include <linux/mm.h>
>>> +#include <linux/sched.h>
>>> +#include <linux/err.h>
>>> +#include <linux/irq.h>
>>> +#include <linux/cpu.h>
>>> +#include <linux/sched/hotplug.h>
>>> +#include <asm/irq.h>
>>> +#include <asm/sbi.h>
>>> +
>>> +bool can_hotplug_cpu(void)
>>> +{
>>> +	return true;
>>> +}
>>> +
>>> +void arch_cpu_idle_dead(void)
>>> +{
>>> +	cpu_play_dead();
>>> +}
>>> +
>>> +/*
>>> + * __cpu_disable runs on the processor to be shutdown.
>>> + */
>>> +int __cpu_disable(void)
>>> +{
>>> +	int ret = 0;
>>> +	unsigned int cpu = smp_processor_id();
>>> +
>>> +	set_cpu_online(cpu, false);
>>> +	irq_migrate_all_off_this_cpu();
>>> +
>>> +	return ret;
>>> +}
>>> +/*
>>> + * called on the thread which is asking for a CPU to be shutdown -
>>> + * waits until shutdown has completed, or it is timed out.
>>> + */
>>> +void __cpu_die(unsigned int cpu)
>>> +{
>>> +	if (!cpu_wait_death(cpu, 5)) {
>>> +		pr_err("CPU %u: didn't die\n", cpu);
>>> +		return;
>>> +	}
>>> +	pr_notice("CPU%u: shutdown\n", cpu);
>>> +	/* TODO: Do we need to verify if the cpu is really dead? */
>>> +}
>>> +
>>> +/*
>>> + * Called from the idle thread for the CPU which has been shutdown.
>>> + *
>>> + */
>>> +void cpu_play_dead(void)
>>> +{
>>> +	idle_task_exit();
>>> +
>>> +	(void)cpu_report_death();
>>> +
>>> +	/* Do not disable software interrupt to restart cpu after WFI */
>>> +	csr_clear(sie, SIE_STIE | SIE_SEIE);
>>> +	wait_for_software_interrupt();
>>> +	boot_sec_cpu();
>>> +}
>>> +
>>> +void arch_send_call_wakeup_ipi(int cpu)
>>> +{
>>> +	send_ipi_message(cpumask_of(cpu), IPI_CALL_WAKEUP);
>>> +}
>>> diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
>>> index 711190d4..07d9fb38 100644
>>> --- a/arch/riscv/kernel/head.S
>>> +++ b/arch/riscv/kernel/head.S
>>> @@ -153,6 +153,19 @@ relocate:
>>>    	j .Lsecondary_park
>>>    END(_start)
>>>    
>>> +#ifdef CONFIG_SMP
>>> +.section .text
>>> +.global boot_sec_cpu
>>> +
>>> +boot_sec_cpu:
>>> +	/* clear all pending flags */
>>> +	csrw sip, zero
>>> +	/* Mask all interrupts */
>>> +	csrw sie, zero
>>> +	fence
>>> +
>>> +	tail smp_callin
>>> +#endif
>>>    __PAGE_ALIGNED_BSS
>>>    	/* Empty zero page */
>>>    	.balign PAGE_SIZE
>>> diff --git a/arch/riscv/kernel/irq.c b/arch/riscv/kernel/irq.c
>>> index 0cfac48a..7e14b0d9 100644
>>> --- a/arch/riscv/kernel/irq.c
>>> +++ b/arch/riscv/kernel/irq.c
>>> @@ -53,6 +53,30 @@ asmlinkage void __irq_entry do_IRQ(struct pt_regs *regs, unsigned long cause)
>>>    	set_irq_regs(old_regs);
>>>    }
>>>    
>>> +/*
>>> + * This function doesn't return until a software interrupt is sent via IPI.
>>> + * Obviously, all the interrupts except software interrupt should be disabled
>>> + * before this function is called.
>>> + */
>>> +void wait_for_software_interrupt(void)
>>> +{
>>> +	unsigned long sipval, sieval, scauseval;
>>> +
>>> +	/* clear all pending flags */
>>> +	csr_write(sip, 0);
>>> +	/* clear any previous scause data */
>>> +	csr_write(scause, 0);
>>> +
>>> +	do {
>>> +		wait_for_interrupt();
>>> +		sipval = csr_read(sip);
>>> +		sieval = csr_read(sie);
>>> +		scauseval = csr_read(scause) & ~INTERRUPT_CAUSE_FLAG;
>>> +	/* only break if wfi returns for an enabled interrupt */
>>> +	} while ((sipval & sieval) == 0 &&
>>> +		 scauseval != INTERRUPT_CAUSE_SOFTWARE);
>>> +}
>>> +
>>>    void __init init_IRQ(void)
>>>    {
>>>    	irqchip_init();
>>> diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
>>> index a5fac1b7..612f0c21 100644
>>> --- a/arch/riscv/kernel/setup.c
>>> +++ b/arch/riscv/kernel/setup.c
>>> @@ -30,11 +30,11 @@
>>>    #include <linux/of_platform.h>
>>>    #include <linux/sched/task.h>
>>>    #include <linux/swiotlb.h>
>>> +#include <linux/smp.h>
>>>    
>>>    #include <asm/setup.h>
>>>    #include <asm/sections.h>
>>>    #include <asm/pgtable.h>
>>> -#include <asm/smp.h>
>>>    #include <asm/sbi.h>
>>>    #include <asm/tlbflush.h>
>>>    #include <asm/thread_info.h>
>>> @@ -82,6 +82,7 @@ EXPORT_SYMBOL(empty_zero_page);
>>>    /* The lucky hart to first increment this variable will boot the other cores */
>>>    atomic_t hart_lottery;
>>>    unsigned long boot_cpu_hartid;
>>> +static DEFINE_PER_CPU(struct cpu, cpu_devices);
>>>    
>>>    unsigned long __cpuid_to_hardid_map[NR_CPUS] = {
>>>    	[0 ... NR_CPUS-1] = INVALID_HARTID
>>> @@ -257,3 +258,17 @@ void __init setup_arch(char **cmdline_p)
>>>    	riscv_fill_hwcap();
>>>    }
>>>    
>>> +static int __init topology_init(void)
>>> +{
>>> +	int i;
>>> +
>>> +	for_each_possible_cpu(i) {
>>> +		struct cpu *cpu = &per_cpu(cpu_devices, i);
>>> +
>>> +		cpu->hotpluggable = can_hotplug_cpu();
>>> +		register_cpu(cpu, i);
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>> +subsys_initcall(topology_init);
>>> diff --git a/arch/riscv/kernel/smp.c b/arch/riscv/kernel/smp.c
>>> index 89d95866..629456bb 100644
>>> --- a/arch/riscv/kernel/smp.c
>>> +++ b/arch/riscv/kernel/smp.c
>>> @@ -32,12 +32,6 @@ static struct {
>>>    	unsigned long bits ____cacheline_aligned;
>>>    } ipi_data[NR_CPUS] __cacheline_aligned;
>>>    
>>> -enum ipi_message_type {
>>> -	IPI_RESCHEDULE,
>>> -	IPI_CALL_FUNC,
>>> -	IPI_MAX
>>> -};
>>> -
>>>    int riscv_hartid_to_cpuid(int hartid)
>>>    {
>>>    	int i = -1;
>>> @@ -94,7 +88,7 @@ void riscv_software_interrupt(void)
>>>    	}
>>>    }
>>>    
>>> -static void
>>> +void
>>>    send_ipi_message(const struct cpumask *to_whom, enum ipi_message_type operation)
>>>    {
>>>    	int cpuid, hartid;
>>> diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c
>>> index f44ae780..a9138431 100644
>>> --- a/arch/riscv/kernel/smpboot.c
>>> +++ b/arch/riscv/kernel/smpboot.c
>>> @@ -92,9 +92,12 @@ int __cpu_up(unsigned int cpu, struct task_struct *tidle)
>>>    		  task_stack_page(tidle) + THREAD_SIZE);
>>>    	WRITE_ONCE(__cpu_up_task_pointer[hartid], tidle);
>>>    
>>> +	arch_send_call_wakeup_ipi(cpu);
>>>    	while (!cpu_online(cpu))
>>>    		cpu_relax();
>>>    
>>> +	pr_notice("CPU%u: online\n", cpu);
>>> +
>>>    	return 0;
>>>    }
>>>    
>>> @@ -105,7 +108,7 @@ void __init smp_cpus_done(unsigned int max_cpus)
>>>    /*
>>>     * C entry point for a secondary processor.
>>>     */
>>> -asmlinkage void __init smp_callin(void)
>>> +asmlinkage void smp_callin(void)
>>>    {
>>>    	struct mm_struct *mm = &init_mm;
>>>    
>>> @@ -115,7 +118,7 @@ asmlinkage void __init smp_callin(void)
>>>    
>>>    	trap_init();
>>>    	notify_cpu_starting(smp_processor_id());
>>> -	set_cpu_online(smp_processor_id(), 1);
>>> +	set_cpu_online(smp_processor_id(), true);
>>>    	/*
>>>    	 * Remote TLB flushes are ignored while the CPU is offline, so emit
>>>    	 * a local TLB flush right now just in case.
>>> diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
>>> index 24a9333d..8b331619 100644
>>> --- a/arch/riscv/kernel/traps.c
>>> +++ b/arch/riscv/kernel/traps.c
>>> @@ -153,7 +153,7 @@ int is_valid_bugaddr(unsigned long pc)
>>>    }
>>>    #endif /* CONFIG_GENERIC_BUG */
>>>    
>>> -void __init trap_init(void)
>>> +void trap_init(void)
>>>    {
>>>    	/*
>>>    	 * Set sup0 scratch register to 0, indicating to exception vector
>>> @@ -162,6 +162,6 @@ void __init trap_init(void)
>>>    	csr_write(sscratch, 0);
>>>    	/* Set the exception vector address */
>>>    	csr_write(stvec, &handle_exception);
>>> -	/* Enable all interrupts */
>>> -	csr_write(sie, -1);
>>> +	/* Enable all interrupts but the timer interrupt */
>>> +	csr_set(sie, SIE_SSIE | SIE_SEIE);
>>>    }
>>> -- 
>>> 2.7.4
>>>
>>
> 
> 
