Date:   Mon, 16 Dec 2019 21:37:05 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     Peter Xu <peterx@...hat.com>
Cc:     linux-kernel@...r.kernel.org,
        Marcelo Tosatti <mtosatti@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Nadav Amit <namit@...are.com>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Subject: Re: [PATCH] smp: Allow smp_call_function_single_async() to insert
 locked csd

On Wed, Dec 11, 2019 at 11:29:25AM -0500, Peter Xu wrote:
> This is also true.
> 
> Here are the statistics I mentioned:
> 
> =================================================
> 
> (1) Implemented the same counter mechanism on the caller's side:
> 
> *** arch/mips/kernel/smp.c:
> tick_broadcast[713]            smp_call_function_single_async(cpu, csd);
> *** drivers/cpuidle/coupled.c:
> cpuidle_coupled_poke[336]      smp_call_function_single_async(cpu, csd);
> *** kernel/sched/core.c:
> hrtick_start[298]              smp_call_function_single_async(cpu_of(rq), &rq->hrtick_csd);
> 
> (2) Cleared the csd flags before calls:
> 
> *** arch/s390/pci/pci_irq.c:
> zpci_handle_fallback_irq[185]  smp_call_function_single_async(cpu, &cpu_data->csd);
> *** block/blk-mq.c:
> __blk_mq_complete_request[622] smp_call_function_single_async(ctx->cpu, &rq->csd);
> *** block/blk-softirq.c:
> raise_blk_irq[70]              smp_call_function_single_async(cpu, data);
> *** drivers/net/ethernet/cavium/liquidio/lio_core.c:
> liquidio_napi_drv_callback[735] smp_call_function_single_async(droq->cpu_id, csd);
> 
> (3) Others:
> 
> *** arch/mips/kernel/process.c:
> raise_backtrace[713]           smp_call_function_single_async(cpu, csd);

per-cpu csd data, seems perfectly fine usage.
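
Roughly this shape (a sketch with invented names, not the actual code):
each CPU owns its own csd, so the only way to double-enqueue it is to
poke the same CPU again before the previous call has run.

#include <linux/percpu.h>
#include <linux/smp.h>

/* sketch only; names invented for illustration */
static DEFINE_PER_CPU(call_single_data_t, poke_csd);

static void poke_func(void *info)
{
	/* runs on the target CPU out of the IPI */
}

static void poke_cpu(int cpu)
{
	call_single_data_t *csd = &per_cpu(poke_csd, cpu);

	csd->func = poke_func;
	csd->info = NULL;
	/* safe as long as we never poke @cpu again before poke_func ran */
	smp_call_function_single_async(cpu, csd);
}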

> *** arch/x86/kernel/cpuid.c:
> cpuid_read[85]                 err = smp_call_function_single_async(cpu, &csd);
> *** arch/x86/lib/msr-smp.c:
> rdmsr_safe_on_cpu[182]         err = smp_call_function_single_async(cpu, &csd);

These two have csd on stack and wait with a completion. seems fine.
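
The shape there is roughly the following (a sketch, not the literal
msr/cpuid code): the csd is on the stack, but the caller blocks on a
completion signalled from the remote function, so the csd cannot go out
of scope or be reused while it is still queued.

#include <linux/completion.h>
#include <linux/smp.h>

struct remote_req {			/* invented for illustration */
	struct completion done;
	int ret;
};

static void remote_func(void *info)
{
	struct remote_req *req = info;

	req->ret = 0;			/* the actual remote work goes here */
	complete(&req->done);
}

static int run_on_cpu(unsigned int cpu)
{
	struct remote_req req;
	call_single_data_t csd = {
		.func = remote_func,
		.info = &req,
	};
	int err;

	init_completion(&req.done);
	err = smp_call_function_single_async(cpu, &csd);
	if (!err) {
		wait_for_completion(&req.done);	/* csd stays in use until here */
		err = req.ret;
	}
	return err;
}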

> *** include/linux/smp.h:
> bool[60]                       int smp_call_function_single_async(int cpu, call_single_data_t *csd);

this is the declaration, your grep went funny

> *** kernel/debug/debug_core.c:
> kgdb_roundup_cpus[272]         ret = smp_call_function_single_async(cpu, csd);
> *** net/core/dev.c:
> net_rps_send_ipi[5818]         smp_call_function_single_async(remsd->cpu, &remsd->csd);

Both percpu again.

> 
> =================================================
> 
> For (1): These cases rather argue for a patch like this, so that the
>          same guard does not have to be reimplemented in every caller.

I can't quite parse that, but if you're saying we should fix the
callers, then I agree.
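
For reference, the guard those call sites in (1) open-code today is
roughly this shape (struct and field names invented):

struct poke_state {			/* invented for illustration */
	call_single_data_t	csd;	/* .func/.info set up at init time */
	int			pending;
};

static void poke_func(void *info)
{
	struct poke_state *ps = info;

	/* ... do the actual work ... */
	ps->pending = 0;		/* csd may be queued again */
}

static void poke(int cpu, struct poke_state *ps)
{
	if (ps->pending)		/* previous call still in flight */
		return;
	ps->pending = 1;
	smp_call_function_single_async(cpu, &ps->csd);
}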

> For (2): If I read it right, smp_call_function_single_async() is the
>          only place where we take a call_single_data_t structure
>          rather than the (smp_call_func_t, void *) tuple.

That's on purpose; by supplying csd we allow explicit concurrency. If
you do as proposed here:

>		I could
>          miss something important, but otherwise I think it would be
>          good to use the tuple for smp_call_function_single_async() as
>          well, then move call_single_data_t out of the global header
>          and into smp.c to keep callers from touching it (which
>          could be error-prone).  In other words, IMHO it would be good
>          to have all these callers fixed.

Then you could only ever have one of them in flight at the same time.
Which would break things.
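
Concretely (an invented sketch): because each caller brings its own csd,
two independent requests can be queued to the same CPU at once; with a
(func, info) tuple the storage would have to live inside smp.c, and a
single slot there would serialize everything.

static void handler_a(void *info) { /* ... */ }
static void handler_b(void *info) { /* ... */ }

static call_single_data_t csd_a = { .func = handler_a };
static call_single_data_t csd_b = { .func = handler_b };

static void kick_both(int cpu)
{
	/* two different csds, so both may be in flight concurrently */
	smp_call_function_single_async(cpu, &csd_a);
	smp_call_function_single_async(cpu, &csd_b);
}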

> For (3): I didn't dig, but I think some of them (or future users)
>          could still suffer from the same issue of retriggering the
>          WARN_ON... 

They all seem fine.

So I'm thinking your patch is good, but please also fix all of (1).
