Message-ID: <BYAPR18MB2423B06D1366DB7D32A82C13AC800@BYAPR18MB2423.namprd18.prod.outlook.com>
Date:   Thu, 11 Jun 2020 21:49:06 +0000
From:   Derek Chickles <dchickles@...vell.com>
To:     Peter Zijlstra <peterz@...radead.org>,
        Satananda Burla <sburla@...vell.com>,
        Felix Manlunas <fmanlunas@...vell.com>
CC:     "frederic@...nel.org" <frederic@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "kuba@...nel.org" <kuba@...nel.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: RE: liquidio vs smp_call_function_single_async()

> From: Peter Zijlstra <peterz@...radead.org>
> Sent: Monday, June 8, 2020 6:05 AM
> To: Derek Chickles <dchickles@...vell.com>; Satananda Burla
> <sburla@...vell.com>; Felix Manlunas <fmanlunas@...vell.com>
> Cc: frederic@...nel.org; linux-kernel@...r.kernel.org;
> davem@...emloft.net; kuba@...nel.org; netdev@...r.kernel.org
> Subject: liquidio vs smp_call_function_single_async()
> 
> Hi,
> 
> I'm going through the smp_call_function_single_async() users, and stumbled
> over your liquidio thingy. It does:
> 
> 		call_single_data_t *csd = &droq->csd;
> 
> 		csd->func = napi_schedule_wrapper;
> 		csd->info = &droq->napi;
> 		csd->flags = 0;
> 
> 		smp_call_function_single_async(droq->cpu_id, csd);
> 
> which is almost certainly a bug. What guarantees that csd is unused when
> you do this? What happens if the remote CPU is already running RX and
> consumes the packets before the IPI lands, and then this CPU gets another
> interrupt?
> 
> AFAICT you then call this thing again, causing list corruption.

Hi Peter,

I think you're right that this might be a functional bug, but it won't cause list
corruption. We don't rely on the IPI to process packets, only to move NAPI
processing to another CPU. Separate register counters indicate whether, and
how many, new packets have arrived, and those are re-read once NAPI runs on
the target CPU.

I think a patch to check whether NAPI is already scheduled would address the
unexpected rescheduling issue here. Otherwise, it can probably stay as is,
since there is no harm.
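
Roughly, something along these lines (untested sketch only; the
test_bit(NAPI_STATE_SCHED, ...) check is just one way to express "already
scheduled", and the droq fields are the same ones from the code you quoted):

		/* Only re-arm the CSD/IPI if NAPI isn't already scheduled
		 * for this droq; if it is, the csd may still be in flight
		 * and the extra IPI is redundant anyway.
		 */
		if (!test_bit(NAPI_STATE_SCHED, &droq->napi.state)) {
			call_single_data_t *csd = &droq->csd;

			csd->func = napi_schedule_wrapper;
			csd->info = &droq->napi;
			csd->flags = 0;

			smp_call_function_single_async(droq->cpu_id, csd);
		}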
 
Thanks,
Derek
