linux-kernel - Re: [PATCH net v2] octeontx2-pf: Fix page pool cache index corruption.

Message-ID: <20230907070955.0kdmjXbB@linutronix.de>
Date:   Thu, 7 Sep 2023 09:09:55 +0200
From:   Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To:     Ratheesh Kannoth <rkannoth@...vell.com>
Cc:     netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
        sgoutham@...vell.com, gakula@...vell.com, sbhatta@...vell.com,
        hkelam@...vell.com, davem@...emloft.net, edumazet@...gle.com,
        kuba@...nel.org, pabeni@...hat.com, hawk@...nel.org,
        alexander.duyck@...il.com, ilias.apalodimas@...aro.org,
        linyunsheng@...wei.com
Subject: Re: [PATCH net v2] octeontx2-pf: Fix page pool cache index
 corruption.

On 2023-09-07 07:17:11 [+0530], Ratheesh Kannoth wrote:
> The access to page pool `cache' array and the `count' variable
> is not locked. Page pool cache access is fine as long as there
> is only one consumer per pool.
> 
> octeontx2 driver fills in rx buffers from page pool in NAPI context.
> If system is stressed and could not allocate buffers, refilling work
> will be delegated to a delayed workqueue. This means that there are
> two consumers to the page pool cache.
> 
> Either workqueue or IRQ/NAPI can be run on other CPU. This will lead
> to lockless access, hence corruption of cache pool indexes.
> 
> To fix this issue, NAPI is rescheduled from workqueue context to refill
> rx buffers.
> 
> Fixes: b2e3406a38f0 ("octeontx2-pf: Add support for page pool")
> Signed-off-by: Ratheesh Kannoth <rkannoth@...vell.com>

Reported-by: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
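
To make the corruption described in the commit message concrete, here is a
minimal userspace sketch. It is not the page pool code: struct pool_cache,
pop_load_count(), pop_commit() and replay_two_consumers() are hypothetical
names, and the interleaving of the two CPUs is replayed sequentially so the
outcome is deterministic.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SIZE 4

/* Hypothetical, simplified model of an unlocked cache array plus count. */
struct pool_cache {
	void *cache[CACHE_SIZE];
	unsigned int count;
};

/* First half of a pop: read the current count without any lock. */
static unsigned int pop_load_count(const struct pool_cache *pc)
{
	return pc->count;
}

/* Second half of a pop: take the top entry based on the count seen earlier. */
static void *pop_commit(struct pool_cache *pc, unsigned int seen)
{
	pc->count = seen - 1;
	return pc->cache[seen - 1];
}

/*
 * Replays two consumers (NAPI on one CPU, workqueue on another) whose pops
 * interleave. Returns 1 if both were handed the same entry while the count
 * dropped only once.
 */
static int replay_two_consumers(void)
{
	static int pages[2];
	struct pool_cache pc = { .cache = { &pages[0], &pages[1] }, .count = 2 };

	unsigned int seen_napi = pop_load_count(&pc);	/* NAPI reads count */
	unsigned int seen_work = pop_load_count(&pc);	/* workqueue reads count */
	void *a = pop_commit(&pc, seen_napi);
	void *b = pop_commit(&pc, seen_work);

	return a == b && pc.count == 1;
}
```

With a single consumer the two halves of a pop can never interleave, which
is why the cache is fine without a lock in that case.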

> diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
> index 8511906cb4e2..997fedac3a98 100644
> --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
> +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
>  static void otx2_pool_refill_task(struct work_struct *work)
>  {
>  	struct otx2_cq_queue *cq;
> -	struct otx2_pool *rbpool;
>  	struct refill_work *wrk;
> -	int qidx, free_ptrs = 0;
>  	struct otx2_nic *pfvf;
> -	dma_addr_t bufptr;
> +	int qidx;
>  
>  	wrk = container_of(work, struct refill_work, pool_refill_work.work);
>  	pfvf = wrk->pf;
>  	qidx = wrk - pfvf->refill_wrk;
>  	cq = &pfvf->qset.cq[qidx];
…
>  	cq->refill_task_sched = false;
> +
> +	local_bh_disable();
> +	napi_schedule(wrk->napi);
> +	local_bh_enable();

This is a nitpick since I haven't looked at how it works exactly: Is it
possible that the wrk->napi pointer gets overwritten by
otx2_napi_handler() since you cleared cq->refill_task_sched earlier?
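
The ordering in question can be replayed in a small userspace sketch. The
struct layouts and the helpers handler_try_schedule(),
replay_clear_then_read() and replay_read_then_clear() are hypothetical
stand-ins, not the driver code, and whether the handler can really pass a
different napi pointer for the same queue is an assumption made only for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures. */
struct napi_struct { int id; };
struct refill_work { struct napi_struct *napi; };
struct cq_state { bool refill_task_sched; };

/* Models the scheduling branch added to otx2_napi_handler(). */
static bool handler_try_schedule(struct cq_state *cq, struct refill_work *wrk,
				 struct napi_struct *napi)
{
	if (cq->refill_task_sched)
		return false;
	wrk->napi = napi;
	cq->refill_task_sched = true;
	return true;
}

/* Work item clears the flag first, handler runs, then wrk->napi is read. */
static struct napi_struct *replay_clear_then_read(struct napi_struct *first,
						  struct napi_struct *second)
{
	struct cq_state cq = { .refill_task_sched = false };
	struct refill_work wrk = { .napi = NULL };

	handler_try_schedule(&cq, &wrk, first);		/* initial scheduling */
	cq.refill_task_sched = false;			/* work: clear flag */
	handler_try_schedule(&cq, &wrk, second);	/* handler runs in between */
	return wrk.napi;				/* work: pointer used to schedule */
}

/* Same replay, but the pointer is copied before the flag is cleared. */
static struct napi_struct *replay_read_then_clear(struct napi_struct *first,
						  struct napi_struct *second)
{
	struct cq_state cq = { .refill_task_sched = false };
	struct refill_work wrk = { .napi = NULL };
	struct napi_struct *napi;

	handler_try_schedule(&cq, &wrk, first);
	napi = wrk.napi;				/* work: snapshot pointer */
	cq.refill_task_sched = false;			/* work: clear flag */
	handler_try_schedule(&cq, &wrk, second);
	return napi;
}
```

In the first replay the work item ends up with the pointer written by the
later handler run; in the second it keeps the one it was scheduled with. If
the handler always stores the same pointer for a given queue, both orderings
give the same result and the concern is moot.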

>  }
>  
>  int otx2_config_nix_queues(struct otx2_nic *pfvf)
> diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c
> index e369baf11530..b778ed366f81 100644
> --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c
> +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c
> @@ -561,9 +565,24 @@ int otx2_napi_handler(struct napi_struct *napi, int budget)
>  				otx2_config_irq_coalescing(pfvf, i);
>  		}
>  
> -		/* Re-enable interrupts */
> -		otx2_write64(pfvf, NIX_LF_CINTX_ENA_W1S(cq_poll->cint_idx),
> -			     BIT_ULL(0));
> +		if (unlikely(!filled_cnt)) {
> +			struct refill_work *work;
> +			struct delayed_work *dwork;
> +
> +			work = &pfvf->refill_wrk[cq->cq_idx];
> +			dwork = &work->pool_refill_work;
> +			/* Schedule a task if no other task is running */
> +			if (!cq->refill_task_sched) {
> +				work->napi = napi;
> +				cq->refill_task_sched = true;
> +				schedule_delayed_work(dwork,
> +						      msecs_to_jiffies(100));
> +			}
> +		} else {
> +			/* Re-enable interrupts */
> +			otx2_write64(pfvf, NIX_LF_CINTX_ENA_W1S(cq_poll->cint_idx),
> +				     BIT_ULL(0));
> +		}
>  	}
>  	return workdone;
>  }

Sebastian