netdev - Re: [PATCH net v3 2/2] mlx5: fix possible ptp queue fifo use-after-free

Message-ID: <62066999-6a6b-5084-96ba-50c566e826c9@meta.com>
Date:   Thu, 26 Jan 2023 13:22:39 +0000
From:   Vadim Fedorenko <vadfed@...a.com>
To:     Tariq Toukan <ttoukan.linux@...il.com>,
        Aya Levin <ayal@...dia.com>,
        Saeed Mahameed <saeedm@...dia.com>,
        Jakub Kicinski <kuba@...nel.org>, Gal Pressman <gal@...dia.com>
CC:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Tariq Toukan <tariqt@...dia.com>
Subject: Re: [PATCH net v3 2/2] mlx5: fix possible ptp queue fifo
 use-after-free

On 26/01/2023 06:53, Tariq Toukan wrote:
> 
> 
> On 26/01/2023 3:02, Vadim Fedorenko wrote:
>> From: Vadim Fedorenko <vadfed@...a.com>
>>
>> Fifo pointers were not checked during push and pop operations and this
>> could potentially lead to use-after-free or skb leak under heavy PTP
>> traffic.
>>
>> Also there were OOO cqe spotted which lead to drain of the queue and
>> use-after-free because of lack of fifo pointers check. Special check
>> is added to avoid resync operation if SKB could not exist in the fifo
>> because of OOO cqe (skb_id must be between consumer and producer index).
>>
> 
> Hi,
> 
> Let's hold on with this patch.
> I don't think we understand the root cause. I'm also not sure this patch 
> doesn't degrade the successful flow. See comment below.
> 
> We don't expect an xmit operation coming from the kernel while the TXQ 
> is stopped. This might be the reason for the fifo overflow. Does it 
> happen? If so, let's understand why and fix.
> 
> Your fix to mlx5e_skb_fifo_has_room() should help with preventing the 
> fifo overflow. Does the issue still occur even after your patch [1]?

Well, I agree that there should be no overflow after the first patch.
I added the WARN_ONCE only to make sure that future changes do not
reintroduce an overflow. But I'm OK to remove it if we are confident
enough.

> Also, it's not easy to decisively determine that a CQE arrived OOO. I 
> doubt this can happen. The SQ is cyclic and works in-order. It's more 
> probably a full cycle of lost CQEs.

I showed logs of OOO CQEs in the previous version, but here they are
once again:

<idle>-0       [000] ..s..  2306.825713: mlx5e_ptp_ts_cqe_drop: mlx5: ptp ts cqe drop detected, skb_cc = 185, skb_id = 186
<idle>-0       [000] ..s..  2306.825719: mlx5e_ptp_skb_fifo_ts_cqe_resync: mlx5: ptp ts cqe resync, skb_cc = 186, skb_id = 186, cpuid = 8
<idle>-0       [000] ..s..  2306.825730: mlx5e_ptp_handle_ts_cqe: mlx5: ptp handle ts cqe, skb_cc = 187, skb_id = 185
<idle>-0       [000] ..s..  2306.825730: mlx5e_ptp_ts_cqe_drop: mlx5: ptp ts cqe drop detected, skb_cc = 187, skb_id = 185
<idle>-0       [000] ..s..  2306.825747: mlx5e_ptp_handle_ts_cqe: mlx5: ptp handle ts cqe, skb_cc = 187, skb_id = 187
<idle>-0       [000] ..s..  2306.825932: mlx5e_ptp_handle_ts_cqe: mlx5: ptp handle ts cqe, skb_cc = 188, skb_id = 188
<idle>-0       [000] ..s..  2306.825948: mlx5e_ptp_handle_ts_cqe: mlx5: ptp handle ts cqe, skb_cc = 189, skb_id = 189
<idle>-0       [000] ..s..  2306.825965: mlx5e_ptp_handle_ts_cqe: mlx5: ptp handle ts cqe, skb_cc = 190, skb_id = 190

In this example skb_cc is the masked value, not the full value of the
counter, but it still shows the problem. The CQE with skb_id=186
arrived when skb_cc was 185. That triggered a resync, which flushed
skb_id 185 from the queue. Then skb_id 186 was processed, and after
that the CQE for skb_id 185 arrived out of order. With the current
patch applied, this OOO CQE was simply skipped in resync. Then skb_ids
187, 188, 189 and 190 arrived in order.

Without the current patch applied, the second resync (the one triggered
by skb_id 185) would trash all SKBs in the queue, because there is no
such id in the queue. Even worse, without the first patch applied, it
would keep trashing the queue until the cc index wraps around and
reaches the requested SKB. That leads to a use-after-free
(skb_tstamp_tx(skb, &hwts)) and a double free of the last element later
in napi_consume_skb.
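
The sequence above can be replayed with a small user-space model. This
is only a sketch, not driver code: the toy_* names are made up for
illustration, slots hold integer ids instead of SKB pointers, and an
8-bit masked index is assumed. It mirrors the checked pop and the range
check from this patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: all names here are hypothetical, not mlx5 symbols. */
#define TOY_FIFO_SIZE 256u		/* assumed 8-bit masked index */
#define TOY_MASK (TOY_FIFO_SIZE - 1u)

struct toy_fifo {
	uint16_t pc;			/* producer counter, free running */
	uint16_t cc;			/* consumer counter, free running */
	int slot[TOY_FIFO_SIZE];	/* -1 marks an empty slot */
};

static void toy_init(struct toy_fifo *f, uint16_t start)
{
	f->pc = f->cc = start;
	for (size_t i = 0; i < TOY_FIFO_SIZE; i++)
		f->slot[i] = -1;
}

static void toy_push(struct toy_fifo *f, int id)
{
	/* Refuse to overwrite entries that were not consumed yet. */
	assert((uint16_t)(f->pc - f->cc) < TOY_FIFO_SIZE);
	f->slot[f->pc++ & TOY_MASK] = id;
}

/* Checked pop: returns -1 on an empty fifo instead of moving cc past pc. */
static int toy_pop(struct toy_fifo *f)
{
	int id;

	if (f->pc == f->cc)
		return -1;
	id = f->slot[f->cc & TOY_MASK];
	f->slot[f->cc & TOY_MASK] = -1;
	f->cc++;
	return id;
}

/*
 * Drop entries until the consumer index reaches skb_id. Returns the number
 * of dropped entries, or -1 if the CQE is treated as out-of-order or the
 * fifo drained before skb_id was found.
 */
static int toy_resync(struct toy_fifo *f, uint16_t skb_id)
{
	uint16_t cc = f->cc & TOY_MASK;
	uint16_t pc = f->pc & TOY_MASK;
	int dropped = 0;

	if (cc > skb_id || pc < skb_id)	/* linear range check, as in the patch */
		return -1;

	while ((f->cc & TOY_MASK) != skb_id) {
		if (toy_pop(f) < 0)
			return -1;
		dropped++;
	}
	return dropped;
}

/* Returns the id completed by this CQE, or -1 if the CQE was skipped. */
static int toy_handle_cqe(struct toy_fifo *f, uint16_t skb_id)
{
	if ((f->cc & TOY_MASK) != skb_id && toy_resync(f, skb_id) < 0)
		return -1;
	return toy_pop(f);
}
```

Pushing ids 185..190 and feeding CQEs in the order 186, 185, 187, 188,
189, 190 completes 186 (after dropping 185), skips the late 185, and
then completes the rest in order, leaving the fifo empty.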

If you need more information, I'm happy to gather it.

> 
> BTW, what value do you see in your environment for
> MLX5_CAP_GEN_2(mdev, ts_cqe_metadata_size2wqe_counter) ?

In our setup: ts_cqe_metadata_size2wqe_counter = 8
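
To relate this value to the masked skb_cc/skb_id numbers in the trace,
here is a minimal sketch. It assumes the index mask is derived as
(1 << ts_cqe_metadata_size2wqe_counter) - 1; the helper names are
hypothetical and only stand in for what PTP_WQE_CTR2IDX() does:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the mask covers the low size2wqe_counter bits. */
static uint16_t toy_ctr_mask(unsigned int size2wqe_counter)
{
	assert(size2wqe_counter > 0 && size2wqe_counter <= 16);
	return (uint16_t)((1u << size2wqe_counter) - 1u);
}

/* Hypothetical stand-in for the counter-to-index conversion. */
static uint16_t toy_ctr2idx(uint16_t counter, uint16_t mask)
{
	return (uint16_t)(counter & mask);
}
```

Under that assumption a value of 8 gives a mask of 0xff, so a full
counter of 441 and a full counter of 185 both map to index 185.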
> 
> Thanks,
> Tariq
> 
> [1] [PATCH net v3 1/2] mlx5: fix skb leak while fifo resync and push
> 
>> Fixes: 58a518948f60 ("net/mlx5e: Add resiliency for PTP TX port 
>> timestamp")
>> Signed-off-by: Vadim Fedorenko <vadfed@...a.com>
>> ---
>>   .../net/ethernet/mellanox/mlx5/core/en/ptp.c  | 23 ++++++++++++++-----
>>   .../net/ethernet/mellanox/mlx5/core/en/txrx.h |  7 +++++-
>>   2 files changed, 23 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c 
>> b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
>> index b72de2b520ec..4ac7483dcbcc 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
>> @@ -86,7 +86,7 @@ static bool mlx5e_ptp_ts_cqe_drop(struct mlx5e_ptpsq 
>> *ptpsq, u16 skb_cc, u16 skb
>>       return (ptpsq->ts_cqe_ctr_mask && (skb_cc != skb_id));
>>   }
>> -static void mlx5e_ptp_skb_fifo_ts_cqe_resync(struct mlx5e_ptpsq 
>> *ptpsq, u16 skb_cc,
>> +static bool mlx5e_ptp_skb_fifo_ts_cqe_resync(struct mlx5e_ptpsq 
>> *ptpsq, u16 skb_cc,
>>                            u16 skb_id, int budget)
>>   {
>>       struct skb_shared_hwtstamps hwts = {};
>> @@ -94,14 +94,23 @@ static void 
>> mlx5e_ptp_skb_fifo_ts_cqe_resync(struct mlx5e_ptpsq *ptpsq, u16 skb_
>>       ptpsq->cq_stats->resync_event++;
>> -    while (skb_cc != skb_id) {
>> -        skb = mlx5e_skb_fifo_pop(&ptpsq->skb_fifo);
>> +    if (skb_cc > skb_id || PTP_WQE_CTR2IDX(ptpsq->skb_fifo_pc) < skb_id)
> 
> This can give false positives near the edge of the fifo (wraparound).

Can you please provide values that would trigger a false positive here?
I explained the reasoning behind this if statement to Jakub in the
previous version, and I'm happy to improve the check.
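
One way to look for such values is to compare the linear check with a
modular window check in user space. This is a sketch under assumptions:
indices are masked to 8 bits, fewer than 256 entries are in flight, and
both helper names are hypothetical. The modular variant is only one
possible alternative, not something proposed in this thread:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IDX_MASK 0xffu	/* assumption: 8-bit masked index */

/* Linear check as written in the patch; true means "treat as OOO". */
static bool linear_is_ooo(uint16_t cc_idx, uint16_t pc_idx, uint16_t skb_id)
{
	return cc_idx > skb_id || pc_idx < skb_id;
}

/*
 * Modular alternative: skb_id is in flight only if its distance from the
 * consumer index is smaller than the number of in-flight entries.
 * Cannot tell a completely full fifo from an empty one.
 */
static bool modular_is_ooo(uint16_t cc_idx, uint16_t pc_idx, uint16_t skb_id)
{
	uint16_t in_flight = (uint16_t)((pc_idx - cc_idx) & IDX_MASK);
	uint16_t dist = (uint16_t)((skb_id - cc_idx) & IDX_MASK);

	return dist >= in_flight;
}
```

With cc index 250, pc index 5 (already wrapped) and skb_id 252, the
linear check reports OOO although 252 lies between consumer and
producer modulo 256, while the modular check accepts it. For the case
from the trace (cc 187, skb_id 185) both checks report OOO.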

> 
>> +        pr_err_ratelimited("mlx5e: out-of-order ptp cqe\n");
>> +        return false;
>> +    }
>> +
>> +    while (skb_cc != skb_id && (skb = 
>> mlx5e_skb_fifo_pop(&ptpsq->skb_fifo))) {
>>           hwts.hwtstamp = mlx5e_skb_cb_get_hwts(skb)->cqe_hwtstamp;
>>           skb_tstamp_tx(skb, &hwts);
>>           ptpsq->cq_stats->resync_cqe++;
>>           napi_consume_skb(skb, budget);
>>           skb_cc = PTP_WQE_CTR2IDX(ptpsq->skb_fifo_cc);
>>       }
>> +
>> +    if (!skb)
>> +        return false;
>> +
>> +    return true;
>>   }
>>   static void mlx5e_ptp_handle_ts_cqe(struct mlx5e_ptpsq *ptpsq,
>> @@ -111,7 +120,7 @@ static void mlx5e_ptp_handle_ts_cqe(struct 
>> mlx5e_ptpsq *ptpsq,
>>       u16 skb_id = PTP_WQE_CTR2IDX(be16_to_cpu(cqe->wqe_counter));
>>       u16 skb_cc = PTP_WQE_CTR2IDX(ptpsq->skb_fifo_cc);
>>       struct mlx5e_txqsq *sq = &ptpsq->txqsq;
>> -    struct sk_buff *skb;
>> +    struct sk_buff *skb = NULL;
>>       ktime_t hwtstamp;
>>       if (unlikely(MLX5E_RX_ERR_CQE(cqe))) {
>> @@ -120,8 +129,10 @@ static void mlx5e_ptp_handle_ts_cqe(struct 
>> mlx5e_ptpsq *ptpsq,
>>           goto out;
>>       }
>> -    if (mlx5e_ptp_ts_cqe_drop(ptpsq, skb_cc, skb_id))
>> -        mlx5e_ptp_skb_fifo_ts_cqe_resync(ptpsq, skb_cc, skb_id, budget);
>> +    if (mlx5e_ptp_ts_cqe_drop(ptpsq, skb_cc, skb_id) &&
>> +        !mlx5e_ptp_skb_fifo_ts_cqe_resync(ptpsq, skb_cc, skb_id, 
>> budget)) {
>> +        goto out;
>> +    }
>>       skb = mlx5e_skb_fifo_pop(&ptpsq->skb_fifo);
>>       hwtstamp = mlx5e_cqe_ts_to_ns(sq->ptp_cyc2time, sq->clock, 
>> get_cqe_ts(cqe));
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h 
>> b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
>> index 15a5a57b47b8..6e559b856afb 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h
>> @@ -289,14 +289,19 @@ struct sk_buff **mlx5e_skb_fifo_get(struct 
>> mlx5e_skb_fifo *fifo, u16 i)
>>   static inline
>>   void mlx5e_skb_fifo_push(struct mlx5e_skb_fifo *fifo, struct sk_buff 
>> *skb)
>>   {
>> -    struct sk_buff **skb_item = mlx5e_skb_fifo_get(fifo, (*fifo->pc)++);
>> +    struct sk_buff **skb_item;
>> +    WARN_ONCE(mlx5e_skb_fifo_has_room(fifo), "ptp fifo overflow");
>> +    skb_item = mlx5e_skb_fifo_get(fifo, (*fifo->pc)++);
>>       *skb_item = skb;
>>   }
>>   static inline
>>   struct sk_buff *mlx5e_skb_fifo_pop(struct mlx5e_skb_fifo *fifo)
>>   {
>> +    if (*fifo->pc == *fifo->cc)
>> +        return NULL;
>> +
>>       return *mlx5e_skb_fifo_get(fifo, (*fifo->cc)++);
>>   }