[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <914ddb6c-79ca-49c2-82b1-33be1986eff5@infotecs.ru>
Date: Tue, 7 Oct 2025 13:34:11 +0000
From: Ilia Gavrilov <Ilia.Gavrilov@...otecs.ru>
To: Magnus Karlsson <magnus.karlsson@...il.com>, Alexander Lobakin
<aleksander.lobakin@...el.com>
CC: Song Yoong Siang <yoong.siang.song@...el.com>, Maciej Fijalkowski
<maciej.fijalkowski@...el.com>, Jesper Dangaard Brouer <hawk@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>, "netdev@...r.kernel.org"
<netdev@...r.kernel.org>, John Fastabend <john.fastabend@...il.com>, "Alexei
Starovoitov" <ast@...nel.org>, "stable@...r.kernel.org"
<stable@...r.kernel.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, Eric Dumazet <edumazet@...gle.com>,
"Stanislav Fomichev" <sdf@...ichev.me>, Simon Horman <horms@...nel.org>,
Jakub Kicinski <kuba@...nel.org>, "bpf@...r.kernel.org"
<bpf@...r.kernel.org>, Paolo Abeni <pabeni@...hat.com>, "David S. Miller"
<davem@...emloft.net>, Magnus Karlsson <magnus.karlsson@...el.com>,
"lvc-project@...uxtesting.org" <lvc-project@...uxtesting.org>
Subject: Re: [lvc-project] [PATCH net] xsk: Fix overflow in descriptor
validation@@
On 10/7/25 15:44, Magnus Karlsson wrote:
> On Tue, 7 Oct 2025 at 14:11, Alexander Lobakin
> <aleksander.lobakin@...el.com> wrote:
>>
>> From: Ilia Gavrilov <Ilia.Gavrilov@...otecs.ru>
>> Date: Tue, 7 Oct 2025 11:19:19 +0000
>>
>>> On 10/6/25 18:19, Alexander Lobakin wrote:
>>>> From: Ilia Gavrilov <Ilia.Gavrilov@...otecs.ru>
>>>> Date: Mon, 6 Oct 2025 08:53:17 +0000
>>>>
>>>>> The desc->len value can be set up to U32_MAX. If umem tx_metadata_len
>>>>
>>>> In theory. Never in practice.
>>>>
>>>
>>> Hi Alexander,
>>> Thank you for the review.
>>>
>>> It seems to me that this problem should be considered not from the point of view of practical use,
>>> but from the point of view of security. An attacker can set any length of the packet in the descriptor
>>> from the user space and descriptor validation will pass.
>>>
>>>
>>>>> option is also set, then the value of the expression
>>>>> 'desc->len + pool->tx_metadata_len' can overflow and validation
>>>>> of the incorrect descriptor will be successfully passed.
>>>>> This can lead to a subsequent chain of arithmetic overflows
>>>>> in the xsk_build_skb() function and incorrect sk_buff allocation.
>>>>>
>>>>> Found by InfoTeCS on behalf of Linux Verification Center
>>>>> (linuxtesting.org) with SVACE.
>>>>
>>>> I think the general rule for sending fixes is that a fix must fix a real
>>>> bug which can be reproduced in real life scenarios.
>>>
>>> I agree with that, so I make a test program (PoC). Something like that:
>>>
>>> struct xdp_umem_reg umem_reg;
>>> umem_reg.addr = (__u64)(void *)umem;
>>> ...
>>> umem_reg.chunk_size = 4096;
>>> umem_reg.tx_metadata_len = 16;
>>> umem_reg.flags = XDP_UMEM_TX_METADATA_LEN;
>>> setsockopt(sfd, SOL_XDP, XDP_UMEM_REG, &umem_reg, sizeof(umem_reg));
>>> ...
>>>
>>> xsk_ring_prod__reserve(tq, batch_size, &idx);
>>>
>>> for (i = 0; i < nr_packets; ++i) {
>>> struct xdp_desc *tx_desc = xsk_ring_prod__tx_desc(tq, idx + i);
>>> tx_desc->addr = packets[i].addr;
>>> tx_desc->addr += umem->tx_metadata_len;
>>> tx_desc->options = XDP_TX_METADATA;
>>> tx_desc->len = UINT32_MAX;
>>> }
>>>
>>> xsk_ring_prod__submit(tq, nr_packets);
>>> ...
>>> sendto(sfd, NULL, 0, MSG_DONTWAIT, NULL, 0);
>>>
>>> Since the check of an invalid descriptor has passed, kernel try to allocate
>>> a skb with size of 'hr + len + tr' in the sock_alloc_send_pskb() function
>>> and this is where the next overflow occurs.
>>> skb allocates with a size of 63. Next the skb_put() is called, which adds U32_MAX to skb->tail and skb->end.
>>> Next the skb_store_bits() tries to copy -1 bytes, but fails.
>>>
>>> __xsk_generic_xmit
>>> xsk_build_skb
>>> len = desc->len; // from descriptor
>>> sock_alloc_send_skb(..., hr + len + tr, ...) // the next overflow
>>> sock_alloc_send_pskb
>>> alloc_skb_with_frags
>>> skb_put(skb, len) // len casts to int
>>> skb_store_bits(skb, 0, buffer, len)
>>
>> Oh, so you actually have a repro for this. This is good. I suggest you
>> resubmitting the patch and include this repro in the commit message, so
>> that it will be clear that it's actually possible to trigger the problem
>> in the kernel using a malicious/broken userspace application.
>>
I'll add the repro from this e-mail in the next patch version,
the full source is too long.
>> (also pls remove those double `@@` from the subject next time)
>>
ok
>> I'd also like to hear from Maciej and/or others what they think about
>> this problem (that the userspace can set packet len to U32_MAX). Should
>> we just go with this proposed u64 propagation or maybe we need to limit
>> the maximum length which could be sent from the userspace?
>
> I prefer that we do not set a limit on it and go with the proposed
> solution since I do not know what a future proof size limit would be.
> Somebody could come up with a new virtual device that can send really
> large packets, who knows.
>
The limit is already checked in the xp_aligned_validate_desc() function:
static inline bool xp_aligned_validate_desc(struct xsk_buff_pool *pool,
struct xdp_desc *desc)
{
...
if (offset + len > pool->chunk_size)
return false;
...
}
and pool->chunk_size can't be more than PAGE_SIZE:
static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
{
u32 chunk_size = mr->chunk_size, headroom = mr->headroom;
...
if (chunk_size < XDP_UMEM_MIN_CHUNK_SIZE || chunk_size > PAGE_SIZE) {
/* Strictly speaking we could support this, if:
* - huge pages, or*
* - using an IOMMU, or
* - making sure the memory area is consecutive
* but for now, we simply say "computer says no".
*/
return -EINVAL;
}
...
}
The problem is exactly the overflow.
>> In any case, you raised a good topic.
>>
>>>
>>>> Static Analysis Tools have no idea that nobody sends 4 Gb sized network
>>>> packets.
>>>>
>>>
>>> That's right. Static analyzer is only a tool, but in this case, the overflow
>>> highlighted by the static analyzer can be used for malicious purposes.
>>
>> +1
>>
>> Also I really do hope Infotecs stayed independent from the govs and
>> doesn't take part in any dual-purpose/gov-related projects.
>>
>> Thanks,
>> Olek
>>
Powered by blists - more mailing lists