Message-ID: <9ff86804-fe40-6e03-7ed4-6b431220e202@antgroup.com>
Date:   Fri, 10 Mar 2023 10:14:47 +0800
From:   "沈安琪(凛玥)" <amy.saq@...group.com>
To:     Willem de Bruijn <willemdebruijn.kernel@...il.com>,
        netdev@...r.kernel.org
Cc:     <mst@...hat.com>, <davem@...emloft.net>, <jasowang@...hat.com>,
        "谈鉴锋" <henry.tjf@...group.com>
Subject: Re: [PATCH v3] net/packet: support mergeable feature of virtio


On 2023/3/9 at 11:18 PM, Willem de Bruijn wrote:
> 沈安琪(凛玥) wrote:
>> On 2023/3/7 at 11:49 PM, Willem de Bruijn wrote:
>>> 沈安琪(凛玥) wrote:
>>>> From: Jianfeng Tan <henry.tjf@...group.com>
>>>>
>>>> Packet sockets, like tap, can be used as the backend for kernel vhost.
>>>> In packet sockets, virtio net header size is currently hardcoded to be
>>>> the size of struct virtio_net_hdr, which is 10 bytes; however, it is not
>>>> always the case: some virtio features, such as mrg_rxbuf, need the virtio
>>>> net header to be 12 bytes long.
>>>>
>>>> Mergeable buffers are a virtio feature worth supporting: if mrg_rxbuf is
>>>> not used, packets larger than one mbuf are dropped in the vhost worker's
>>>> handle_rx, yet large packets cannot be avoided and increasing the mbuf
>>>> size is not economical.
>>>>
>>>> With the mergeable feature enabled by virtio-user, packet sockets with a
>>>> hardcoded 10-byte virtio net header parse the MAC header incorrectly in
>>>> packet_snd, taking the last two bytes of the virtio net header as part
>>>> of the MAC header.
>>>> This incorrect MAC header parsing causes the packet to be dropped by the
>>>> Ethernet header validity check when the underlying device receives it.
>>>>
>>>> This patch adds a vnet_hdr_sz field, placed in an existing hole in
>>>> struct packet_sock, to record the virtio net header size in use, and a
>>>> new sockopt, PACKET_VNET_HDR_SZ, to set it. Packet sockets can then know
>>>> the exact length of the virtio net header that the virtio user gives.
>>>> In packet_snd, tpacket_snd and packet_recvmsg, instead of using the
>>>> hardcoded virtio net header size, the code reads vnet_hdr_sz from the
>>>> corresponding packet_sock and parses the MAC header based on it, so that
>>>> packets are no longer mistakenly dropped.
>>>>
>>>> In addition, the has_vnet_hdr field in struct packet_sock is removed,
>>>> since vnet_hdr_sz covers all the information it provided: a packet
>>>> socket has a vnet header if and only if its vnet_hdr_sz is not zero.
>>>>
>>>> Signed-off-by: Jianfeng Tan <henry.tjf@...group.com>
>>>> Co-developed-by: Anqi Shen <amy.saq@...group.com>
>>>> Signed-off-by: Anqi Shen <amy.saq@...group.com>
>>>> ---
>>>> diff --git a/net/packet/internal.h b/net/packet/internal.h
>>>> index 48af35b..9b52d93 100644
>>>> --- a/net/packet/internal.h
>>>> +++ b/net/packet/internal.h
>>>> @@ -119,9 +119,9 @@ struct packet_sock {
>>>>    	unsigned int		running;	/* bind_lock must be held */
>>>>    	unsigned int		auxdata:1,	/* writer must hold sock lock */
>>>>    				origdev:1,
>>>> -				has_vnet_hdr:1,
>>>>    				tp_loss:1,
>>>> -				tp_tx_has_off:1;
>>>> +				tp_tx_has_off:1,
>>>> +				vnet_hdr_sz:8;
>>> Just use a separate u8 variable, rather than 8 bits in a u32.
>>>
>>>>    	int			pressure;
>>>>    	int			ifindex;	/* bound device		*/
>>
>> We plan to add
>>
>> +	   u8	vnet_hdr_sz:8;
>>
>> here.
>> Is this the right place for the field, so that the cacheline layout is not disturbed?
> When in doubt, use pahole (`pahole -C packet_sock net/packet/af_packet.o`).
>
> There currently is a 27-bit hole before pressure. That would be a good spot.


Thanks for the advice! We will try the tool.

Also, would it be better to declare the field as unsigned char rather than
u8 here, to be more consistent with the other fields?
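
For reference, the placement discussed above can be sketched in userspace
C. This is only an illustration, not the kernel's struct packet_sock: the
demo_before and demo_after structs are hypothetical, trimmed-down stand-ins
for the fields around the hole. It assumes a GCC or Clang build for Linux,
where a one-byte member is packed into the unused part of the preceding
bitfield word. The kernel's u8 is a typedef of unsigned char, so the two
spellings give the same layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in: flag bits followed by an int, leaving a hole. */
struct demo_before {
	unsigned int	running;
	unsigned int	auxdata:1,
			origdev:1,
			tp_loss:1,
			tp_tx_has_off:1;	/* remaining bits are unused */
	int		pressure;
};

/* Same layout with a separate one-byte field placed in the hole. */
struct demo_after {
	unsigned int	running;
	unsigned int	auxdata:1,
			origdev:1,
			tp_loss:1,
			tp_tx_has_off:1;
	unsigned char	vnet_hdr_sz;	/* plain byte, not a bitfield */
	int		pressure;
};

/* Assumption: GCC/Clang on Linux; other ABIs may lay this out differently. */
_Static_assert(sizeof(struct demo_after) == sizeof(struct demo_before),
	       "the one-byte field should fit in the existing hole");
```

Running pahole on the real object file remains the authoritative check;
the sketch only shows why a separate byte in the hole should not grow the
struct or move pressure.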


>>>>    	__be16			num;
>>>> -- 
>>>> 1.8.3.1
>>>>
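
The header-size mismatch described in the quoted commit message can be
sketched as follows. The demo_ structs are illustrative mirrors of the
10-byte and 12-byte virtio net headers, not the UAPI definitions
themselves, and demo_dst_mac_matches is a hypothetical helper. It builds a
frame with a 12-byte header and checks whether the destination MAC is
found when the parser assumes a given header size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative mirror of the basic virtio net header (10 bytes). */
struct demo_vnet_hdr {
	uint8_t		flags;
	uint8_t		gso_type;
	uint16_t	hdr_len;
	uint16_t	gso_size;
	uint16_t	csum_start;
	uint16_t	csum_offset;
};

/* Illustrative mirror of the mergeable-buffers header (12 bytes). */
struct demo_vnet_hdr_mrg_rxbuf {
	struct demo_vnet_hdr	hdr;
	uint16_t		num_buffers;
};

/*
 * Build a frame that starts with a 12-byte header followed by a
 * destination MAC, then report whether the MAC is found at the offset
 * the parser assumes. Returns 1 on a match, 0 otherwise.
 */
static int demo_dst_mac_matches(size_t assumed_hdr_sz)
{
	static const uint8_t dst_mac[6] = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x01 };
	uint8_t frame[sizeof(struct demo_vnet_hdr_mrg_rxbuf) + sizeof(dst_mac)] = { 0 };

	/* Low byte of num_buffers set to 1; it occupies bytes 10..11. */
	frame[offsetof(struct demo_vnet_hdr_mrg_rxbuf, num_buffers)] = 1;
	memcpy(frame + sizeof(struct demo_vnet_hdr_mrg_rxbuf), dst_mac, sizeof(dst_mac));

	return memcmp(frame + assumed_hdr_sz, dst_mac, sizeof(dst_mac)) == 0;
}
```

Calling demo_dst_mac_matches(sizeof(struct demo_vnet_hdr)) reads the two
num_buffers bytes as the start of the MAC and fails to match, while
passing sizeof(struct demo_vnet_hdr_mrg_rxbuf) lands on the real MAC.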
