Message-ID: <f0a34514-19da-4c73-9cd4-ae220fed6447@kernel.org>
Date: Mon, 15 Sep 2025 19:47:00 +0900
From: Vincent Mailhol <mailhol@...nel.org>
To: Oliver Hartkopp <socketcan@...tkopp.net>,
Marc Kleine-Budde <mkl@...gutronix.de>
Cc: linux-can@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/3] can: raw: use bitfields to store flags in struct raw_sock
On 15/09/2025 at 19:16, Oliver Hartkopp wrote:
> On 15.09.25 11:23, Vincent Mailhol wrote:
>> The loopback, recv_own_msgs, fd_frames and xl_frames fields of struct
>> raw_sock just need to store one bit of information.
>>
>> Declare all those members as bitfields of type unsigned int with a
>> width of one bit.
>>
>> Add a temporary variable to raw_setsockopt() and raw_getsockopt() to
>> convert between the stored bits and the socket interface.
>>
>> This reduces struct raw_sock by eight bytes.
>>
>> Statistics before:
>>
>> $ pahole --class_name=raw_sock net/can/raw.o
>> struct raw_sock {
>> struct sock sk __attribute__((__aligned__(8))); /*     0   776 */
>>
>> /* XXX last struct has 1 bit hole */
>>
>> /* --- cacheline 12 boundary (768 bytes) was 8 bytes ago --- */
>> int bound; /* 776 4 */
>> int ifindex; /* 780 4 */
>> struct net_device * dev; /* 784 8 */
>> netdevice_tracker dev_tracker; /* 792 0 */
>> struct list_head notifier; /* 792 16 */
>> int loopback; /* 808 4 */
>> int recv_own_msgs; /* 812 4 */
>> int fd_frames; /* 816 4 */
>> int xl_frames; /* 820 4 */
>> struct can_raw_vcid_options raw_vcid_opts; /* 824 4 */
>> canid_t tx_vcid_shifted; /* 828 4 */
>> /* --- cacheline 13 boundary (832 bytes) --- */
>> canid_t rx_vcid_shifted; /* 832 4 */
>> canid_t rx_vcid_mask_shifted; /* 836 4 */
>> int join_filters; /* 840 4 */
>> int count; /* 844 4 */
>> struct can_filter dfilter; /* 848 8 */
>> struct can_filter * filter; /* 856 8 */
>> can_err_mask_t err_mask; /* 864 4 */
>>
>> /* XXX 4 bytes hole, try to pack */
>>
>> struct uniqframe * uniq; /* 872 8 */
>>
>> /* size: 880, cachelines: 14, members: 20 */
>> /* sum members: 876, holes: 1, sum holes: 4 */
>> /* member types with bit holes: 1, total: 1 */
>> /* forced alignments: 1 */
>> /* last cacheline: 48 bytes */
>> } __attribute__((__aligned__(8)));
>>
>> ...and after:
>>
>> $ pahole --class_name=raw_sock net/can/raw.o
>> struct raw_sock {
>> struct sock sk __attribute__((__aligned__(8))); /*     0   776 */
>>
>> /* XXX last struct has 1 bit hole */
>>
>> /* --- cacheline 12 boundary (768 bytes) was 8 bytes ago --- */
>> int bound; /* 776 4 */
>> int ifindex; /* 780 4 */
>> struct net_device * dev; /* 784 8 */
>> netdevice_tracker dev_tracker; /* 792 0 */
>> struct list_head notifier; /* 792 16 */
>> unsigned int loopback:1; /* 808: 0 4 */
>> unsigned int recv_own_msgs:1; /* 808: 1 4 */
>> unsigned int fd_frames:1; /* 808: 2 4 */
>> unsigned int xl_frames:1; /* 808: 3 4 */
>
> This means that the former data structures (int) are not copied, but
> that bits are set (shifted, ANDed, ORed, etc.) instead, right?
>
> So what's the difference in the code the CPU has to process for this
> improvement? Is implementing this bitfield more efficient than, or
> similar to, copying the (unsigned ints) as-is?
It will indeed add a couple of assembly instructions, but this is
peanuts. In the best case, out-of-order execution may well hide this so
that not even a single CPU tick is wasted; in the worst case, it costs a
couple of CPU ticks.
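
To illustrate, here is a minimal sketch of the two layouts (simplified
and hypothetical, not the actual net/can/raw.c code):

    /* One flag per int: the setter is a single store. */
    struct flags_plain {
            int loopback;
    };

    /* Flags packed into one unsigned int: the setter becomes a
     * read-modify-write of the containing word (load, mask the bit,
     * OR in the new value, store back), i.e. a couple of extra ALU
     * instructions.
     */
    struct flags_packed {
            unsigned int loopback:1;
            unsigned int recv_own_msgs:1;
            unsigned int fd_frames:1;
            unsigned int xl_frames:1;
    };

    void set_plain(struct flags_plain *f, int val)
    {
            f->loopback = val;      /* plain 32 bit store */
    }

    void set_packed(struct flags_packed *f, int val)
    {
            f->loopback = !!val;    /* normalize to 0 or 1, then RMW */
    }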

On the other hand, reducing the size of the structure by eight bytes
lowers the risk of a cache miss, and removing a single cache miss
outperforms the penalty of a couple of extra assembly instructions by an
order of magnitude.

Well, I did not benchmark it, but this is a commonly accepted trade-off.
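
For completeness, the temporary variable mentioned in the commit message
only bridges the int-based sockopt interface and the one-bit storage.
A sketch of the pattern, with made-up struct and helper names (not the
real raw_setsockopt()/raw_getsockopt()):

    struct raw_sock_sketch {            /* stand-in for struct raw_sock */
            unsigned int loopback:1;
    };

    /* setsockopt side: userspace passes a full int, keep only one bit */
    void set_loopback_opt(struct raw_sock_sketch *ro, const int *optval)
    {
            int val = *optval;          /* temporary variable */

            ro->loopback = !!val;
    }

    /* getsockopt side: widen the stored bit back to an int */
    void get_loopback_opt(const struct raw_sock_sketch *ro, int *optval)
    {
            int val = ro->loopback;     /* temporary variable */

            *optval = val;
    }
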
Yours sincerely,
Vincent Mailhol