linux-kernel - Re: [PATCH v1] can: can327: Clean up payload encoding in can327_handle

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4621cc30-92d7-4b07-8058-a1d677f28135@enpas.org>
Date: Wed, 20 Nov 2024 02:15:07 +0900
From: Max Staudt <max@...as.org>
To: Vincent Mailhol <mailhol.vincent@...adoo.fr>,
 Dan Carpenter <dan.carpenter@...aro.org>,
 Marc Kleine-Budde <mkl@...gutronix.de>
Cc: linux-can@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1] can: can327: Clean up payload encoding in
 can327_handle_prompt()

On 11/19/24 16:41, Vincent Mailhol wrote:
> Reviewed-by: Vincent Mailhol <mailhol.vincent@...adoo.fr>
> 
> I left comments on the comments. If you have time, it would be wonderful
> if your comment on start_xmit() could be moved to can_dev_dropped_skb()
> in a separate patch.
> 
> The code is good, so if such rework is not feasible, I am happy to take
> it as-is.

Thanks for the review! Yes, if you are fine with taking it, I suggest 
doing so now, and if I ever get to rework the comments, then I can look 
into this as well.


The thing is, when writing this patch, I tried to put myself in Dan's 
shoes. That is, someone unfamiliar with the code who is now looking at 
it. There's some string manipulation going on (uh-oh!), and we need to 
ensure that no buffers are overrun. Naturally, as a reviewer, I'd like 
to know that indeed, frame->len <= 8. Hence why I added a comment to 
that effect.

But security is not about trusting me, it's about proof. That's where 
the comment in _xmit() comes in. And I have to say, while it's easy to 
see that can_dev_dropped_skb() ensures len <= 8 in case of ETH_P_CAN, 
the guarantee that we only receive ETH_P_CAN packets in the first place 
is a whole different story. The MTU checks took a while for me to track 
down and understand, so my intention is to leave some breadcrumbs for 
the next poor soul digging into can327. Frankly, I think that's a sign 
that the otherwise lean CAN stack has finally accumulated some tech debt 
(or maybe I'm just bad at reading code), but oh well.

Now, as you say, this is sort of valid for all CAN drivers, and hence it 
should be in some centralised documentation. Agreed. But then again, for 
this driver, I wanted to make a point that we're dealing with ETH_P_CAN 
only, so the comment is also written that way. If you really don't want 
that there, I can shorten it and move it, but if it's okay as-is, then 
let's leave it in, and at a later stage, we can use it as a template for 
some more generic documentation :)


>> -			snprintf(&local_txbuf[2 * i], sizeof(local_txbuf),
>> -				 "\r");
>> +			/* Send a regular CAN data frame.
>> +			 *
>> +			 * frame->len is guaranteed to be <= 8. Please refer
>> +			 * to the comment in can327_netdev_start_xmit().
>> +			 */
> 
> Nitpick, could be less verbose, e.g.:
> 
>    /* frame->len <= 8 enforced in can327_netdev_start_xmit() */

IMHO that's a matter of point of view - for you it may be obvious that 
the CAN stack only passes ETH_P_CAN to _xmit(), but to someone new it is 
not. And this particular detail is *not* enforced in _xmit(), but in 
can_send() in the CAN stack. I hope you can understand my wish to be 
precise about this comment, in order to avoid confusion when inevitably 
someone else comes looking for overflows. Or, well, if I want to touch 
can327 again in six months' time, when I have forgotten about all of this :)


>> +	/* Why this driver can rely on frame->len <= 8:
>> +	 *
>> +	 * While can_dev_dropped_skb() sanity checks the skb to contain a
>> +	 * CAN 2.0, CAN FD, or other CAN frame type supported by the CAN
>> +	 * stack, it does not restrict these types of CAN frames.
>> +	 *
>> +	 * Instead, this driver is guaranteed to receive only classic CAN 2.0
>> +	 * frames, with frame->len <= 8, by a chain of checks around the CAN
>> +	 * device's MTU (as of v6.12):
>> +	 *
>> +	 *  - can_changelink() sets the CAN device's MTU to CAN_MTU since we
>> +	 *    don't advertise CAN_CTRLMODE_FD support in ctrlmode_supported.
>> +	 *  - can_send() then refuses to pass any skb that exceeds CAN_MTU.
>> +	 *  - Since CAN_MTU is the smallest currently (v6.12) supported CAN
>> +	 *    MTU, it is clear that we are dealing with an ETH_P_CAN frame.
>> +	 *  - All ETH_P_CAN (classic CAN 2.0) frames have frame->len <= 8,
>> +	 *    as enforced by a call to can_is_can_skb() in can_send().
>> +	 *  - Thus for all CAN frames reaching this function, frame->len <= 8.
>> +	 */
> 
> Actually, none of this is really specific to your can327 driver.
> 
> While this is a valuable piece of information, I would rather like to
> see this added as a kdoc comment on top of can_dev_dropped_skb(). That
> function already has a one line documentation, but maybe it deserves a
> longer paragraph?

Agreed that the can_dev_dropped_skb() documentation should be improved. 
However, I wouldn't remove the comment in can327 entirely - see above.


> One of the key point is that the userland is able to bypass the CAN_RAW
> layer (or any other CAN layers) by sending AF_PACKET. In which case, the
> packet directly reaches the driver without any prior sanitization. So it
> is critical to highlight that drivers must call can_dev_dropped_skb() at
> the top of their start_xmit() function, typically to avoid buffer
> overflows because of an out of band frame->len.

Agreed, and that's exactly the rabbit hole I went down yesterday. Where 
would be the best place to document this? I guess that could go, er... 
in the CAN driver writer's guide? Do we have something like that?



A bit off-topic, but since CAN_RAW came up: Why does CAN_RAW even 
sanitise anything at all, instead of relying on checks on later layers? 
It seems like quite a few checks are duplicated. And all the while it's 
possible for userspace to do weird stuff like seemingly enabling CAN FD 
on the socket via setsockopt() CAN_RAW_FD_FRAMES, but that's only 
flipping a bit on the CAN_RAW socket, yet changes nothing underneath. It 
was quite confusing to read. I suppose the explanation is "legacy"?



Thanks for your review!

Max