netdev - Re: [PATCH RESEND] can: j1939: do not wait 250ms if the same addr was already claimed

Message-ID: <e0f6b26e2c724439752f3c13b53af1a56a42a5bf.camel@egluetechnologies.com>
Date:   Thu, 17 Nov 2022 15:08:20 +0100
From:   Devid Antonio Filoni <devid.filoni@...uetechnologies.com>
To:     David Jander <david@...tonic.nl>
Cc:     Oleksij Rempel <o.rempel@...gutronix.de>,
        Kurt Van Dijck <dev.kurt@...dijck-laurijssen.be>,
        Robin van der Gracht <robin@...tonic.nl>,
        kernel@...gutronix.de, linux-can@...r.kernel.org,
        Oleksij Rempel <linux@...pel-privat.de>,
        Oliver Hartkopp <socketcan@...tkopp.net>,
        Marc Kleine-Budde <mkl@...gutronix.de>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Maxime Jayat <maxime.jayat@...ile-devices.fr>,
        kbuild test robot <lkp@...el.com>, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH RESEND] can: j1939: do not wait 250ms if the same addr
 was already claimed

On Fri, 2022-05-13 at 11:46 +0200, Devid Antonio Filoni wrote:
> Hi David,
> 
> On Wed, 2022-05-11 at 16:22 +0200, David Jander wrote:
> > Hi Devid,
> > 
> > On Wed, 11 May 2022 14:55:04 +0200
> > > Devid Antonio Filoni <devid.filoni@...uetechnologies.com> wrote:
> > 
> > > On Wed, 2022-05-11 at 11:06 +0200, David Jander wrote:
> > > > Hi,
> > > > 
> > > > On Wed, 11 May 2022 10:47:28 +0200
> > > > > Oleksij Rempel <o.rempel@...gutronix.de> wrote:
> > > > > 
> > > > > Hi,
> > > > > 
> > > > > i'll CC more J1939 users to the discussion.  
> > > > 
> > > > Thanks for the CC.
> > > >   
> > > > > On Tue, May 10, 2022 at 01:00:41PM +0200, Devid Antonio Filoni wrote:  
> > > > > > Hi,
> > > > > > 
> > > > > > On Tue, 2022-05-10 at 06:26 +0200, Oleksij Rempel wrote:    
> > > > > > > Hi,
> > > > > > > 
> > > > > > > On Mon, May 09, 2022 at 09:04:06PM +0200, Kurt Van Dijck wrote:    
> > > > > > > > On ma, 09 mei 2022 19:03:03 +0200, Devid Antonio Filoni wrote:    
> > > > > > > > > This is not explicitly stated in SAE J1939-21 and some tools used for
> > > > > > > > > ISO-11783 certification do not expect this wait.    
> > > > > > > 
> > > > > > > > It would be interesting to know which certification tool does not expect
> > > > > > > > it, and what explanation it gives when the test fails.
> > > > > > >     
> > > > > > > > > IMHO, the current behaviour is not explicitly stated, but neither is the opposite.
> > > > > > > > > And if I'm not mistaken, this introduces a 250msec delay.
> > > > > > > > > 
> > > > > > > > > 1. If you want to avoid the 250msec gap, you should avoid contesting the same address.
> > > > > > > > 
> > > > > > > > 2. It's a balance between predictability and flexibility, but if you try to accomplish both,
> > > > > > > > > as your patch suggests, there is a slight time window until the current owner responds,
> > > > > > > > in which it may be confusing which node has the address. It depends on how much history
> > > > > > > > you have collected on the bus.
> > > > > > > > 
> > > > > > > > I'm sure that this problem decreases with increasing processing power on the nodes,
> > > > > > > > but bigger internal queues also increase this window.
> > > > > > > > 
> > > > > > > > It would certainly help if you describe how the current implementation fails.
> > > > > > > > 
> > > > > > > > > Would decreasing the dead time to 50msec help in such a case?
> > > > > > > > 
> > > > > > > > Kind regards,
> > > > > > > > Kurt
> > > > > > > >     
> > > > > > > 
> > > > > > >     
> > > > > > 
> > > > > > The test that is being executed during the ISOBUS compliance is the
> > > > > > following: after an address has been claimed by a CF (#1), another CF
> > > > > > > (#2) sends a message (other than address-claim) using the same address
> > > > > > claimed by CF #1.
> > > > > > 
> > > > > > As per ISO11783-5 standard, if a CF receives a message, other than the
> > > > > > address-claimed message, which uses the CF's own SA, then the CF (#1):
> > > > > > - shall send the address-claim message to the Global address;
> > > > > > > - shall activate a diagnostic trouble code with SPN = 2000+SA and FMI = 31
> > > > > > 
> > > > > > After the address-claim message is sent by CF #1, as per ISO11783-5
> > > > > > standard:
> > > > > > > - If the name of CF #1 has a lower priority than that of CF #2, then
> > > > > > > CF #2 shall send its address-claim message, and CF #1 shall then send
> > > > > > > the cannot-claim-address message or execute the claim procedure again
> > > > > > > with a new address
> > > > > > > - If the name of CF #1 has a higher priority than that of CF #2, then
> > > > > > > CF #2 shall send the cannot-claim-address message or execute the
> > > > > > > claim procedure with a new address
> > > > > > 
> > > > > > Above conflict management is OK with current J1939 driver
> > > > > > implementation, however, since the driver always waits 250ms after
> > > > > > sending an address-claim message, the CF #1 cannot set the DTC. The DM1
> > > > > > message which is expected to be sent each second (as per J1939-73
> > > > > > standard) may not be sent.
> > > > > > 
> > > > > > > Honestly, I don't know which company is doing the ISOBUS compliance
> > > > > > > tests on our products or which tool they use, as it was chosen by our
> > > > > > > customer. However, they did send us some CAN traces of previously
> > > > > > > performed tests, and we noticed that the DM1 message is sent 160ms after
> > > > > > > the address-claim message (but it may also be sooner than that). This
> > > > > > > is something that we cannot do, because the driver blocks the application
> > > > > > > from sending it.
> > > > > > 
> > > > > > > 28401.127146 1  18E6FFF0x    Tx   d 8 FE 26 FF FF FF FF FF FF  //Message with other CF's address
> > > > > > > 28401.167414 1  18EEFFF0x    Rx   d 8 15 76 D1 0B 00 86 00 A0  //Address Claim - SA = F0
> > > > > > > 28401.349214 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 01 FF FF  //DM1
> > > > > > > 28402.155774 1  18E6FFF0x    Tx   d 8 FE 26 FF FF FF FF FF FF  //Message with other CF's address
> > > > > > > 28402.169455 1  18EEFFF0x    Rx   d 8 15 76 D1 0B 00 86 00 A0  //Address Claim - SA = F0
> > > > > > > 28402.348226 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 02 FF FF  //DM1
> > > > > > > 28403.182753 1  18E6FFF0x    Tx   d 8 FE 26 FF FF FF FF FF FF  //Message with other CF's address
> > > > > > > 28403.188648 1  18EEFFF0x    Rx   d 8 15 76 D1 0B 00 86 00 A0  //Address Claim - SA = F0
> > > > > > > 28403.349328 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 03 FF FF  //DM1
> > > > > > > 28404.349406 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 03 FF FF  //DM1
> > > > > > > 28405.349740 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 03 FF FF  //DM1
> > > > > > 
> > > > > > Since the 250ms wait is not explicitly stated, IMHO it should be up to
> > > > > > the user-space implementation to decide how to manage it.  
> > > > 
> > > > I think this is not entirely correct. AFAICS the 250ms wait is indeed
> > > > explicitly stated.
> > > > The following is taken from ISO 11783-5:
> > > > 
> > > > In "4.4.4.3 Address violation" it states that "If a CF receives a message,
> > > > other than the address-claimed message, which uses the CF’s own SA, then the
> > > > CF [...] shall send the address-claim message to the Global address."
> > > > 
> > > > So the CF shall claim its address again. But further down, in "4.5.2 Address
> > > > claim requirements" it is stated that "...No CF shall begin, or resume,
> > > > transmission on the network until 250 ms after it has successfully claimed an
> > > > address".
> > > > 
> > > > At this moment, the address is in dispute. The affected CFs are not allowed to
> > > > send any other messages until this dispute is resolved, and the standard
> > > > requires a waiting time of 250ms which is minimally deemed necessary to give
> > > > all participants time to respond and eventually dispute the address claim.
> > > > 
> > > > If the offending CF ignores this dispute and keeps sending incorrect messages
> > > > faster than every 250ms, then effectively the other CF has no chance to ever
> > > > resume normal operation because its address is still disputed.
> > > > 
> > > > According to 4.4.4.3 it is also required to set a DTC, but it will not be
> > > > allowed to send the DM1 message unless the address dispute is resolved.
> > > > 
> > > > > This effectively allows the offending CF to DoS the affected CF if it keeps
> > > > > sending offending messages. Unfortunately neither J1939 nor ISObus takes
> > > > > adversarial behavior on the CAN network into account, so we cannot do
> > > > > anything about this.
> > > > 
> > > > As for the ISObus compliance tool that is mentioned by Devid, IMHO this
> > > > compliance tool should be challenged and fixed, since it is broken.
> > > > 
> > > > > The networking layer is prohibiting the DM1 message from being sent, and the
> > > > > networking layer takes precedence over all higher protocol layers, so the
> > > > > diagnostics layer is not able to operate at this moment.
> > > > 
> > > > Best regards,
> > > > 
> > > >   
> > > 
> > > Hi David,
> > > 
> > > I get your point but I'm not sure that it is the correct interpretation
> > > that should be applied in this particular case for the following
> > > reasons:
> > > 
> > > - In "4.5.2 Address claim requirements" it is explicitly stated that
> > > "The CF shall claim its own address when initializing and when
> > > responding to a command to change its NAME or address" and this seems to
> > 
> > The standard unfortunately has a track record of ignoring a lot of scenarios
> > and corner cases, like in this instance the fact that there can appear new
> > participants on the bus _after_ initialization has long finished, and it would
> > need to claim its address again in that case.
> > 
> > But look at point d) of that same section: "No CF shall begin, or resume,
> > transmission on the network until 250 ms after it has successfully claimed an
> > address (Figure 4). This does not apply when responding to a request for
> > address claimed."
> > 
> > So we basically have two situations when this will apply after the network is
> > up and running and a new node suddenly appears:
> > 
> >  1. The new node starts with a "Request for address claimed" message, to
> >  which your CF should respond with an "Address Claimed" message and NOT wait
> >  250ms.
> > 
> > or
> > 
> >  2. The new node creates an addressing conflict either by claiming its address
> >  without first sending a "request for address claimed" message or (and this is
> >  your case) simply using its address without claiming it first.
> > 
> > It is this second possibility where there is a conflict that must be resolved,
> > and then you must wait 250ms after claiming the conflicting address for
> > yourself.
> > 
> > > completely ignore the "4.4.4.3 Address violation" that states that the
> > > address-claimed message shall be sent also when "the CF receives a
> > > message, other than the address-claimed message, which uses the CF's own
> > > SA".
> > > Please note that the address was already claimed by the CF, so I think
> > > that the initialization requirements should not apply in this case since
> > > all disputes were already resolved.
> > 
> > Well, yes and no. The address was claimed before, yes, but then a new node came
> > onto the bus and disputed that address. In that case the dispute needs to be
> > resolved first. Imagine you would NOT wait 250ms, but the other CF did
> > correctly claim its address, but it was you who did not receive that message
> > for some reason. Now also assume that your own NAME has a lower priority than
> > the other CF. In this case you can send a "claimed address" message to claim
> > your address again, but it will be contested. If you don't wait for the
> > contestant, it is you who will be in violation of the protocol, because you
> > should have changed your own address but failed to do so.
> > 
> > > - If the offending CF ignores the dispute, as you said, then the other
> > > CF has no chance to ever resume normal operation and so the network
> > > cannot be aware that the other CF is not working correctly because the
> > > offending CF is spoofing its own address.
> > 
> > Correct. And like I said in my previous reply, this is unfortunately how CAN,
> > J1939 and ISObus work. The whole network must cooperate and there is no
> > consideration for malign or adversarial actors.
> > There are also a lot of possible corner cases that these standards
> > unfortunately do not take into account. Conformance test tools seem to be even
> > more problematic and tend to have bugs quite often. I am still inclined to
> > think this is the case with your test tool.
> > 
> > > This seems to make the requirement to activate the DTC in
> > > "4.4.4.3 Address violation" useless.
> > 
> > The requirement is not useless. You can still set and store the DTC, just not
> > broadcast it to the network at that moment.
> > 
> > Best regards,
> > 
> > 
> 
> Thank you for your feedback and explanation.
> I asked the customer to contact the compliance company so that we can
> verify this particular use-case with them. I want to understand whether
> there is an application note or exception that states how to manage it,
> or whether they implemented the test based on their own interpretation,
> and how it really works: supposing that the test does not check for the
> presence of DM1, the test could be passed even without sending the DM1
> message during the 250ms after the address-claimed message.
> 
> Best regards,
> Devid

Hi David, all,

I'm sorry for resuming this discussion after such a long time, but I noticed
that the driver forces the 250 ms wait even when responding to a request
for address-claimed, which is against point d) of ISO 11783-5 "4.5.2
Address claim requirements":

No CF shall begin, or resume, transmission on the network until 250 ms
after it has successfully claimed an address (see Figure 4), except
when responding to a request for address-claimed.

IMHO the driver should either detect the above condition itself, or not
force the 250 ms wait at all and leave it to the user-space application
to implement the wait where it is required.
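
To make the condition concrete, here is a minimal sketch of the decision.
It is not the existing driver code: the enum, the function name and the
constant are hypothetical and only illustrate point d). The address
violation case keeps the 250 ms wait, following the interpretation given
earlier in this thread.

```c
#include <stdint.h>

/* Hypothetical reasons for sending an address-claimed message. */
enum demo_ac_reason {
	DEMO_AC_INIT,		/* initial claim, or NAME/address change */
	DEMO_AC_VIOLATION,	/* another CF used our SA (4.4.4.3) */
	DEMO_AC_REQUEST,	/* reply to a request for address-claimed */
};

/* Hold-off time from ISO 11783-5, 4.5.2 d), in milliseconds. */
#define DEMO_AC_HOLDOFF_MS 250u

/*
 * Return how long transmission must be held off after sending the
 * address-claimed message for the given reason.
 */
unsigned int demo_ac_holdoff_ms(enum demo_ac_reason reason)
{
	/* 4.5.2 d): no wait when responding to a request for address-claimed. */
	if (reason == DEMO_AC_REQUEST)
		return 0;

	/* Initial claim and address violation: the address may be disputed. */
	return DEMO_AC_HOLDOFF_MS;
}
```

With this, a reply to a request for address-claimed would not block
transmission, while the other two cases would still wait 250 ms.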

Thank you, best regards,
Devid
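
P.S. For anyone re-reading the CAN trace quoted earlier in this thread,
the 29-bit identifiers can be split into priority, PGN, destination and
source address. The sketch below assumes the usual J1939 identifier
layout (3-bit priority, EDP, DP, PF, PS, SA); the struct and function
names are made up for illustration and are not part of the kernel API.

```c
#include <stdint.h>

/* Illustrative container for the fields of a 29-bit J1939 identifier. */
struct demo_j1939_id {
	uint8_t  priority;	/* 0 (highest) .. 7 (lowest) */
	uint32_t pgn;		/* parameter group number */
	uint8_t  da;		/* destination address, 0xFF = global */
	uint8_t  sa;		/* source address */
};

struct demo_j1939_id demo_j1939_decode_id(uint32_t can_id)
{
	struct demo_j1939_id id;
	uint8_t pf = (can_id >> 16) & 0xFF;	/* PDU format */
	uint8_t ps = (can_id >> 8) & 0xFF;	/* PDU specific */

	id.priority = (can_id >> 26) & 0x7;
	id.sa = can_id & 0xFF;

	/* EDP, DP and PF always belong to the PGN. */
	id.pgn = (can_id >> 8) & 0x3FF00;

	if (pf < 240) {
		/* PDU1: PS carries the destination address. */
		id.da = ps;
	} else {
		/* PDU2: PS extends the PGN, the message is broadcast. */
		id.pgn |= ps;
		id.da = 0xFF;
	}
	return id;
}
```

Applied to the trace, 0x18EEFFF0 decodes to priority 6, PGN 0xEE00
(address claimed), DA 0xFF, SA 0xF0, and 0x18FECAF0 decodes to PGN 0xFECA
(DM1) from the same SA.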