linux-kernel - Re: can: isotp: epoll breaks isotp

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f39ae7f2-6216-e5b4-1f4d-e01bdee7cf7c@hartkopp.net>
Date:   Sun, 20 Aug 2023 21:51:37 +0200
From:   Oliver Hartkopp <socketcan@...tkopp.net>
To:     Lukas Magel <lukas.magel@...teo.net>,
        Michal Sojka <michal.sojka@...t.cz>,
        Maxime Jayat <maxime.jayat@...ile-devices.fr>,
        Marc Kleine-Budde <mkl@...gutronix.de>
Cc:     linux-can@...r.kernel.org, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org,
        "Dae R. Jeong" <threeearcat@...il.com>,
        Hillf Danton <hdanton@...a.com>
Subject: Re: can: isotp: epoll breaks isotp_sendmsg

Hi Lukas,

On 18.08.23 13:53, Lukas Magel wrote:
> On 13.08.23 13:23, Lukas Magel wrote:
>> Hi Maxime, hi Michal,
>>
>> On 01.07.23 00:35, Michal Sojka wrote:
>>> Hi Maxime,
>>>
>>> On Fri, Jun 30 2023, Maxime Jayat wrote:
>>>> Hi,
>>>>
>>>> There is something not clear happening with the non-blocking behavior
>>>> of ISO-TP sockets in the TX path, but more importantly, using epoll now
>>>> completely breaks isotp_sendmsg.
>>>> I believe it is related to
>>>> 79e19fa79c ("can: isotp: isotp_ops: fix poll() to not report false
>>>> EPOLLOUT events"),
>>>> but actually is probably deeper than that.
>>>>
>>>> I don't completely understand what is exactly going on, so I am sharing
>>>> the problem I face:
>>>>
>>>> With an ISO-TP socket in non-blocking mode, using epoll seems to make
>>>> isotp_sendmsg always return -EAGAIN.
>>> That's definitely not expected behavior. I tested the patch only with
>>> poll, hoping that epoll would behave the same.
>>>
>>> [...]
>>
>> I am writing to report that we have been witnessing a behavior very similar
>> to what you describe. ISO-TP send breaks with EAGAIN if a poll (for a read)
>> occurs at the same time.
>>
>> Our Python stack uses two threads to do concurrent, blocking reads & writes
>> from and to the ISO-TP socket. The socket has a timeout of 0.1s to facilitate
>> shutdown when requested by the application. Notably, the blocking semantics
>> are handled by CPython, i.e. the underlying kernel socket is non-blocking.
>> CPython polls until the requested operation (read or write) can be executed
>> or the timeout occurs.
>>
>> What happens during execution is that the socket is continuously being
>> polled by the read thread, i.e. so->wait is always filled with one task.
>> This process repeats until the socket receives a frame from the bus and the
>> poll returns successsfully. The app reads the data from the socket and
>> sends a response. Since the send occurs in a different thread, the reader
>> thread will have already returned to its poll loop and to populating
>> so->wait. When the send occurs, isotp_sendmsg checks so->wait for sleepers
>> and returns EAGAIN because the socket is non-blocking although there is no
>> concurrent send operation. This dance continues until the timeout occurs for
>> either the read or the write operation. If the write times out first, a
>> timeout error causes the Python app to break. If the read times out first,
>> there is a race that the write goes through or the reader puts in
>> another poll.
>>
>> This behavior can be seen rather nicely in strace:
>> 110580 poll([{fd=5, events=POLLIN}], 1, 100) = 0 (Timeout)
>> 110580 poll([{fd=5, events=POLLIN}], 1, 100) = 0 (Timeout)
>> 110580 poll([{fd=5, events=POLLIN}], 1, 100) = 1 ([{fd=5, revents=POLLIN}])
>> 110580 recvfrom(5, ">\0", 4095, 0, NULL, NULL) = 2
>> 110580 poll([{fd=5, events=POLLIN}], 1, 100 <unfinished ...>
>> 110569 poll([{fd=5, events=POLLOUT}], 1, 100) = 1 ([{fd=5, revents=POLLOUT}])
>> 110569 sendto(5, "~\0", 2, 0, NULL, 0)  = -1 EAGAIN (Resource temporarily unavailable)
>> 110569 poll([{fd=5, events=POLLOUT}], 1, 100) = 1 ([{fd=5, revents=POLLOUT}])
>> 110569 sendto(5, "~\0", 2, 0, NULL, 0)  = -1 EAGAIN (Resource temporarily unavailable)
>> 110569 poll([{fd=5, events=POLLOUT}], 1, 100) = 1 ([{fd=5, revents=POLLOUT}])
>> 110569 sendto(5, "~\0", 2, 0, NULL, 0)  = -1 EAGAIN (Resource temporarily unavailable)
>> [....]
>> 110569 poll([{fd=5, events=POLLOUT}], 1, 6) = 1 ([{fd=5, revents=POLLOUT}])
>> 110569 sendto(5, "~\0", 2, 0, NULL, 0)  = -1 EAGAIN (Resource temporarily unavailable)
>> 110569 poll([{fd=5, events=POLLOUT}], 1, 6 <unfinished ...>
>> 110580 <... poll resumed>)              = 0 (Timeout)
>> 110569 <... poll resumed>)              = 1 ([{fd=5, revents=POLLOUT}])
>> 110580 poll([{fd=5, events=POLLIN}], 1, 100 <unfinished ...>
>> 110569 sendto(5, "~\0", 2, 0, NULL, 0)  = -1 EAGAIN (Resource temporarily unavailable)
>> 110569 poll([{fd=5, events=POLLOUT}], 1, 5) = 1 ([{fd=5, revents=POLLOUT}])
>> 110569 sendto(5, "~\0", 2, 0, NULL, 0)  = -1 EAGAIN (Resource temporarily unavailable)
>>
>> I believe this is consistent to the behavior you're witnessing with epoll
>> because epoll also places a sleeper in so->wait that is left in the queue
>> until the epoll descriptor is closed.
>>
>>
>>>> By reverting 79e19fa79c, I get better results but still incorrect:
>>> [...]
>>>
>>>> It is then possible to write on the socket but the write is blocking,
>>>> which is not the expected behavior for a non-blocking socket.
>>> Yes, incorrect behavior was why we made the commit in question, however
>>> we saw write() returning -EAGAIN when it shouldn't.
>>>
>>>> I don't know how to solve the problem. To me, using wq_has_sleeper seems
>>>> weird.
>>> Agreed. I've never tried to understand how synchronization works here.
>>> Hopefully, Oliver knows more.
>>>
>>>> The implementation of isotp_poll feels weird too (calling both
>>>> datagram_poll and
>>>> poll_wait?). But I am not sure what would be the correct
>>>> implementation.
>>> I understand it as follows (which might be wrong - someone, please
>>> correct me), isotp_poll() should register the file with all waitqueues
>>> it can wait on. so->wait is one and sock->sq.wait (used by
>>> datagram_poll) is another. The former is definitely used for TX, the
>>> latter is probably used because skb_recv_datagram() is called for RX.
>>> But so->wait is also used for RX and there might proabbly be be some
>>> inconsistency between those.
>>
>> AFAIK, it is correct behavior for isotp_poll to register all wait queues
>> with the poller.
>>
>> Before 79e19fa79c, I assume datagram_poll always returned
>> EPOLLOUT because its corresponding send buffer is unused and empty. This
>> return value can be incorrect if so->tx.state is not IDLE and a send would
>> block. With the patch, this behavior is now suppressed. I believe that
>> the inconsistency could have rather been introduced with:
>> can: isotp: fix race between isotp_sendsmg() and isotp_release() 0517374
>>
>> With this patch, the behavior of isotp_sendmsg was changed to only check
>> so->wait for sleepers instead of consulting so->tx.state to see if the
>> socket is busy. Since the wait queue can also have sleepers only interested
>> in read operations, I believe the return value is not a valid indicator
>> of send readiness. Additionally, in this state, the behavior in isotp_sendmsg
>> is inconsistent with isotp_poll. I will try to test next week if reverting this
>> part of the patch could fix the race condition.
>>
> 
> I submitted a patch to the mailing list and would very much
> appreciate your feedback:
> https://lore.kernel.org/linux-can/20230818114345.142983-1-lukas.magel@posteo.net/
> 
> @Maxime, I tried your isotprecv / isotpsend snippet and the issue did
> not occur with the supplied patch. The send operation went through right
> away.
> 
> @Oliver I adjusted the exit path for the case where the initial wait is
> interrupted to return immediately instead of jumping to err_event_drop.
> Could you please check if you would agree with this change?

The code has really won with your change! Thanks!

But as you already assumed I have a problem with the handling of the 
cleanup when a signal interrupts the wait_event_interruptible() statement.

I think it should still be:

/* wait for complete transmission of current pdu */
err = wait_event_interruptible(so->wait, so->tx.state == ISOTP_IDLE);
if (err)
         goto err_event_drop;

as we need to make sure that the state machine is set to defined values 
and states for the next isotp_sendmsg() attempt.

Best regards,
Oliver