lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <11328958-453f-447f-9af8-3b5824dfb041@munic.io>
Date: Fri, 30 Jun 2023 18:44:14 +0200
From: Maxime Jayat <maxime.jayat@...ile-devices.fr>
To: Oliver Hartkopp <socketcan@...tkopp.net>,
 Marc Kleine-Budde <mkl@...gutronix.de>, Michal Sojka <michal.sojka@...t.cz>
Cc: linux-can@...r.kernel.org, netdev@...r.kernel.org,
 linux-kernel@...r.kernel.org, "Dae R. Jeong" <threeearcat@...il.com>,
 Hillf Danton <hdanton@...a.com>
Subject: can: isotp: epoll breaks isotp_sendmsg

Hi,

There is something not clear happening with the non-blocking behavior
of ISO-TP sockets in the TX path, but more importantly, using epoll now
completely breaks isotp_sendmsg.
I believe it is related to
79e19fa79c ("can: isotp: isotp_ops: fix poll() to not report false 
EPOLLOUT events"),
but actually is probably deeper than that.

I don't completely understand what is exactly going on, so I am sharing
the problem I face:

With an ISO-TP socket in non-blocking mode, using epoll seems to make
isotp_sendmsg always return -EAGAIN.

I have a non-blocking socket + epoll version of can-utils isotpsend 
available for
testing at https://gist.github.com/MJayat/4857da43ab154e4ba644d2446b5fa46d
With this version I do the following test:

isotprecv -l -m 0x80 -s 7e8 -d 7e0 vcan0 &
echo "01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f" | strace ./isotpsend 
-l10 -s 7E0 -d 7E8 vcan0

I get:
...
15:37:02.456849 epoll_ctl(4, EPOLL_CTL_ADD, 3, 
{events=EPOLLIN|EPOLLOUT|EPOLLRDHUP, data={u32=0, u64=0}}) = 0 <0.000249>
15:37:02.457839 epoll_wait(4, [{events=EPOLLOUT, data={u32=0, u64=0}}], 
1, 2000) = 1 <0.000200>
15:37:02.458838 write(3, 
"\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f", 15) = -1 
EAGAIN (Resource temporarily unavailable) <0.000278>
15:37:02.459908 epoll_wait(4, [{events=EPOLLOUT, data={u32=0, u64=0}}], 
1, 2000) = 1 <0.000250>
15:37:02.460879 write(3, 
"\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f", 15) = -1 
EAGAIN (Resource temporarily unavailable) <0.000272>
15:37:02.461831 epoll_wait(4, [{events=EPOLLOUT, data={u32=0, u64=0}}], 
1, 2000) = 1 <0.000199>
...
impossible to write on the socket, and busy-looping.

With a change to epoll_ctl flags, now including EPOLLET, I get:
...
15:36:22.443689 epoll_ctl(4, EPOLL_CTL_ADD, 3, 
{events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=0, u64=0}}) = 0 
<0.000171>
15:36:22.444514 epoll_wait(4, [{events=EPOLLOUT, data={u32=0, u64=0}}], 
1, 2000) = 1 <0.000188>
15:36:22.445413 write(3, 
"\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f", 15) = -1 
EAGAIN (Resource temporarily unavailable) <0.000175>
15:36:22.446335 epoll_wait(4, [], 1, 2000) = 0 <2.026006>
...
epoll_wait now blocks indefinitely.

By reverting 79e19fa79c, I get better results but still incorrect:
...
15:41:43.890880 epoll_ctl(4, EPOLL_CTL_ADD, 3, 
{events=EPOLLIN|EPOLLOUT|EPOLLRDHUP, data={u32=0, u64=0}}) = 0 <0.000200>
15:41:43.892011 epoll_wait(4, [{events=EPOLLOUT, data={u32=0, u64=0}}], 
1, 2000) = 1 <0.000408>
15:41:43.893148 write(3, 
"\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f", 15) = 15 
<0.000458>
15:41:43.894405 epoll_wait(4, [{events=EPOLLOUT, data={u32=0, u64=0}}], 
1, 2000) = 1 <0.000257>
15:41:43.895385 write(3, 
"\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f", 15) = 15 
<0.128429>
15:41:44.028757 epoll_wait(4, [{events=EPOLLOUT, data={u32=0, u64=0}}], 
1, 2000) = 1 <0.001886>
15:41:44.040858 write(3, 
"\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f", 15) = 15 
<0.108069>
...
It is then possible to write on the socket but the write is blocking, 
which is
not the expected behavior for a non-blocking socket.

I don't know how to solve the problem. To me, using wq_has_sleeper seems 
weird.
The implementation of isotp_poll feels weird too (calling both 
datagram_poll and
poll_wait?). But I am not sure what would be the correct implementation.

Note that this behavior is currently on all linux-stable branches. I am
currently testing on v6.1.36 but I know it is failing on v6.3 too.

My actual use-case is in Async Rust using tokio. Not using epoll is not an
option and a non-blocking socket that sometimes blocks can be terrible for
the performance of the whole application. Our embedded target runs with the
out-of-tree module on an older linux version, but my colleagues with 
up-to-date
linux stable kernels have recently been prevented from running tests on 
their
PC.

Is there someone with a good idea of how to proceed to fix this?
I'll probably keep digging but I don't think I can spend so much time on 
this,
so any help is appreciated.

Thanks,
Maxime



Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ