lists.openwall.net: Open Source and information security mailing list archives

Message-ID: <CANP3RGeENFk0RFD2m1kBuOJxdAhKEjR=9caokkKah35py5kXbg@mail.gmail.com>
Date: Sun, 16 Jun 2024 10:09:07 +0200
From: Maciej Żenczykowski <zenczykowski@...il.com>
To: Maciej Żenczykowski <zenczykowski@...il.com>
Cc: Linux Network Development Mailing List <netdev@...r.kernel.org>, "David S . Miller" <davem@...emloft.net>, 
	Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>
Subject: Re: [PATCH net v2] neighbour: add RTNL_FLAG_DUMP_SPLIT_NLM_DONE to RTM_GETNEIGH

For the other patch, I've tracked down:
  32affa5578f0 ("fib: rules: no longer hold RTNL in fib_nl_dumprule()")
which causes half the regression.

But... I haven't figured out what causes the remaining half (or third,
depending on how you look at it).

I've also spent quite a while trying to figure out what exactly is
going wrong in the python netlink parsing code.
The code leaves a *lot* to be desired...

Turns out it doesn't honour the nlmsghdr.length field of NLMSG_DONE
messages, so it only reads the 16-byte header instead of the 20 bytes
the kernel generates (the 16-byte header plus 4 zero bytes).  I'm not
sure why those extra 4 bytes are there, but they are... (anyone know?)
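For reference, a minimal python sketch of that 20-byte NLMSG_DONE layout, using only the standard `struct` module. The 16-byte header follows `struct nlmsghdr` (nlmsg_len, nlmsg_type, nlmsg_flags, nlmsg_seq, nlmsg_pid, in host byte order), with NLMSG_DONE = 3 and NLM_F_MULTI = 0x2. Treating the trailing 4 bytes as a native-endian int is an assumption (in recent kernels it appears to carry the dump's final status, normally 0), and `parse_done()` is a hypothetical helper, not part of the test framework.

```python
import struct

# struct nlmsghdr: nlmsg_len (u32), nlmsg_type (u16), nlmsg_flags (u16),
# nlmsg_seq (u32), nlmsg_pid (u32); "=" means native byte order, no padding.
NLMSGHDR = struct.Struct("=IHHII")
NLMSG_DONE = 3
NLM_F_MULTI = 0x2


def parse_done(buf: bytes) -> tuple[int, int]:
    """Return (nlmsg_len, status) for a single NLMSG_DONE message in buf."""
    length, msg_type, _flags, _seq, _pid = NLMSGHDR.unpack_from(buf, 0)
    if msg_type != NLMSG_DONE:
        raise ValueError(f"expected NLMSG_DONE, got type {msg_type}")
    # Use nlmsg_len, not the header size, to find the end of the message.
    payload = buf[NLMSGHDR.size:length]
    # Assumption: the 4 trailing bytes are a native-endian int status.
    status = struct.unpack("=i", payload[:4])[0] if len(payload) >= 4 else 0
    return length, status


# A DONE message shaped like the kernel's: 16-byte header + 4 zero bytes.
done = NLMSGHDR.pack(NLMSGHDR.size + 4, NLMSG_DONE, NLM_F_MULTI, 1, 0) + struct.pack("=i", 0)
print(len(done), parse_done(done))  # prints: 20 (20, 0)
```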
This leaves 4 bytes over, which then fail to parse as another
nlmsghdr (because the parser also effectively ignores that the message
is a DONE and keeps parsing).
Which explains the failure:
  TypeError: NLMsgHdr requires a bytes object of length 16, got 4
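A sketch of a splitter that honours nlmsg_len, assuming one recv() buffer holds only whole messages. Each step advances by the message length rounded up to 4 bytes (as the kernel's NLMSG_ALIGN() does), so a 20-byte DONE is consumed completely and no 4-byte remainder is left over to misparse as a header. `iter_nlmsgs()` and `nlmsg_align()` are hypothetical names, not the framework's API.

```python
import struct
from typing import Iterator

NLMSGHDR = struct.Struct("=IHHII")  # nlmsg_len, type, flags, seq, pid
NLMSG_ALIGNTO = 4


def nlmsg_align(n: int) -> int:
    """Round n up to a multiple of 4, like the kernel's NLMSG_ALIGN()."""
    return (n + NLMSG_ALIGNTO - 1) & ~(NLMSG_ALIGNTO - 1)


def iter_nlmsgs(buf: bytes) -> Iterator[tuple[int, int, bytes]]:
    """Yield (nlmsg_type, nlmsg_flags, payload) for each message in buf."""
    offset = 0
    while offset < len(buf):
        if len(buf) - offset < NLMSGHDR.size:
            raise ValueError(f"{len(buf) - offset} trailing bytes at offset {offset}")
        length, msg_type, flags, _seq, _pid = NLMSGHDR.unpack_from(buf, offset)
        if length < NLMSGHDR.size or offset + length > len(buf):
            raise ValueError(f"bad nlmsg_len {length} at offset {offset}")
        # The payload runs up to nlmsg_len, whatever the message type.
        yield msg_type, flags, buf[offset + NLMSGHDR.size:offset + length]
        # Skip any alignment padding before the next header.
        offset += nlmsg_align(length)
```

Raising on trailing bytes, rather than silently skipping them, makes a length bug like this one fail at the point where it happens instead of one message later.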

Fixing the parsing results in things hanging instead, because we still
ignore the DONE.
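The hang is what you get when the receive loop never treats NLMSG_DONE as terminal. A sketch of a dump loop that stops there, assuming `datagrams` stands in for successive recv() results (a hypothetical simplification so it runs without a socket); `collect_dump()` is likewise hypothetical. It also raises on a non-zero NLMSG_ERROR, since a dump that fails to start is reported that way rather than with a DONE.

```python
import struct
from typing import Iterable

NLMSGHDR = struct.Struct("=IHHII")  # nlmsg_len, type, flags, seq, pid
NLMSG_ERROR = 2
NLMSG_DONE = 3


def collect_dump(datagrams: Iterable[bytes]) -> list[tuple[int, bytes]]:
    """Gather (type, payload) pairs from a multipart dump until NLMSG_DONE."""
    messages = []
    for buf in datagrams:  # hypothetical stand-in for repeated recv() calls
        offset = 0
        while offset + NLMSGHDR.size <= len(buf):
            length, msg_type, _flags, _seq, _pid = NLMSGHDR.unpack_from(buf, offset)
            if length < NLMSGHDR.size or offset + length > len(buf):
                raise ValueError(f"bad nlmsg_len {length} at offset {offset}")
            payload = buf[offset + NLMSGHDR.size:offset + length]
            if msg_type == NLMSG_DONE:
                return messages  # end of dump: never read from the socket again
            if msg_type == NLMSG_ERROR:
                (err,) = struct.unpack_from("=i", payload)
                if err != 0:  # 0 is a plain ACK; negative is -errno
                    raise OSError(-err, "netlink dump failed")
            else:
                messages.append((msg_type, payload))
            offset += (length + 3) & ~3  # NLMSG_ALIGN()
    raise RuntimeError("dump ended without NLMSG_DONE")
```

Returning from inside the loop, rather than only breaking out of the inner one, is the part that matters: once DONE has been seen nothing may call recv() again, or a blocking socket just waits forever.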

Fixing that... causes more issues (or I'm still confused about how the
rest works: it's hard to follow, made worse by python's lack of types
and some apparently dead code).

Ultimately I think the right answer is to simply fix the horribly
broken netlink parser, which only ever worked by (more-or-less)
chance.  We have plenty of time (months) to do that before the next
Android release after 15 (V), which will be the first one to support
a kernel newer than 6.6 LTS anyway.

Furthermore, the python netlink parser is only used in the test
framework, while the non-test code itself uses C++ and Java netlink
parsers (which I have not yet looked at), and those are likely to
either work or contain entirely different classes of bugs ;-)
