Message-ID: <CAK8fFZ7MKoFSEzMBDAOjoUt+vTZRRQgLDNXEOfdCCXSoXXKE0g@mail.gmail.com>
Date: Thu, 30 May 2024 09:52:38 +0200
From: Jaroslav Pulchart <jaroslav.pulchart@...ddata.com>
To: netdev@...r.kernel.org
Cc: Igor Raits <igor@...ddata.com>, Daniel Secik <daniel.secik@...ddata.com>,
Zdenek Pesek <zdenek.pesek@...ddata.com>
Subject: [regression] Dell's OMSA Systems Management Data Engine stuck after
update from 6.8.y to 6.9.y (with bisecting)
Hello
We are running Dell's OMSA Systems Management Data Engine on Dell
PowerEdge servers. This service is essential for monitoring and
managing the hardware. Recently, this daemon started getting stuck
after we updated the Linux kernel from version 6.8.y to 6.9.y.
strace shows that it gets stuck in a "recvmsg(12, ...)" call on a NETLINK_ROUTE socket.
# PowerEdge Server with 6.8.y: OK
$ strace -s 256 -fff /opt/dell/srvadmin/sbin/dsm_sa_datamgrd 2>&1 |
grep 'socket(AF_NETLINK, SOCK_DGRAM, NETLINK_ROUTE)' -A4
[pid 1461196] socket(AF_NETLINK, SOCK_DGRAM, NETLINK_ROUTE) = 12
[pid 1461196] sendto(12, [{nlmsg_len=24, nlmsg_type=0x16 /* NLMSG_???
*/, nlmsg_flags=NLM_F_REQUEST|0x300, nlmsg_seq=1, nlmsg_pid=0},
"\x02\x00\x00\x00\x07\x00\x00\x00"], 24, 0, {sa_family=AF_NETLINK,
nl_pid=0, nl_groups=00000000}, 12) = 24
[pid 1461196] recvmsg(12, {msg_name={sa_family=AF_NETLINK, nl_pid=0,
nl_groups=00000000}, msg_namelen=12,
msg_iov=[{iov_base=[[{nlmsg_len=76, nlmsg_type=RTM_NEWADDR,
nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1, nlmsg_pid=1461196},
{ifa_family=AF_INET, ifa_prefixlen=8, ifa_flags=IFA_F_PERMANENT,
ifa_scope=RT_SCOPE_HOST, ifa_index=if_nametoindex("lo")},
[[{nla_len=8, nla_type=IFA_ADDRESS}, inet_addr("127.0.0.1")],
[{nla_len=8, nla_type=IFA_LOCAL}, inet_addr("127.0.0.1")],
[{nla_len=7, nla_type=IFA_LABEL}, "lo"], [{nla_len=8,
nla_type=IFA_FLAGS}, IFA_F_PERMANENT], [{nla_len=20,
nla_type=IFA_CACHEINFO}, {ifa_prefered=4294967295,
ifa_valid=4294967295, cstamp=754, tstamp=754}]]], [{nlmsg_len=92,
nlmsg_type=RTM_NEWADDR, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1,
nlmsg_pid=1461196}, {ifa_family=AF_INET, ifa_prefixlen=20,
ifa_flags=IFA_F_PERMANENT, ifa_scope=RT_SCOPE_UNIVERSE,
ifa_index=if_nametoindex("br_private")}, [[{nla_len=8,
nla_type=IFA_ADDRESS}, inet_addr("10.12.48.106")], [{nla_len=8,
nla_type=IFA_LOCAL}, inet_addr("10.12.48.106")], [{nla_len=8,
nla_type=IFA_BROADCAST}, inet_addr("10.12.63.255")], [{nla_len=15,
nla_type=IFA_LABEL}, "br_private"], [{nla_len=8, nla_type=IFA_FLAGS},
IFA_F_PERMANENT|IFA_F_NOPREFIXROUTE], [{nla_len=20,
nla_type=IFA_CACHEINFO}, {ifa_prefered=4294967295,
ifa_valid=4294967295, cstamp=1752, tstamp=1752}]]], [{nlmsg_len=92,
nlmsg_type=RTM_NEWADDR, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1,
nlmsg_pid=1461196}, {ifa_family=AF_INET, ifa_prefixlen=20,
ifa_flags=IFA_F_PERMANENT, ifa_scope=RT_SCOPE_UNIVERSE,
ifa_index=if_nametoindex("br_public")}, [[{nla_len=8,
nla_type=IFA_ADDRESS}, inet_addr("10.12.16.106")], [{nla_len=8,
nla_type=IFA_LOCAL}, inet_addr("10.12.16.106")], [{nla_len=8,
nla_type=IFA_BROADCAST}, inet_addr("10.12.31.255")], [{nla_len=14,
nla_type=IFA_LABEL}, "br_public"], [{nla_len=8, nla_type=IFA_FLAGS},
IFA_F_PERMANENT|IFA_F_NOPREFIXROUTE], [{nla_len=20,
nla_type=IFA_CACHEINFO}, {ifa_prefered=4294967295,
ifa_valid=4294967295, cstamp=1756, tstamp=1756}]]]], iov_len=21504}],
msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 260
[pid 1461196] recvmsg(12, {msg_name={sa_family=AF_NETLINK, nl_pid=0,
nl_groups=00000000}, msg_namelen=12,
msg_iov=[{iov_base=[{nlmsg_len=20, nlmsg_type=NLMSG_DONE,
nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1, nlmsg_pid=1461196}, 0],
iov_len=21244}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 20
[pid 1461196] close(12) = 0
# PowerEdge Server with 6.9.y: STUCK
$ strace -s 256 -fff /opt/dell/srvadmin/sbin/dsm_sa_datamgrd 2>&1 |
grep 'socket(AF_NETLINK, SOCK_DGRAM, NETLINK_ROUTE)' -A4
[pid 3249936] socket(AF_NETLINK, SOCK_DGRAM, NETLINK_ROUTE) = 12
[pid 3249936] sendto(12, [{nlmsg_len=24, nlmsg_type=0x16 /* NLMSG_???
*/, nlmsg_flags=NLM_F_REQUEST|0x300, nlmsg_seq=1, nlmsg_pid=0},
"\x02\x00\x00\x00\x02\x00\x00\x00"], 24, 0, {sa_family=AF_NETLINK,
nl_pid=0, nl_groups=00000000}, 12) = 24
[pid 3249936] recvmsg(12, {msg_name={sa_family=AF_NETLINK, nl_pid=0,
nl_groups=00000000}, msg_namelen=12,
msg_iov=[{iov_base=[[{nlmsg_len=76, nlmsg_type=RTM_NEWADDR,
nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1, nlmsg_pid=3249936},
{ifa_family=AF_INET, ifa_prefixlen=8, ifa_flags=IFA_F_PERMANENT,
ifa_scope=RT_SCOPE_HOST, ifa_index=if_nametoindex("lo")},
[[{nla_len=8, nla_type=IFA_ADDRESS}, inet_addr("127.0.0.1")],
[{nla_len=8, nla_type=IFA_LOCAL}, inet_addr("127.0.0.1")],
[{nla_len=7, nla_type=IFA_LABEL}, "lo"], [{nla_len=8,
nla_type=IFA_FLAGS}, IFA_F_PERMANENT], [{nla_len=20,
nla_type=IFA_CACHEINFO}, {ifa_prefered=4294967295,
ifa_valid=4294967295, cstamp=769, tstamp=769}]]], [{nlmsg_len=88,
nlmsg_type=RTM_NEWADDR, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1,
nlmsg_pid=3249936}, {ifa_family=AF_INET, ifa_prefixlen=24,
ifa_flags=0, ifa_scope=RT_SCOPE_LINK,
ifa_index=if_nametoindex("idrac")}, [[{nla_len=8,
nla_type=IFA_ADDRESS}, inet_addr("169.254.1.2")], [{nla_len=8,
nla_type=IFA_LOCAL}, inet_addr("169.254.1.2")], [{nla_len=8,
nla_type=IFA_BROADCAST}, inet_addr("169.254.1.255")], [{nla_len=10,
nla_type=IFA_LABEL}, "idrac"], [{nla_len=8, nla_type=IFA_FLAGS}, 0],
[{nla_len=20, nla_type=IFA_CACHEINFO}, {ifa_prefered=714429,
ifa_valid=714429, cstamp=1805, tstamp=1805}]]], [{nlmsg_len=92,
nlmsg_type=RTM_NEWADDR, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1,
nlmsg_pid=3249936}, {ifa_family=AF_INET, ifa_prefixlen=20,
ifa_flags=IFA_F_PERMANENT, ifa_scope=RT_SCOPE_UNIVERSE,
ifa_index=if_nametoindex("br_private")}, [[{nla_len=8,
nla_type=IFA_ADDRESS}, inet_addr("10.12.48.105")], [{nla_len=8,
nla_type=IFA_LOCAL}, inet_addr("10.12.48.105")], [{nla_len=8,
nla_type=IFA_BROADCAST}, inet_addr("10.12.63.255")], [{nla_len=15,
nla_type=IFA_LABEL}, "br_private"], [{nla_len=8, nla_type=IFA_FLAGS},
IFA_F_PERMANENT], [{nla_len=20, nla_type=IFA_CACHEINFO},
{ifa_prefered=4294967295, ifa_valid=4294967295, cstamp=1791,
tstamp=1791}]]], [{nlmsg_len=92, nlmsg_type=RTM_NEWADDR,
nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1, nlmsg_pid=3249936},
{ifa_family=AF_INET, ifa_prefixlen=20, ifa_flags=IFA_F_PERMANENT,
ifa_scope=RT_SCOPE_UNIVERSE, ifa_index=if_nametoindex("br_public")},
[[{nla_len=8, nla_type=IFA_ADDRESS}, inet_addr("10.12.16.105")],
[{nla_len=8, nla_type=IFA_LOCAL}, inet_addr("10.12.16.105")],
[{nla_len=8, nla_type=IFA_BROADCAST}, inet_addr("10.12.31.255")],
[{nla_len=14, nla_type=IFA_LABEL}, "br_public"], [{nla_len=8,
nla_type=IFA_FLAGS}, IFA_F_PERMANENT], [{nla_len=20,
nla_type=IFA_CACHEINFO}, {ifa_prefered=4294967295,
ifa_valid=4294967295, cstamp=1795, tstamp=1795}]]], [{nlmsg_len=20,
nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1,
nlmsg_pid=3249936}, 0]], iov_len=13312}], msg_iovlen=1,
msg_controllen=0, msg_flags=0}, 0) = 368
.. STUCK ...
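Comparing the two traces: on 6.8.y the NLMSG_DONE message arrives in its own 20-byte recvmsg(), while on 6.9.y it is the last message inside the 368-byte buffer (76 + 88 + 92 + 92 + 20). The hang would be consistent with the daemon calling recvmsg() once more to wait for a DONE message it has already received, but that is only an assumption, since the daemon's source is not available. As a minimal sketch of a dump reader that works with both layouts, the Python below walks every message in each buffer and stops as soon as it sees NLMSG_DONE. The names parse_dump_buffer, read_dump and make_msg are hypothetical helpers for illustration; recv stands in for whatever returns the next recvmsg() buffer.

```python
import struct

# struct nlmsghdr: nlmsg_len, nlmsg_type, nlmsg_flags, nlmsg_seq, nlmsg_pid
NLMSG_HDR = struct.Struct("=IHHII")
NLMSG_DONE = 3
RTM_NEWADDR = 20
NLM_F_MULTI = 2


def nlmsg_align(n):
    """Round n up to the 4-byte netlink alignment."""
    return (n + 3) & ~3


def parse_dump_buffer(buf):
    """Walk all messages in one receive buffer.

    Returns (list_of_message_types, done), where done is True if the
    buffer contained NLMSG_DONE, wherever in the buffer it appeared.
    """
    types = []
    offset = 0
    while offset + NLMSG_HDR.size <= len(buf):
        length, msg_type, _flags, _seq, _pid = NLMSG_HDR.unpack_from(buf, offset)
        if length < NLMSG_HDR.size or offset + length > len(buf):
            raise ValueError("truncated or malformed netlink message")
        types.append(msg_type)
        if msg_type == NLMSG_DONE:
            return types, True
        offset += nlmsg_align(length)
    return types, False


def read_dump(recv):
    """Call recv() until some buffer carries NLMSG_DONE.

    recv is any callable returning the next buffer as bytes
    (hypothetical stand-in for a recvmsg() on the netlink socket).
    """
    all_types = []
    while True:
        types, done = parse_dump_buffer(recv())
        all_types.extend(types)
        if done:
            return all_types


def make_msg(msg_type, payload=b"", seq=1, pid=0, flags=NLM_F_MULTI):
    """Build a synthetic netlink message, for testing the parser only."""
    length = NLMSG_HDR.size + len(payload)
    pad = b"\0" * (nlmsg_align(length) - length)
    return NLMSG_HDR.pack(length, msg_type, flags, seq, pid) + payload + pad
```

With this loop, a reader never issues a further receive after NLMSG_DONE, whether DONE came alone (as in the 6.8.y trace) or appended to the last batch of RTM_NEWADDR messages (as in the 6.9.y trace).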
I ran two bisections:
1. git bisect start '--first-parent' 'v6.9' 'v6.8'
2. git bisect start
# ----------------- 1. bisecting -----------------
# bad: [a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6] Linux 6.9
# good: [e8f897f4afef0031fe618a8e94127a0934896aba] Linux 6.8
git bisect start '--first-parent' 'v6.9' 'v6.8'
# bad: [033e4491b6c614efddcf58927082887e2b78995d] Merge tag
'gpio-fixes-for-v6.9-rc2' of
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
git bisect bad 033e4491b6c614efddcf58927082887e2b78995d
# bad: [68bf6bfdcf56b5e6567a668ffc15d5e449356c02] Merge tag
'ext4_for_linus-6.9-rc1' of
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
git bisect bad 68bf6bfdcf56b5e6567a668ffc15d5e449356c02
# good: [b32273ee89a866b01b316b9a8de407efde01090c] Merge tag
'execve-v6.9-rc1' of
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
git bisect good b32273ee89a866b01b316b9a8de407efde01090c
# bad: [9687d4ac582fad1af9979e296881f28c3f35b05c] Merge tag
'mailbox-v6.9' of
git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox
git bisect bad 9687d4ac582fad1af9979e296881f28c3f35b05c
# bad: [d2bac0823d046117de295120edff3d860dc6554b] Merge tag
'for-6.9/dm-changes' of
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
git bisect bad d2bac0823d046117de295120edff3d860dc6554b
# bad: [ca661c5e1d89a65642d7de5ad3edc00b5666002a] Merge tag
'selinux-pr-20240312' of
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
git bisect bad ca661c5e1d89a65642d7de5ad3edc00b5666002a
# good: [681ba318a635787031537b3a7df5c12980835cb1] Merge tag
'Smack-for-6.9' of https://github.com/cschaufler/smack-next
git bisect good 681ba318a635787031537b3a7df5c12980835cb1
# good: [1f440397665f4241346e4cc6d93f8b73880815d1] Merge tag
'docs-6.9' of git://git.lwn.net/linux
git bisect good 1f440397665f4241346e4cc6d93f8b73880815d1
# bad: [9187210eee7d87eea37b45ea93454a88681894a4] Merge tag
'net-next-6.9' of
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
git bisect bad 9187210eee7d87eea37b45ea93454a88681894a4
# first bad commit: [9187210eee7d87eea37b45ea93454a88681894a4] Merge
tag 'net-next-6.9' of
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
# ----------------- 2. bisecting -----------------
git bisect start
# status: waiting for both good and bad commits
# bad: [a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6] Linux 6.9
git bisect bad a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6
# status: waiting for good commit(s), bad commit known
# good: [e8f897f4afef0031fe618a8e94127a0934896aba] Linux 6.8
git bisect good e8f897f4afef0031fe618a8e94127a0934896aba
# bad: [480e035fc4c714fb5536e64ab9db04fedc89e910] Merge tag
'drm-next-2024-03-13' of https://gitlab.freedesktop.org/drm/kernel
git bisect bad 480e035fc4c714fb5536e64ab9db04fedc89e910
# bad: [9187210eee7d87eea37b45ea93454a88681894a4] Merge tag
'net-next-6.9' of
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
git bisect bad 9187210eee7d87eea37b45ea93454a88681894a4
# good: [a01c9fe32378636ae65bec8047b5de3fdb2ba5c8] Merge tag
'nfsd-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
git bisect good a01c9fe32378636ae65bec8047b5de3fdb2ba5c8
# bad: [ca61ba3885274a684c83d8a538eb77b30e38ee92] Merge branch
'rework-genet-mdioclocking'
git bisect bad ca61ba3885274a684c83d8a538eb77b30e38ee92
# good: [f42822f22b1c5f72c7e3497d9683f379ab0c5fe4] bnxt_en: Use
firmware provided maximum filter counts.
git bisect good f42822f22b1c5f72c7e3497d9683f379ab0c5fe4
# good: [e10cd2ddd89e8b3e61b49247067e79f7debec2f1] wifi: rtw89: load
BB parameters to PHY-1
git bisect good e10cd2ddd89e8b3e61b49247067e79f7debec2f1
# bad: [81800aef0eba33df2b30f2e29a0137078b9ba256] net: mdio_bus: make
mdio_bus_type const
git bisect bad 81800aef0eba33df2b30f2e29a0137078b9ba256
# good: [bed90b06b6812d9c8c848414b090ddf38f0e6cc1] net: phy: aquantia:
clear PMD Global Transmit Disable bit during init
git bisect good bed90b06b6812d9c8c848414b090ddf38f0e6cc1
# bad: [e1a00373e1305578cd09526aa056940409e6b877] Merge tag
'linux-can-next-for-6.9-20240213' of
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
git bisect bad e1a00373e1305578cd09526aa056940409e6b877
# good: [383de5664c87abe097d6369d18305c3a6e559bb2] can: softing:
remove redundant NULL check
git bisect good 383de5664c87abe097d6369d18305c3a6e559bb2
# bad: [2ce30993831041b9dcd31eb12896be6611e8b7e2] r8169: add generic
rtl_set_eee_txidle_timer function
git bisect bad 2ce30993831041b9dcd31eb12896be6611e8b7e2
# bad: [0bef512012b1cd8820f0c9ec80e5f8ceb43fdd59] net: add
netdev_lockdep_set_classes() to virtual drivers
git bisect bad 0bef512012b1cd8820f0c9ec80e5f8ceb43fdd59
# bad: [88c9d07b96bb02108ef786f574cd0e730ebab678] Merge branch
'net-use-net-dev_by_index-in-two-places'
git bisect bad 88c9d07b96bb02108ef786f574cd0e730ebab678
# bad: [3e41af90767dcf8e5ca91cfbbbcb772584940df9] rtnetlink: use
xarray iterator to implement rtnl_dump_ifinfo()
git bisect bad 3e41af90767dcf8e5ca91cfbbbcb772584940df9
# good: [f383ced24d6ae6c1989394d052d3109b9d645f11] vlan: use xarray
iterator to implement /proc/net/vlan/config
git bisect good f383ced24d6ae6c1989394d052d3109b9d645f11
# first bad commit: [3e41af90767dcf8e5ca91cfbbbcb772584940df9]
rtnetlink: use xarray iterator to implement rtnl_dump_ifinfo()
However, reverting only the "rtnetlink: use xarray iterator to implement
rtnl_dump_ifinfo()" commit did not resolve the issue. Do you have any
suggestions on what to try next and how to fix it?
Best,
Jaroslav Pulchart