lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 8 Feb 2022 14:53:07 +0000
From:   Vladimir Oltean <vladimir.oltean@....com>
To:     Colin Foster <colin.foster@...advantage.com>
CC:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Jakub Kicinski <kuba@...nel.org>,
        "David S. Miller" <davem@...emloft.net>,
        "UNGLinuxDriver@...rochip.com" <UNGLinuxDriver@...rochip.com>,
        Alexandre Belloni <alexandre.belloni@...tlin.com>,
        Claudiu Manoil <claudiu.manoil@....com>
Subject: Re: [PATCH v5 net-next 0/3] use bulk reads for ocelot statistics

On Tue, Feb 08, 2022 at 05:55:32AM -0800, Colin Foster wrote:
> On Tue, Feb 08, 2022 at 01:30:16PM +0000, Vladimir Oltean wrote:
> > On Mon, Feb 07, 2022 at 08:46:41PM -0800, Colin Foster wrote:
> > > Ocelot loops over memory regions to gather stats on different ports.
> > > These regions are mostly continuous, and are ordered. This patch set
> > > uses that information to break the stats reads into regions that can get
> > > read in bulk.
> > >
> > > The motiviation is for general cleanup, but also for SPI. Performing two
> > > back-to-back reads on a SPI bus require toggling the CS line, holding,
> > > re-toggling the CS line, sending 3 address bytes, sending N padding
> > > bytes, then actually performing the read. Bulk reads could reduce almost
> > > all of that overhead, but require that the reads are performed via
> > > regmap_bulk_read.
> > >
> > > v1 > v2: reword commit messages
> > > v2 > v3: correctly mark this for net-next when sending
> > > v3 > v4: calloc array instead of zalloc per review
> > > v4 > v5:
> > >     Apply CR suggestions for whitespace
> > >     Fix calloc / zalloc mixup
> > >     Properly destroy workqueues
> > >     Add third commit to split long macros
> > >
> > >
> > > Colin Foster (3):
> > >   net: ocelot: align macros for consistency
> > >   net: mscc: ocelot: add ability to perform bulk reads
> > >   net: mscc: ocelot: use bulk reads for stats
> > >
> > >  drivers/net/ethernet/mscc/ocelot.c    | 78 ++++++++++++++++++++++-----
> > >  drivers/net/ethernet/mscc/ocelot_io.c | 13 +++++
> > >  include/soc/mscc/ocelot.h             | 57 ++++++++++++++------
> > >  3 files changed, 120 insertions(+), 28 deletions(-)
> > >
> > > --
> > > 2.25.1
> > >
> > 
> > Please do not merge these yet. I gave them a run on my board and the
> > kernel crashed on boot.
> > 
> > [    8.043170] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 0
> > [    8.050241] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 1
> > [    8.057142] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 2
> > [    8.064021] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 3
> > [    8.128668] ------------[ cut here ]------------
> > [    8.133315] WARNING: CPU: 1 PID: 44 at drivers/net/dsa/ocelot/felix_vsc9959.c:1007 vsc9959_wm_enc+0x2c/0x40
> > [    8.143107] Modules linked in:
> > [    8.146181] CPU: 1 PID: 44 Comm: kworker/u4:2 Not tainted 5.17.0-rc2-07010-ga9b9500ffaac-dirty #2104
> > [    8.155355] Hardware name: LS1028A RDB Board (DT)
> > [    8.160079] Workqueue: events_unbound deferred_probe_work_func
> > [    8.165945] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [    8.172940] pc : vsc9959_wm_enc+0x2c/0x40
> > [    8.176967] lr : ocelot_setup_sharing_watermarks+0x94/0x1fc
> > [    8.182568] sp : ffff800008863810
> > [    8.185896] x29: ffff800008863810 x28: 0000000000000308 x27: 0000000000000001
> > [    8.193079] x26: 000000000300001a x25: 0000000000000008 x24: 0000000000000080
> > [    8.200261] x23: 0000000088888889 x22: 0000000000000000 x21: 0000000000000000
> > [    8.207441] x20: 00000000ffff2d17 x19: ffff2d17039d8010 x18: ffffd8e2afa23b28
> > [    8.214623] x17: 0000000000000015 x16: 0000000000000041 x15: 0000000000000000
> > [    8.221803] x14: ffffd8e2afa49228 x13: 0000000000025700 x12: 0000000000000009
> > [    8.228984] x11: ffff2d1703a96c18 x10: 0000000000000004 x9 : ffffd8e2ad6ce0f8
> > [    8.236165] x8 : ffff2d1700be4240 x7 : ffffd8e2af981000 x6 : 0000000000000001
> > [    8.243345] x5 : ffffd8e2ad1e1440 x4 : 0000000000000000 x3 : 0000000000000000
> > [    8.250525] x2 : 0000000000000000 x1 : ffffd8e2ad3d2810 x0 : 00000000000040c0
> > [    8.257706] Call trace:
> > [    8.260162]  vsc9959_wm_enc+0x2c/0x40
> > [    8.263841]  ocelot_devlink_sb_register+0x33c/0x380
> > [    8.268742]  felix_setup+0x438/0x750
> > [    8.272334]  dsa_register_switch+0x988/0x114c
> > [    8.276713]  felix_pci_probe+0x138/0x1fc
> > [    8.280654]  local_pci_probe+0x4c/0xc0
> > [    8.284423]  pci_device_probe+0x1b0/0x1f0
> > [    8.288451]  really_probe.part.0+0xa4/0x310
> > [    8.292654]  __driver_probe_device+0xa0/0x150
> > [    8.297030]  driver_probe_device+0xcc/0x164
> > [    8.301231]  __device_attach_driver+0xc4/0x130
> > [    8.305695]  bus_for_each_drv+0x84/0xe0
> > [    8.309547]  __device_attach+0xe4/0x190
> > [    8.313400]  device_initial_probe+0x20/0x30
> > [    8.317601]  bus_probe_device+0xac/0xb4
> > [    8.321454]  deferred_probe_work_func+0x98/0xd4
> > [    8.326004]  process_one_work+0x294/0x700
> > [    8.330037]  worker_thread+0x80/0x480
> > [    8.333717]  kthread+0x10c/0x120
> > [    8.336961]  ret_from_fork+0x10/0x20
> > [    8.340554] irq event stamp: 50432
> > [    8.343968] hardirqs last  enabled at (50431): [<ffffd8e2ade442b0>] _raw_spin_unlock_irqrestore+0x90/0xb0
> > [    8.353581] hardirqs last disabled at (50432): [<ffffd8e2ade36e44>] el1_dbg+0x24/0x90
> > [    8.361448] softirqs last  enabled at (50148): [<ffffd8e2ac6908f0>] __do_softirq+0x480/0x5f8
> > [    8.369923] softirqs last disabled at (50143): [<ffffd8e2ac728e3c>] __irq_exit_rcu+0x17c/0x1b0
> > [    8.378577] ---[ end trace 0000000000000000 ]---
> > [    8.383304] ------------[ cut here ]------------
> > [    8.387942] WARNING: CPU: 1 PID: 44 at drivers/net/dsa/ocelot/felix_vsc9959.c:1007 vsc9959_wm_enc+0x2c/0x40
> > [    8.397729] Modules linked in:
> > [    8.400800] CPU: 1 PID: 44 Comm: kworker/u4:2 Tainted: G        W         5.17.0-rc2-07010-ga9b9500ffaac-dirty #2104
> > [    8.411369] Hardware name: LS1028A RDB Board (DT)
> > [    8.416092] Workqueue: events_unbound deferred_probe_work_func
> > [    8.421955] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [    8.428948] pc : vsc9959_wm_enc+0x2c/0x40
> > [    8.432975] lr : ocelot_setup_sharing_watermarks+0xc0/0x1fc
> > [    8.438573] sp : ffff800008863810
> > [    8.441900] x29: ffff800008863810 x28: 0000000000000308 x27: 0000000000000001
> > [    8.449081] x26: 000000000300001a x25: 0000000000000008 x24: 0000000000000080
> > [    8.456262] x23: 0000000088888889 x22: 0000000000000000 x21: 0000000000000000
> > [    8.463443] x20: 00000000ffff2d17 x19: ffff2d17039d8010 x18: ffffd8e2afa23b28
> > [    8.470623] x17: 0000000000000015 x16: 0000000000000041 x15: 0000000000000000
> > [    8.477804] x14: ffffd8e2afa49228 x13: 0000000000025700 x12: 0000000000000009
> > [    8.484984] x11: ffff2d1703a96c18 x10: 0000000000000004 x9 : ffffd8e2ad6ce124
> > [    8.492165] x8 : ffff2d1700be4240 x7 : ffffd8e2af981000 x6 : 0000000000000001
> > [    8.499345] x5 : ffffd8e2ad1e1440 x4 : 0000000000000000 x3 : 0000000000000001
> > [    8.506525] x2 : 0000000000000000 x1 : ffffd8e2ad3d2810 x0 : 00000000000040c0
> > [    8.513705] Call trace:
> > [    8.516161]  vsc9959_wm_enc+0x2c/0x40
> > [    8.519840]  ocelot_devlink_sb_register+0x33c/0x380
> > [    8.524740]  felix_setup+0x438/0x750
> > [    8.528331]  dsa_register_switch+0x988/0x114c
> > [    8.532708]  felix_pci_probe+0x138/0x1fc
> > [    8.536648]  local_pci_probe+0x4c/0xc0
> > [    8.540415]  pci_device_probe+0x1b0/0x1f0
> > [    8.544443]  really_probe.part.0+0xa4/0x310
> > [    8.548646]  __driver_probe_device+0xa0/0x150
> > [    8.553022]  driver_probe_device+0xcc/0x164
> > [    8.557225]  __device_attach_driver+0xc4/0x130
> > [    8.561688]  bus_for_each_drv+0x84/0xe0
> > [    8.565540]  __device_attach+0xe4/0x190
> > [    8.569393]  device_initial_probe+0x20/0x30
> > [    8.573594]  bus_probe_device+0xac/0xb4
> > [    8.577447]  deferred_probe_work_func+0x98/0xd4
> > [    8.581997]  process_one_work+0x294/0x700
> > [    8.586026]  worker_thread+0x80/0x480
> > [    8.589706]  kthread+0x10c/0x120
> > [    8.592949]  ret_from_fork+0x10/0x20
> > [    8.596541] irq event stamp: 50450
> > [    8.599955] hardirqs last  enabled at (50449): [<ffffd8e2ade442b0>] _raw_spin_unlock_irqrestore+0x90/0xb0                                                                                                
> > [    8.609566] hardirqs last disabled at (50450): [<ffffd8e2ade36e44>] el1_dbg+0x24/0x90
> > [    8.617431] softirqs last  enabled at (50446): [<ffffd8e2ac6908f0>] __do_softirq+0x480/0x5f8
> > [    8.625907] softirqs last disabled at (50435): [<ffffd8e2ac728e3c>] __irq_exit_rcu+0x17c/0x1b0
> > [    8.634559] ---[ end trace 0000000000000000 ]---
> > [    8.640586] device eth1 entered promiscuous mode
> > [    8.645499] Unable to handle kernel paging request at virtual address 00000010400020bc
> > [    8.653496] Mem abort info:
> > [    8.656340]   ESR = 0x96000044
> > [    8.659413]   EC = 0x25: DABT (current EL), IL = 32 bits
> > [    8.664784]   SET = 0, FnV = 0
> > [    8.667855]   EA = 0, S1PTW = 0
> > [    8.671044]   FSC = 0x04: level 0 translation fault
> > [    8.675979] Data abort info:
> > [    8.678875]   ISV = 0, ISS = 0x00000044
> > [    8.682762]   CM = 0, WnR = 1
> > [    8.685795] [00000010400020bc] user address but active_mm is swapper
> > [    8.692272] Internal error: Oops: 96000044 [#1] PREEMPT SMP
> > [    8.697865] Modules linked in:
> > [    8.700928] CPU: 1 PID: 44 Comm: kworker/u4:2 Tainted: G        W         5.17.0-rc2-07010-ga9b9500ffaac-dirty #2104
> > [    8.711490] Hardware name: LS1028A RDB Board (DT)
> > [    8.716206] Workqueue: events_unbound deferred_probe_work_func
> > [    8.722065] pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [    8.729051] pc : ocelot_phylink_mac_link_down+0x70/0x314
> > [    8.734381] lr : felix_phylink_mac_link_down+0x24/0x30
> > [    8.739536] sp : ffff800008863840
> > [    8.742856] x29: ffff800008863840 x28: 0000000000000000 x27: ffffd8e2af8ef180
> > [    8.750022] x26: 0000000000000001 x25: 0000001040002000 x24: 0000000000000000
> > [    8.757187] x23: 0000000000000180 x22: ffff2d1703a92000 x21: 0000000000000004
> > [    8.764352] x20: 0000000000000004 x19: ffff2d17039d8010 x18: ffffd8e2afa23b28
> > [    8.771516] x17: 0000000000040006 x16: 0000000500030008 x15: ffffffffffffffff
> > [    8.778680] x14: ffffffffffff0000 x13: ffffffffffffffff x12: 0000000000000010
> > [    8.785845] x11: 0101010101010101 x10: 0000000000000004 x9 : ffffd8e2ad3d0814
> > [    8.793009] x8 : fefefefefeff6a6d x7 : 0000ffffffffffff x6 : 0000000000000000
> > [    8.800173] x5 : 00000000ffffffff x4 : 0000000000000001 x3 : 000000000d000007
> > [    8.807338] x2 : 0000000000000010 x1 : 0000000000000000 x0 : 0000001040002000
> > [    8.814502] Call trace:
> > [    8.816949]  ocelot_phylink_mac_link_down+0x70/0x314
> > [    8.821929]  felix_phylink_mac_link_down+0x24/0x30
> > [    8.826734]  dsa_port_link_register_of+0xa8/0x240
> > [    8.831454]  dsa_port_setup+0xb4/0x180
> > [    8.835212]  dsa_register_switch+0xdb4/0x114c
> > [    8.839581]  felix_pci_probe+0x138/0x1fc
> > [    8.843515]  local_pci_probe+0x4c/0xc0
> > [    8.847275]  pci_device_probe+0x1b0/0x1f0
> > [    8.851296]  really_probe.part.0+0xa4/0x310
> > [    8.855490]  __driver_probe_device+0xa0/0x150
> > [    8.859860]  driver_probe_device+0xcc/0x164
> > [    8.864054]  __device_attach_driver+0xc4/0x130
> > [    8.868510]  bus_for_each_drv+0x84/0xe0
> > [    8.872355]  __device_attach+0xe4/0x190
> > [    8.876200]  device_initial_probe+0x20/0x30
> > [    8.880395]  bus_probe_device+0xac/0xb4
> > [    8.884240]  deferred_probe_work_func+0x98/0xd4
> > [    8.888783]  process_one_work+0x294/0x700
> > [    8.892808]  worker_thread+0x80/0x480
> > [    8.896480]  kthread+0x10c/0x120
> > [    8.899715]  ret_from_fork+0x10/0x20
> > [    8.903303] Code: 52800202 f874d839 52800001 aa1903e0 (b900bf25)
> > [    8.909417] ---[ end trace 0000000000000000 ]---
> > 
> > Investigating...
> 
> I just did a sanity check on my latest tree and it doesn't crash. I'll
> keep an eye out here as well.
> 
> Gives me an opportunity to fix up your other suggestion.
> 
> Thanks for reviewing and testing Vladimir!

I don't know why this glitch happened. I couldn't reproduce this a
second time. Booting patch by patch with KASAN enabled also showed
nothing. And the stack traces don't make much sense either. I'll just
ignore it for now.

But anyway, I found some actual bugs in the code while testing, I'll
reply on the individual patch.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ