lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZFdbtipfTsIF0u6z@yujie-X299>
Date:   Sun, 7 May 2023 16:05:10 +0800
From:   Yujie Liu <yujie.liu@...el.com>
To:     Thomas Gleixner <tglx@...utronix.de>
CC:     Shanker Donthineni <sdonthineni@...dia.com>,
        <oe-lkp@...ts.linux.dev>, <lkp@...el.com>,
        <linux-kernel@...r.kernel.org>, Marc Zyngier <maz@...nel.org>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        "Michael Walle" <michael@...le.cc>,
        Vikram Sethi <vsethi@...dia.com>
Subject: Re: [PATCH v3 3/3] genirq: Use the maple tree for IRQ descriptors
 management

Hi Thomas,

Sorry for late reply as we were on public holiday earlier this week.

On Fri, Apr 28, 2023 at 12:31:14PM +0200, Thomas Gleixner wrote:
> On Fri, Apr 28 2023 at 09:33, Yujie Liu wrote:
> > On Wed, Apr 26, 2023 at 02:08:54PM +0200, Thomas Gleixner wrote:
> >> Can you please retest this on v6.3 and report back when the problem
> >> persists?
> >
> > Thanks for your help looking into this problem.
> >
> > The problem persists when tested on v6.3, but not 100% reproducible.
> > We ran the test on v6.3 and v6.3+patch each for 20 runs. There are 9
> > failed runs on v6.3+patch, while v6.3 is all clean. Full dmesg is
> > attached.
> 
> Under the assumption that the code is correct, then the effect of this
> patch is that it changes the timing. Sigh.
> 
>   1) Does this happen with a 64-bit kernel too?

It doesn't happen on a 64-bit kernel:

=========================================================================================
commit/compiler/disk/kconfig/rootfs/tbox_group/test/testcase:
  32c58fc685e5cd6b5947a5f8e9a3c318a63fcf0a/gcc-11/1SSD/i386-debian-10.3-func/debian-11.1-i386-20220923.cgz/lkp-skl-d06/block-group-00/blktests

commit/compiler/disk/kconfig/rootfs/tbox_group/test/testcase:
  32c58fc685e5cd6b5947a5f8e9a3c318a63fcf0a/gcc-11/1SSD/x86_64-rhel-8.3-func/debian-11.1-x86_64-20220510.cgz/lkp-skl-d06/block-group-00/blktests

kconfig:
  i386-debian-10.3-func
  x86_64-rhel-8.3-func

i386-debian-10.3        x86_64-rhel-8.3-func
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
         16:48         -33%            :7     dmesg.EIP:default_send_IPI_mask_logical
         16:48         -33%            :7     dmesg.WARNING:at_arch/x86/kernel/apic/ipi.c:#default_send_IPI_mask_logical

>   2) Can you enable the irq_vector:vector_*.* tracepoints and provide
>      the trace?

I'm a beginner of kernel and not sure if I'm doing this correctly. Here
are my test steps:

# enable irq_vector events
# cat /sys/kernel/debug/tracing/set_event
irq_vectors:vector_free_moved
irq_vectors:vector_setup
irq_vectors:vector_teardown
irq_vectors:vector_deactivate
irq_vectors:vector_activate
irq_vectors:vector_alloc_managed
irq_vectors:vector_alloc
irq_vectors:vector_reserve
irq_vectors:vector_reserve_managed
irq_vectors:vector_clear
irq_vectors:vector_update
irq_vectors:vector_config

# turn on the tracer
# cat /sys/kernel/debug/tracing/tracing_on
1

# run the testcase
# ./check block/008
block/008 => sdb2 (do IO while hotplugging CPUs)
    read iops  0       ...
    runtime    0.576s  ...

block/008 => sdb2 (do IO while hotplugging CPUs)             [failed]
    read iops  0       ...  0d06 at May  7 07:33:46 ...
    runtime    0.576s  ...  0.580sommon_interrupt: 2.34 No irq handler for vector
    --- tests/block/008.out     2023-04-18 17:36:53.000000000 +0000
    +++ /lkp/benchmarks/blktests/results/sdb2/block/008.out.bad 2023-05-07 07:33:47.336596084 +0000
    @@ -1,2 +1,8 @@
     Running block/008
    +fio: io_u error on file /dev/sdb2: Input/output error: read offset=273571840, buflen=4096
    +fio: io_u error on file /dev/sdb2: Input/output error: read offset=64757760, buflen=4096
    +fio: io_u error on file /dev/sdb2: Input/output error: read offset=700571648, buflen=4096
    +fio: io_u error on file /dev/sdb2: Input/output error: read offset=105357312, buflen=4096
    +fio exited with status 0
    +4;fio-3.25;reads;0;5;0;0;0;0;0;0;0.000000;0.000000;0;0;0.000000;0.000000;1.000000%=0;5.000000%=0;10.000000%=0;20.000000%=0;30.000000%=0;40.000000%=0;50.000000%=0;60.000000%=0;70.000000%=0;80.000000%=0;90.000000%=0;95.000000%=0;99.000000%=0;99.500000%=0;99.900000%=0;99.950000%=0;99.990000%=0;0%=0;0%=0;0%=0;0;0;0.000000;0.000000;0;0;0.000000%;0.000000;0.000000;0;0;0;0;0;0;0.000000;0.000000;0;0;0.000000;0.000000;1.000000%=0;5.000000%=0;10.000000%=0;20.000000%=0;30.000000%=0;40.000000%=0;50.000000%=0;60.000000%=0;70.000000%=0;80.000000%=0;90.000000%=0;95.000000%=0;99.000000%=0;99.500000%=0;99.900000%=0;99.950000%=0;99.990000%=0;0%=0;0%=0;0%=0;0;0;0.000000;0.000000;0;0;0.000000%;0.000000;0.000000;0;0;0;0;0;0;0.000000;0.000000;0;0;0.000000;0.000000;1.000000%=0;5.000000%=0;10.000000%=0;20.000000%=0;30.000000%=0;40.000000%=0;50.000000%=0;60.000000%=0;70.000000%=0;80.000000%=0;90.000000%=0;95.000000%=0;99.000000%=0;99.500000%=0;99.900000%=0;99.950000%=0;99.990000%=0;0%=0;0%=0;0%=0;0;0;0.000000;0.000000;0;0;0.000000%;0.000000;0.000000;0.000000%;0.000000%;12;0;163;100.0%;0.0%;0.0%;0.0%;0.0%;0.0%;0.0%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;sdb;0;0;0;0;0;0;0;0.00%
    ...
    (Run 'diff -u tests/block/008.out /lkp/benchmarks/blktests/results/sdb2/block/008.out.bad' to see the entire diff)

# check the trace
# cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
# entries-in-buffer/entries-written: 0/0   #P:4
#
#                                _-----=> irqs-off/BH-disabled
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| / _-=> migrate-disable
#                              |||| /     delay
#           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
#              | |         |   |||||     |         |

Nothing was written to trace buffer, seems like no irq_vector events
were captured during this test.

> And please provide /proc/interrupts from that machine.

Before running the test:

# cat /proc/interrupts
            CPU0       CPU1       CPU2       CPU3
   0:         34          0          0          0   IO-APIC    2-edge      timer
   1:          4          0          0          0   IO-APIC    1-edge      i8042
   4:          0          0        635          0   IO-APIC    4-edge      ttyS0
   8:          0          0          0          0   IO-APIC    8-edge      rtc0
   9:          0          0          0          0   IO-APIC    9-fasteoi   acpi
  12:          0          0          0          6   IO-APIC   12-edge      i8042
  16:          0          0          0          4   IO-APIC   16-fasteoi   i801_smbus
  19:         12          0          0          0   IO-APIC   19-fasteoi
 120:       3608          0        617          0  PCI-MSI-0000:00:1f.6    0-edge      eth0
 121:         50          0          0          0  PCI-MSI-0000:00:16.0    0-edge      mei_me
 122:          0          0          0          0  PCI-MSI-0000:00:14.0    0-edge      xhci_hcd
 123:          0          0    2109981          0  PCI-MSI-0000:00:17.0    0-edge      ahci[0000:00:17.0]
 124:          0          0          0        206  PCI-MSI-0000:00:02.0    0-edge      i915
 NMI:          0          0          0          0   Non-maskable interrupts
 LOC:     221252     228764     471985     229286   Local timer interrupts
 SPU:          0          0          0          0   Spurious interrupts
 PMI:          0          0          0          0   Performance monitoring interrupts
 IWI:         16          0          0          8   IRQ work interrupts
 RTR:        108         25         15         11   APIC ICR read retries
 RES:       5016       5486        697      10440   Rescheduling interrupts
 CAL:       4618       4578       2775       5754   Function call interrupts
 TLB:         12         15         14         18   TLB shootdowns
 TRM:          0          0          0          0   Thermal event interrupts
 THR:          0          0          0          0   Threshold APIC interrupts
 MCE:          0          0          0          0   Machine check exceptions
 MCP:          4        108        109        107   Machine check polls
 ERR:          0
 MIS:          0
 PIN:          0          0          0          0   Posted-interrupt notification event
 NPI:          0          0          0          0   Nested posted-interrupt event
 PIW:          0          0          0          0   Posted-interrupt wakeup event

After running the test:

# cat /proc/interrupts
            CPU0       CPU1       CPU2       CPU3
   0:         34          0          0          0   IO-APIC    2-edge      timer
   1:          4          0          0          0   IO-APIC    1-edge      i8042
   4:          0          0        635          0   IO-APIC    4-edge      ttyS0
   8:          0          0          0          0   IO-APIC    8-edge      rtc0
   9:          0          0          0          0   IO-APIC    9-fasteoi   acpi
  12:          0          0          0          6   IO-APIC   12-edge      i8042
  16:          0          0          0          4   IO-APIC   16-fasteoi   i801_smbus
  19:         12          0          0          0   IO-APIC   19-fasteoi
 120:       3807          0        617          0  PCI-MSI-0000:00:1f.6    0-edge      eth0
 121:         50          0          0          0  PCI-MSI-0000:00:16.0    0-edge      mei_me
 122:          0          0          0          0  PCI-MSI-0000:00:14.0    0-edge      xhci_hcd
 123:          0          0    2109981          0  PCI-MSI-0000:00:17.0    0-edge      ahci[0000:00:17.0]
 124:          0          0          0        206  PCI-MSI-0000:00:02.0    0-edge      i915
 NMI:          0          0          0          0   Non-maskable interrupts
 LOC:     221568     228973     472239     229475   Local timer interrupts
 SPU:          0          0          0          0   Spurious interrupts
 PMI:          0          0          0          0   Performance monitoring interrupts
 IWI:         16          0          0          8   IRQ work interrupts
 RTR:        108         25         16         11   APIC ICR read retries
 RES:       5017       5488        698      10440   Rescheduling interrupts
 CAL:       4622       4584       2783       5757   Function call interrupts
 TLB:         12         15         15         19   TLB shootdowns
 TRM:          0          0          0          0   Thermal event interrupts
 THR:          0          0          0          0   Threshold APIC interrupts
 MCE:          0          0          0          0   Machine check exceptions
 MCP:          4        108        109        109   Machine check polls
 ERR:          0
 MIS:          0
 PIN:          0          0          0          0   Posted-interrupt notification event
 NPI:          0          0          0          0   Nested posted-interrupt event
 PIW:          0          0          0          0   Posted-interrupt wakeup event


Thanks again for taking the time to look into this.

--
Best Regards,
Yujie

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ