Date:   Wed, 29 Nov 2017 14:27:05 +0100
From:   Maarten Lankhorst <dev@...ankhorst.nl>
To:     Thomas Gleixner <tglx@...utronix.de>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...nel.org>,
        "H. Peter Anvin" <hpa@...or.com>, Borislav Petkov <bp@...en8.de>,
        Peter Zijlstra <peterz@...radead.org>,
        Jani Nikula <jani.nikula@...ux.intel.com>,
        Intel Graphics Development <intel-gfx@...ts.freedesktop.org>
Subject: Re: [GIT pull] x86 APIC updates for 4.15

Hey,

Op 13-11-17 om 13:05 schreef Thomas Gleixner:
> Linus,
>
> please pull the latest x86-apic-for-linus git tree from:
>
>    git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-apic-for-linus
>
> This update provides a major overhaul of the APIC initialization and vector
> allocation code:
>
>   - Unification of the APIC and interrupt mode setup which was scattered
>     all over the place and was hard to follow. This also disentangles the
>     timer setup from the APIC initialization which brings a clear
>     separation of functionality.
>
>     Great detective work from Dou Liyang!
>
>   - Refactoring of the x86 vector allocation mechanism. The existing code
>     was based on nested loops and rather convoluted APIC callbacks which
>     had a horrible worst case behaviour and tried to serve all different
>     use cases in one go. This led to quite odd hacks when supporting the
>     new managed interrupt facility for multiqueue devices and made it more
>     or less impossible to deal with the vector space exhaustion which was a
>     major roadblock for server hibernation.
>
>     Aside from that the code dealing with cpu hotplug and the system vectors
>     was disconnected from the actual vector management and allocation code,
>     which made it hard to follow and maintain.
>
>     Utilizing the new bitmap matrix allocator core mechanism, the new
>     allocator and management code consolidates the handling of system
>     vectors, legacy vectors, cpu hotplug mechanisms and the actual
>     allocation which needs to be aware of system and legacy vectors and
>     hotplug constraints into a single consistent entity.
>
>     This has one visible change: The support for multi CPU targets of
>     interrupts, which is only available on a certain subset of CPUs/APIC
>     variants has been removed in favour of single interrupt targets. A
>     proper analysis of the multi CPU target feature revealed that there is
>     no real advantage as the vast majority of interrupts end up on the CPU
>     with the lowest APIC id in the set of target CPUs anyway. That change
>     was agreed on by the relevant folks and allowed to simplify the
>     implementation significantly and to replace rather fragile constructs
>     like the vector cleanup IPI with straight forward and solid code.
>     
>     Furthermore this allowed to cleanly separate the allocation details for
>     legacy, normal and managed interrupts.
>
>      - Legacy interrupts are no longer wasting 16 vectors unconditionally
>
>      - Managed interrupts have now a guaranteed vector reservation, but the
>        actual vector assignment happens when the interrupt is
>        requested. It's guaranteed not to fail.
>
>      - Normal interrupts no longer allocate vectors unconditionally when
>        the interrupt is set up (IO/APIC init or MSI(X) enable). The
>        mechanism has been switched to a best effort reservation mode. The
>        actual allocation happens when the interrupt is requested. Contrary
>        to managed interrupts the request can fail due to vector space
>        exhaustion, but drivers must handle a failure of request_irq()
>        anyway. When the interrupt is freed, the vector is handed back as
>        well.
>
>        This solves a long standing problem with large unconditional
>        vector allocations for a certain class of enterprise devices which
>        prevented server hibernation due to vector space exhaustion when the
>        unused allocated vectors had to be migrated to CPU0 while unplugging
>        all non boot CPUs.
>
>     The code has been equipped with trace points and detailed debugfs
>     information to aid analysis of the vector space.
>
The changes to interrupts bring down our CI during hibernate, see:

https://bugs.freedesktop.org/show_bug.cgi?id=103712

I created a bug report at https://bugzilla.kernel.org/show_bug.cgi?id=198033

Short reproducer:

Create a swapfile on a snb 2600 and attempt to hibernate to it with echo disk > /sys/power/state. This will fail in the end, but it goes through most of the hibernation steps first.
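
The reproducer steps could be scripted roughly as follows. This is only a sketch under stated assumptions: the swapfile path /swapfile, the 4096 MiB size and the helper names are made up for illustration and are not taken from the CI setup. It defaults to a dry run that only prints the commands; a real run needs root.

```python
import subprocess
from pathlib import Path


def swapfile_commands(path="/swapfile", size_mib=4096):
    """Commands that create and enable a swapfile (path and size are assumptions)."""
    return [
        ["dd", "if=/dev/zero", f"of={path}", "bs=1M", f"count={size_mib}"],
        ["chmod", "600", path],
        ["mkswap", path],
        ["swapon", path],
    ]


def reproduce(path="/swapfile", size_mib=4096, dry_run=True):
    """Set up swap, then request hibernation. Prints only when dry_run is True."""
    cmds = swapfile_commands(path, size_mib)
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    if dry_run:
        print("echo disk > /sys/power/state")
    else:
        # Same effect as: echo disk > /sys/power/state
        # The write may raise OSError, since the hibernation is expected
        # to fail at the very end.
        Path("/sys/power/state").write_text("disk")
    return cmds


if __name__ == "__main__":
    reproduce(dry_run=True)
```

Calling reproduce(dry_run=False) as root would actually run the steps.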

After this nearly complete hibernation cycle, i915 no longer receives any IRQs, which kills our entire integration testing.
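
One way to confirm that a device has stopped receiving interrupts is to compare two snapshots of /proc/interrupts taken a little apart. The sketch below assumes the usual layout of that file (a header row of CPU names, then one row per IRQ with per-CPU counts followed by a description); the function names are hypothetical.

```python
def irq_counts(text, name):
    """Map IRQ number -> total count for /proc/interrupts rows mentioning `name`."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        fields = line.split()
        # Everything after the per-CPU columns is the description.
        if not fields or name not in " ".join(fields[1 + ncpus:]):
            continue
        irq = fields[0].rstrip(":")
        counts[irq] = sum(int(f) for f in fields[1:1 + ncpus])
    return counts


def stalled(before, after):
    """IRQs whose total count did not change between the two snapshots."""
    return sorted(irq for irq, n in after.items() if before.get(irq) == n)


if __name__ == "__main__":
    import time

    first = irq_counts(open("/proc/interrupts").read(), "i915")
    time.sleep(2)
    second = irq_counts(open("/proc/interrupts").read(), "i915")
    print("stalled IRQs:", stalled(first, second))
```

An unchanged count is only a hint, since an idle device may legitimately raise no interrupts during the interval.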

Kernel config is available at https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3402/kernel.config.bz2
Results with pull request reverted at https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_7333/shards.html

First bad commit:

commit fdba46ffb4c203b6e6794163493fd310f98bb4be (HEAD, refs/bisect/bad)
Author: Thomas Gleixner <tglx@...utronix.de>
Date:   Wed Sep 13 23:29:27 2017 +0200

    x86/apic: Get rid of multi CPU affinity

dmesg:
[   25.419245] PM: hibernation entry
[   25.420957] PM: Syncing filesystems ... 
[   25.464097] PM: done.
[   25.464150] Freezing user space processes ... (elapsed 0.006 seconds) done.
[   25.470453] OOM killer disabled.
[   25.470822] PM: Marking nosave pages: [mem 0x00000000-0x00000fff]
[   25.470844] PM: Marking nosave pages: [mem 0x0009d000-0x000fffff]
[   25.470856] PM: Marking nosave pages: [mem 0x20000000-0x201fffff]
[   25.470878] PM: Marking nosave pages: [mem 0x40000000-0x401fffff]
[   25.470899] PM: Marking nosave pages: [mem 0xc6d51000-0xc6dc9fff]
[   25.470911] PM: Marking nosave pages: [mem 0xc6dcb000-0xc6e7ffff]
[   25.470925] PM: Marking nosave pages: [mem 0xc7000000-0xffffffff]
[   25.471415] PM: Basic memory bitmaps created
[   25.471671] PM: Preallocating image memory... done (allocated 356937 pages)
[   25.885560] PM: Allocated 1427748 kbytes in 0.41 seconds (3482.31 MB/s)
[   25.885565] Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done.
[   25.889092] Suspending console(s) (use no_console_suspend to debug)
[   25.974490] ACPI: Preparing to enter system sleep state S4
[   25.988975] PM: Saving platform NVS memory
[   25.991668] Disabling non-boot CPUs ...
[   26.010179] smpboot: CPU 1 is now offline
[   26.020702] smpboot: CPU 2 is now offline
[   26.032425] smpboot: CPU 3 is now offline
[   26.043173] IRQ fixup: irq 28 move in progress, old vector 34
[   26.043179] IRQ 28: no longer affine to CPU4
[   26.044792] smpboot: CPU 4 is now offline
[   26.057162] smpboot: CPU 5 is now offline
[   26.070037] IRQ 26: no longer affine to CPU6
[   26.071085] smpboot: CPU 6 is now offline
[   26.077927] IRQ 23: no longer affine to CPU7
[   26.077932] IRQ 29: no longer affine to CPU7
[   26.078992] smpboot: CPU 7 is now offline
[   26.079442] PM: Creating hibernation image:
[   26.158636] PM: Need to copy 354687 pages
[   26.158640] PM: Normal pages needed: 354687 + 1024, available pages: 654991
[   26.573976] PM: Hibernation image created (354687 pages copied)
[   26.081028] PM: Restoring platform NVS memory
[   26.081517] Enabling non-boot CPUs ...
[   26.081607] x86: Booting SMP configuration:
[   26.081609] smpboot: Booting Node 0 Processor 1 APIC 0x2
[   26.085779]  cache: parent cpu1 should not be sleeping
[   26.086976] CPU1 is up
[   26.087058] smpboot: Booting Node 0 Processor 2 APIC 0x4
[   26.090461]  cache: parent cpu2 should not be sleeping
[   26.091067] CPU2 is up
[   26.091126] smpboot: Booting Node 0 Processor 3 APIC 0x6
[   26.094160]  cache: parent cpu3 should not be sleeping
[   26.094827] CPU3 is up
[   26.094867] smpboot: Booting Node 0 Processor 4 APIC 0x1
[   26.098101]  cache: parent cpu4 should not be sleeping
[   26.098816] CPU4 is up
[   26.098858] smpboot: Booting Node 0 Processor 5 APIC 0x3
[   26.101943]  cache: parent cpu5 should not be sleeping
[   26.102768] CPU5 is up
[   26.102814] smpboot: Booting Node 0 Processor 6 APIC 0x5
[   26.105932]  cache: parent cpu6 should not be sleeping
[   26.106821] CPU6 is up
[   26.106863] smpboot: Booting Node 0 Processor 7 APIC 0x7
[   26.109978]  cache: parent cpu7 should not be sleeping
[   26.111012] CPU7 is up
[   26.117799] ACPI: Waking up from system sleep state S4
[   26.137795] tg3 0000:02:00.0 enp2s0: Link is down
[   26.149149] tg3 0000:02:00.0 enp2s0: Link is down
[   26.165035] do_IRQ: 4.162 No irq handler for vector
[   26.250080] vblank wait timed out on crtc 0
[   26.250091] ------------[ cut here ]------------
[   26.250096] WARNING: CPU: 5 PID: 150 at drivers/gpu/drm/drm_vblank.c:1066 drm_wait_one_vblank+0x17e/0x190
[   26.250096] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal i915 intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul broadcom bcm_phy_lib ghash_clmulni_intel tg3 snd_hda_intel ptp pps_core snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me lpc_ich mei prime_numbers
[   26.250112] CPU: 5 PID: 150 Comm: kworker/u16:4 Not tainted 4.14.0-rc2-valkyria+ #17
[   26.250113] Hardware name: Dell Inc. XPS 8300  /0Y2MRG, BIOS A06 10/17/2011
[   26.250116] Workqueue: events_unbound async_run_entry_fn
[   26.250118] task: ffff8801285b29c0 task.stack: ffffc900002f0000
[   26.250119] RIP: 0010:drm_wait_one_vblank+0x17e/0x190
[   26.250120] RSP: 0000:ffffc900002f3a28 EFLAGS: 00010292
[   26.250122] RAX: 000000000000001f RBX: ffff88011dee0000 RCX: 0000000000000000
[   26.250122] RDX: 0000000080000001 RSI: 0000000000000001 RDI: ffffffff810e81d2
[   26.250123] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
[   26.250124] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[   26.250125] R13: 0000000000000000 R14: 0000000000000519 R15: ffff880113261c00
[   26.250126] FS:  0000000000000000(0000) GS:ffff88012fb40000(0000) knlGS:0000000000000000
[   26.250127] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   26.250128] CR2: 0000000000000000 CR3: 000000011f6be002 CR4: 00000000000606e0
[   26.250128] Call Trace:
[   26.250132]  ? wait_woken+0x90/0x90
[   26.250167]  ironlake_crtc_enable+0x764/0xbd0 [i915]
[   26.250191]  intel_update_crtc+0x46/0xd0 [i915]
[   26.250213]  intel_update_crtcs+0x67/0x80 [i915]
[   26.250233]  intel_atomic_commit_tail+0x2df/0xea0 [i915]
[   26.250251]  ? __i915_sw_fence_complete+0x25/0x230 [i915]
[   26.250271]  intel_atomic_commit+0x1b7/0x270 [i915]
[   26.250274]  ? pci_pm_poweroff+0xf0/0xf0
[   26.250276]  drm_atomic_helper_commit_duplicated_state+0xe0/0x100
[   26.250297]  __intel_display_resume+0x7e/0xc0 [i915]
[   26.250317]  intel_display_resume+0xbc/0xe0 [i915]
[   26.250334]  i915_pm_restore+0xef/0x190 [i915]
[   26.250337]  pci_pm_thaw+0x59/0x90
[   26.250339]  dpm_run_callback+0x69/0x320
[   26.250341]  device_resume+0xa8/0x1d0
[   26.250343]  ? dpm_watchdog_set+0x60/0x60
[   26.250345]  async_resume+0x19/0x40
[   26.250347]  async_run_entry_fn+0x33/0x160
[   26.250349]  process_one_work+0x213/0x6b0
[   26.250352]  worker_thread+0x4d/0x3a0
[   26.250355]  kthread+0x145/0x180
[   26.250356]  ? process_one_work+0x6b0/0x6b0
[   26.250358]  ? kthread_create_on_node+0x40/0x40
[   26.250360]  ? kthread_create_on_node+0x40/0x40
[   26.250362]  ret_from_fork+0x27/0x40
[   26.250365] Code: 0f ff 31 c0 e9 32 ff ff ff 48 89 e6 4c 89 ff e8 79 0e ae ff 45 85 ed 0f 85 23 ff ff ff 89 ee 48 c7 c7 c8 e8 ce 81 e8 fd 4f b0 ff <0f> ff e9 0e ff ff ff 90 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 
[   26.250396] ---[ end trace 75a55742bff18677 ]---
[   26.301078] pipe A vblank wait timed out
[   26.301089] ------------[ cut here ]------------
[   26.301110] WARNING: CPU: 5 PID: 150 at drivers/gpu/drm/i915/intel_display.c:12172 intel_atomic_commit_tail+0xe77/0xea0 [i915]
[   26.301110] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal i915 intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul broadcom bcm_phy_lib ghash_clmulni_intel tg3 snd_hda_intel ptp pps_core snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me lpc_ich mei prime_numbers
[   26.301124] CPU: 5 PID: 150 Comm: kworker/u16:4 Tainted: G        W       4.14.0-rc2-valkyria+ #17
[   26.301124] Hardware name: Dell Inc. XPS 8300  /0Y2MRG, BIOS A06 10/17/2011
[   26.301126] Workqueue: events_unbound async_run_entry_fn
[   26.301127] task: ffff8801285b29c0 task.stack: ffffc900002f0000
[   26.301146] RIP: 0010:intel_atomic_commit_tail+0xe77/0xea0 [i915]
[   26.301147] RSP: 0000:ffffc900002f3b58 EFLAGS: 00010292
[   26.301149] RAX: 000000000000001c RBX: ffff88011dee0000 RCX: 0000000000000000
[   26.301149] RDX: 0000000080000001 RSI: 0000000000000001 RDI: ffffffff810e81d2
[   26.301150] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
[   26.301151] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[   26.301152] R13: 0000000000000000 R14: ffff8801132612a8 R15: 0000000000000001
[   26.301153] FS:  0000000000000000(0000) GS:ffff88012fb40000(0000) knlGS:0000000000000000
[   26.301154] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   26.301155] CR2: 0000000000000000 CR3: 000000011f6be002 CR4: 00000000000606e0
[   26.301155] Call Trace:
[   26.301159]  ? wait_woken+0x90/0x90
[   26.301178]  intel_atomic_commit+0x1b7/0x270 [i915]
[   26.301180]  ? pci_pm_poweroff+0xf0/0xf0
[   26.301182]  drm_atomic_helper_commit_duplicated_state+0xe0/0x100
[   26.301202]  __intel_display_resume+0x7e/0xc0 [i915]
[   26.301221]  intel_display_resume+0xbc/0xe0 [i915]
[   26.301238]  i915_pm_restore+0xef/0x190 [i915]
[   26.301241]  pci_pm_thaw+0x59/0x90
[   26.301242]  dpm_run_callback+0x69/0x320
[   26.301244]  device_resume+0xa8/0x1d0
[   26.301246]  ? dpm_watchdog_set+0x60/0x60
[   26.301248]  async_resume+0x19/0x40
[   26.301250]  async_run_entry_fn+0x33/0x160
[   26.301252]  process_one_work+0x213/0x6b0
[   26.301254]  worker_thread+0x4d/0x3a0
[   26.301257]  kthread+0x145/0x180
[   26.301258]  ? process_one_work+0x6b0/0x6b0
[   26.301260]  ? kthread_create_on_node+0x40/0x40
[   26.301262]  ? kthread_create_on_node+0x40/0x40
[   26.301263]  ret_from_fork+0x27/0x40
[   26.301265] Code: 24 50 4c 89 04 24 48 83 c7 08 e8 25 4f e9 e0 4c 8b 04 24 4d 85 c0 0f 85 08 fe ff ff 8d 75 41 48 c7 c7 38 fb 2d a0 e8 a4 90 eb e0 <0f> ff e9 f2 fd ff ff 8d 70 41 48 c7 c7 08 fb 2d a0 e8 8e 90 eb 
[   26.301296] ---[ end trace 75a55742bff18678 ]---
[   26.463222] ata2: SATA link down (SStatus 0 SControl 300)
[   26.463520] ata5: SATA link down (SStatus 0 SControl 300)
[   26.463538] ata4: SATA link down (SStatus 0 SControl 300)
[   26.463556] ata3: SATA link down (SStatus 0 SControl 300)
[   26.463575] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[   26.479505] ata1.00: configured for UDMA/133
[   29.219139] tg3 0000:02:00.0 enp2s0: Link is up at 1000 Mbps, full duplex
[   29.219141] tg3 0000:02:00.0 enp2s0: Flow control is off for TX and off for RX
[   36.322089] [drm:drm_atomic_helper_commit_cleanup_done] *ERROR* [CRTC:36:pipe A] flip_done timed out
[   36.505505] PM: Using 3 thread(s) for compression.
               PM: Compressing and saving image data (355380 pages)...
[   36.505516] PM: Image saving progress:   0%
[   36.698113] PM: Image saving progress:  10%
[   37.043523] PM: Image saving progress:  20%
[   37.358873] PM: Image saving progress:  30%
[   37.679591] PM: Image saving progress:  40%
[   37.907804] PM: Image saving progress:  50%
[   38.320557] PM: Image saving progress:  60%
[   38.713196] PM: Image saving progress:  70%
[   39.051899] PM: Image saving progress:  80%
[   39.361094] PM: Image saving progress:  90%
[   39.551672] PM: Image saving done.
[   39.551679] PM: Wrote 1421520 kbytes in 3.04 seconds (467.60 MB/s)
[   39.551965] PM: S
[   39.552065] PM: Swap header not found!
[   39.552077] |
[   39.657673] PM: Basic memory bitmaps freed
[   39.657678] OOM killer enabled.
[   39.657679] Restarting tasks ... done.
[   39.659749] PM: hibernation exit

Could you have a look at it please?

Cheers,
~Maarten
