Message-ID: <7dbbf0ad-b433-da5b-2fb7-7fae531becc2@leemhuis.info>
Date: Sun, 1 Oct 2017 15:17:09 +0200
From: Thorsten Leemhuis <regressions@...mhuis.info>
To: Yanko Kaneti <yaneti@...lera.com>
Cc: LKML <linux-kernel@...r.kernel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Chuck Ebbert <cebbert.lkml@...il.com>,
Marc Zyngier <marc.zyngier@....com>
Subject: Re: [regression 4.14rc] 74def747bcd0 (genirq: Restrict effective
affinity to interrupts actually using it)
On 01.10.2017 15:06, Yanko Kaneti wrote:
> On Sun, 2017-10-01 at 14:46 +0200, Thorsten Leemhuis wrote:
>> Hi, the regression tracker here. What's the status of this issue? Was
>> the problem fixed? It seems nothing happened for more than 10 days -- or
>> did the discussion move somewhere else? Ciao, Thorsten
> The commit was reverted last week before rc2
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0551968add53777fddd18f4ffb4e3bbc1f646d79
I could have sworn I checked that :-/ Thx for the hint and sorry for the
noise! Ciao, Thorsten
>> On 20.09.2017 02:30, Chuck Ebbert wrote:
>>> On Tue, 19 Sep 2017 16:51:06 +0100
>>> Marc Zyngier <marc.zyngier@....com> wrote:
>>>
>>>> On 19/09/17 16:40, Yanko Kaneti wrote:
>>>>> On Tue, 2017-09-19 at 16:33 +0100, Marc Zyngier wrote:
>>>>>> On 19/09/17 16:12, Yanko Kaneti wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> Fedora rawhide config here.
>>>>>>> AMD FX-8370E
>>>>>>>
>>>>>>> Bisected a problem to:
>>>>>>> 74def747bcd0 (genirq: Restrict effective affinity to interrupts
>>>>>>> actually using it)
>>>>>>>
>>>>>>> It seems to be causing stalls, short lived or long lived lockups
>>>>>>> very shortly after boot. Everything becomes jerky.
>>>>>>>
>>>>>>> The only indication visible in the log is something like:
>>>>>>> ....
>>>>>>> [ 59.802129] clocksource: timekeeping watchdog on CPU3: Marking clocksource 'tsc' as unstable because the skew is too large:
>>>>>>> [ 59.802134] clocksource: 'hpet' wd_now: 3326e7aa wd_last: 329956f8 mask: ffffffff
>>>>>>> [ 59.802137] clocksource: 'tsc' cs_now: 423662bc6f cs_last: 41dfc91650 mask: ffffffffffffffff
>>>>>>> [ 59.802140] tsc: Marking TSC unstable due to clocksource watchdog
>>>>>>> [ 59.802158] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
>>>>>>> [ 59.802161] sched_clock: Marking unstable (59802142067, 15510)<-(59920871789, -118714277)
>>>>>>> [ 60.015604] clocksource: Switched to clocksource hpet
>>>>>>> [ 89.015994] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 209.660 msecs
>>>>>>> [ 89.016003] perf: interrupt took too long (1638003 > 2500), lowering kernel.perf_event_max_sample_rate to 1000
>>>>>>> ....
>>>>>>>
>>>>>>> Just reverting that commit on top of Linus's mainline cures all the
>>>>>>> symptoms.
>>>>>>
>>>>>> Interesting. Do you still get HPET interrupts?
>>>>>
>>>>> Sorry, I might need some basic help here (i.e where do I count
>>>>> them...)
>>>>
>>>> /proc/interrupts should display them.
>>>>
>>>>> After the watchdog switches the clocksource to hpet the system is
>>>>> still somewhat alive, so I'll guess some clock is still
>>>>> ticking....
>>>>
>>>> Probably, but I suspect they're not hitting the right CPU, hence the
>>>> lockups.
>>>>
>>>> Unfortunately, my x86-foo is pretty minimal, and I'm about to drop off
>>>> the net for a few days.
>>>>
>>>> Thomas, any insight?
>>>
>>> Looking at flat_cpu_mask_to_apicid(), I don't see how 74def747bcd0
>>> can be correct:
>>>
>>>     struct cpumask *effmsk = irq_data_get_effective_affinity_mask(irqdata);
>>>     unsigned long cpu_mask = cpumask_bits(mask)[0] & APIC_ALL_CPUS;
>>>
>>>     if (!cpu_mask)
>>>         return -EINVAL;
>>>     *apicid = (unsigned int)cpu_mask;
>>>     cpumask_bits(effmsk)[0] = cpu_mask;
>>>
>>> Before that patch, this function wrote to the effective mask
>>> unconditionally. After, it only writes to effective_mask if it is
>>> already non-zero.
>>>
>>>
>>> http://news.gmane.org/find-root.php?message_id=20170919203044.560cb9f1%40gmail.com
>>> http://mid.gmane.org/20170919203044.560cb9f1%40gmail.com
>>>
>
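Chuck Ebbert's point about flat_cpu_mask_to_apicid() can be illustrated
with a small userspace model. This is only a sketch, not kernel code:
struct toy_irq, TOY_ALL_CPUS, both getters and toy_mask_to_apicid() are
hypothetical names invented here. In particular, the "after" getter
assumes that the accessor falls back to the plain affinity mask whenever
the effective mask is empty; that is one way the behaviour described
above could arise, and it is not taken from the patch itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

#define TOY_ALL_CPUS 0xFFUL	/* hypothetical stand-in for APIC_ALL_CPUS */

/* Toy stand-in for the per-interrupt masks; one word is enough here. */
struct toy_irq {
	unsigned long affinity;			/* requested affinity */
	unsigned long effective_affinity;	/* affinity actually programmed */
};

/* Assumed "before" behaviour: always hand out the effective mask. */
static unsigned long *get_effective_before(struct toy_irq *d)
{
	return &d->effective_affinity;
}

/*
 * Assumed "after" behaviour: hand out the effective mask only if it is
 * already non-zero, otherwise fall back to the affinity mask.
 */
static unsigned long *get_effective_after(struct toy_irq *d)
{
	if (d->effective_affinity)
		return &d->effective_affinity;
	return &d->affinity;
}

/* Same shape as the quoted snippet: write the result through the getter. */
static int toy_mask_to_apicid(struct toy_irq *d, unsigned long mask,
			      unsigned long *(*getter)(struct toy_irq *),
			      unsigned int *apicid)
{
	unsigned long *effmsk;
	unsigned long cpu_mask = mask & TOY_ALL_CPUS;

	assert(d && getter && apicid);
	effmsk = getter(d);

	if (!cpu_mask)
		return -EINVAL;
	*apicid = (unsigned int)cpu_mask;
	*effmsk = cpu_mask;
	return 0;
}

/* Run both variants on an interrupt whose effective mask starts empty. */
void toy_demo(void)
{
	struct toy_irq before = { .affinity = 0x0f, .effective_affinity = 0 };
	struct toy_irq after  = { .affinity = 0x0f, .effective_affinity = 0 };
	unsigned int apicid = 0;

	toy_mask_to_apicid(&before, 0x02, get_effective_before, &apicid);
	toy_mask_to_apicid(&after, 0x02, get_effective_after, &apicid);

	printf("before: affinity=%#lx effective=%#lx\n",
	       before.affinity, before.effective_affinity);
	printf("after:  affinity=%#lx effective=%#lx\n",
	       after.affinity, after.effective_affinity);
}
```

Calling toy_demo() from a main() shows the difference: with the "before"
getter the effective mask is updated, while with the assumed "after"
getter an initially empty effective mask is never written and the
requested affinity is overwritten instead.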