lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87h9arz6cw.fsf@yhuang-mobile.sh.intel.com>
Date:	Thu, 11 Aug 2016 08:13:19 -0700
From:	"Huang\, Ying" <ying.huang@...el.com>
To:	Aaron Lu <aaron.lu@...el.com>
Cc:	<x86@...nel.org>, <linux-kernel@...r.kernel.org>,
	Alex Shi <alex.shi@...aro.org>,
	Tomoki Sekiyama <tomoki.sekiyama.qu@...achi.com>,
	Davidlohr Bueso <dave@...olabs.net>,
	"Huang Ying" <ying.huang@...el.com>,
	Ingo Molnar <mingo@...hat.com>,
	"Thomas Gleixner" <tglx@...utronix.de>,
	"\"H. Peter Anvin\"" <hpa@...or.com>
Subject: Re: [PATCH] x86/irq: do not substract irq_tlb_count from irq_call_count

Aaron Lu <aaron.lu@...el.com> writes:

> Since commit 52aec3308db8 ("x86/tlb: replace INVALIDATE_TLB_VECTOR by
> CALL_FUNCTION_VECTOR"), the tlb remote shootdown is done through call
> function vector. That commit didn't take care of irq_tlb_count so later
> commit fd0f5869724f ("x86: Distinguish TLB shootdown interrupts from
> other functions call interrupts") tried to fix it.
>
> The fix assumes every increase of irq_tlb_count has a corresponding
> increase of irq_call_count. So the irq_call_count is always bigger than
> irq_tlb_count and we could substract irq_tlb_count from irq_call_count.
>
> Unfortunately this is not true for the smp_call_function_single case.
> The IPI is only sent if the target CPU's call_single_queue is empty when
> adding a csd into it in generic_exec_single. That means if two threads
> are both adding flush tlb csds to the same CPU's call_single_queue, only
> one IPI is sent. In other words, the irq_call_count is incremented by 1
> but irq_tlb_count is incremented by 2. Over time, irq_tlb_count will be
> bigger than irq_call_count and the substract will produce a very large
> irq_call_count value due to overflow.
>
> Considering that:
> 1 it's not worth to send more IPIs for the sake of accurate counting of
>   irq_call_count in generic_exec_single;
> 2 it's not easy to tell if the call function interrupt is for TLB
>   shootdown in __smp_call_function_single_interrupt.
> Not to exclude TLB shootdown from call function count seems to be the
> simplest fix and this patch just did that.
>
> This is found by LKP's cyclic performance regression tracking recently
> with the vm-scalability test suite. I have bisected to commit
> 0a7ce4b5a632 ("mm/rmap: share the i_mmap_rwsem"). This commit didn't do
> anything wrong but revealed the irq_call_count problem. IIUC, the commit
> makes rwc->remap_one in rmap_walk_file concurrent with multiple threads.
> When remap_one is try_to_unmap_one, then multiple threads could queue
> flush tlb to the same CPU but only one IPI will be sent.
>
> Since the commit enter Linux v3.19, the counting problem only shows up
> from v3.19. Considering this is a behaviour change, I'm not sure if I
> should add the stable tag here.
>
> Signed-off-by: Aaron Lu <aaron.lu@...el.com>

Thanks for fix.  You forget to add :)

Reported-by: "Huang, Ying" <ying.huang@...el.com>

Best Regards,
Huang, Ying

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ