lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20211213105728.GA21139@linux.intel.com>
Date:   Mon, 13 Dec 2021 18:57:29 +0800
From:   Carel Si <beibei.si@...el.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     kernel test robot <oliver.sang@...el.com>,
        Jann Horn <jannh@...gle.com>,
        Miklos Szeredi <mszeredi@...hat.com>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org,
        kernel test robot <lkp@...el.com>, fengwei.yin@...el.com
Subject: Re: [LKP] Re: [fget] 054aa8d439: will-it-scale.per_thread_ops -5.7%
 regression

Hi Linus,

On Fri, Dec 10, 2021 at 10:33:43AM -0800, Linus Torvalds wrote:
> On Thu, Dec 9, 2021 at 9:38 PM kernel test robot <oliver.sang@...el.com> wrote:
> >
> > FYI, we noticed a -5.7% regression of will-it-scale.per_thread_ops due to commit:
> > 054aa8d439b9  ("fget: check that the fd still exists after getting a ref to it")
> 
> Well, some downside of the new checks was expected, that's just much
> more than I really like or would have thought.
> 
> But it's exactly where you'd expect:
> 
> >      27.16 ± 10%      +4.3       31.51 ±  2%  perf-profile.calltrace.cycles-pp.__fget_light.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >      22.91 ± 10%      +4.4       27.34 ±  2%  perf-profile.calltrace.cycles-pp.__fget_files.__fget_light.do_sys_poll.__x64_sys_poll.do_syscall_64
> >      26.33 ± 10%      +4.4       30.70 ±  2%  perf-profile.children.cycles-pp.__fget_light
> >      22.92 ± 10%      +4.4       27.35 ±  2%  perf-profile.children.cycles-pp.__fget_files
> >      22.70 ± 10%      +4.4       27.11 ±  2%  perf-profile.self.cycles-pp.__fget_files
> 
> although there's odd spikes in dTLB-loads etc.
> 
> I checked whether it's some unexpected code generation issue, but the
> new "re-check file table after refcount update" really looks very
> cheap when I look at what gcc generates, there's nothing really
> unexpected there.
> 
> What did change was:
> 
>  (a) some branches go other ways, which might well affect branch
> prediction and just be unlucky. It might be that just marking the
> mismatch case "unlikely()" will help.
> 
>  (b) the obvious few new instructions (re-load and check file table
> pointer, re-load and check file pointer)
> 
>  (c) that __fget_files() function is now no longer a leaf function in
> a simple config case, since it calls "fput_many" in the error case.
> 
> And that (c) is worth mentioning simply because it means that the
> function goes from not having any stack frame at all, to having to
> save/restore four registers. So now it has the usual push/pop
> sequences.
> 
> It may also be that the test-case actually does a lot of threaded
> open/close/poll, and either actually triggers the re-lookup looping
> case (unlikely) or just sees a lot of cacheline bouncing that now got
> worse due to the re-check of the file pointer.
> 
> So this regression looks real, and the issue seems to be that
> __fget_files() really is _that_ important for this do_sys_poll()
> benchmark, and even just the handful of extra instructions end up
> being meaningful.
> 
> Oliver - I'm attaching the obvious "unlikely9)" oneliner in case it's
> just "gcc thought the retry loop was the common case" issue and bad
> branch prediction.

We tested your patch, it didn't work, still has -6.0% regression, thanks.

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  gcc-9/performance/x86_64-rhel-8.3/thread/50%/debian-10.4-x86_64-20200603.cgz/lkp-ivb-2ep1/poll2/will-it-scale/0x42e

commit:
  5f58da2bef ("Merge tag 'drm-fixes-2021-12-03-1' of git://anongit.freedesktop.org/drm/drm")
  054aa8d439 ("fget: check that the fd still exists after getting a ref to it")
  ef8c68873e ("fixup-for-054aa8d439")

5f58da2befa58edf 054aa8d439b9185d4f5eb9a9028 ef8c68873e75cf486bec22c3e8d 
---------------- --------------------------- --------------------------- 
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \  
   6666720            -5.7%    6288280            -6.0%    6268295        will-it-scale.24.threads
    277779            -5.7%     262011            -6.0%     261178        will-it-scale.per_thread_ops
   6666720            -5.7%    6288280            -6.0%    6268295        will-it-scale.workload
     28.74 ± 23%      -1.9       26.84 ±  6%      +1.2       29.92 ± 25%  perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry
     27.82 ± 23%      -1.9       25.93 ±  5%      +1.4       29.20 ± 26%  perf-profile.calltrace.cycles-pp.cpuidle_enter.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     27.82 ± 23%      -1.9       25.93 ±  5%      +1.4       29.20 ± 26%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry.start_secondary
     27.83 ± 23%      -1.9       25.94 ±  5%      +1.4       29.20 ± 26%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     27.83 ± 23%      -1.9       25.94 ±  5%      +1.4       29.20 ± 26%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     27.83 ± 23%      -1.9       25.94 ±  5%      +1.4       29.20 ± 26%  perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
     28.81 ± 23%      -1.9       26.93 ±  6%      +1.2       29.98 ± 25%  perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
     19.12 ±  9%      -1.7       17.39 ±  2%      -2.5       16.64 ± 11%  perf-profile.calltrace.cycles-pp.fput_many.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.26 ±100%      -0.2        0.10 ±223%      -0.1        0.17 ±141%  perf-profile.calltrace.cycles-pp.__kmalloc.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.88 ±  9%      -0.1        2.76 ±  2%      -0.2        2.64 ± 11%  perf-profile.calltrace.cycles-pp.testcase
      3.27 ±  9%      -0.1        3.18 ±  2%      -0.2        3.04 ± 11%  perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__poll
      1.70 ±  9%      -0.1        1.63 ±  2%      -0.1        1.56 ± 11%  perf-profile.calltrace.cycles-pp.__fdget.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.70 ± 10%      -0.1        1.64 ±  2%      -0.1        1.58 ± 10%  perf-profile.calltrace.cycles-pp.fput.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.04 ±  8%      -0.1        0.98 ±  4%      -0.1        0.96 ± 11%  perf-profile.calltrace.cycles-pp.__check_object_size.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.64 ±  9%      -0.1        1.59 ±  3%      -0.1        1.50 ± 10%  perf-profile.calltrace.cycles-pp.__entry_text_start.__poll
      1.69 ±  9%      -0.0        1.65 ±  3%      -0.1        1.57 ± 12%  perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__poll
      1.11 ±  9%      -0.0        1.08 ±  4%      -0.1        1.02 ±  9%  perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string._copy_from_user.do_sys_poll.__x64_sys_poll.do_syscall_64
      1.48 ±  8%      -0.0        1.45 ±  4%      -0.1        1.38 ± 10%  perf-profile.calltrace.cycles-pp._copy_from_user.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.94 ± 57%      -0.0        0.92 ± 54%      -0.2        0.74 ± 56%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_kernel.secondary_startup_64_no_verify
      0.94 ± 57%      -0.0        0.92 ± 54%      -0.2        0.74 ± 56%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_kernel.secondary_startup_64_no_verify
      0.94 ± 57%      -0.0        0.92 ± 54%      -0.2        0.74 ± 56%  perf-profile.calltrace.cycles-pp.cpuidle_enter.do_idle.cpu_startup_entry.start_kernel.secondary_startup_64_no_verify
      0.94 ± 57%      -0.0        0.92 ± 54%      -0.2        0.74 ± 56%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry.start_kernel
      0.94 ± 57%      -0.0        0.92 ± 54%      -0.2        0.74 ± 56%  perf-profile.calltrace.cycles-pp.start_kernel.secondary_startup_64_no_verify
      0.54 ± 45%      +0.1        0.59 ±  9%      +0.1        0.66 ±  6%  perf-profile.calltrace.cycles-pp.kfree.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
     67.66 ±  9%      +2.0       69.69 ±  2%      -0.9       66.78 ± 10%  perf-profile.calltrace.cycles-pp.__poll
     63.66 ±  9%      +2.2       65.82 ±  2%      -0.6       63.10 ± 10%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__poll
     63.48 ±  9%      +2.2       65.64 ±  2%      -0.6       62.93 ± 10%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__poll
     59.85 ±  9%      +2.3       62.12 ±  2%      -0.3       59.56 ± 10%  perf-profile.calltrace.cycles-pp.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe.__poll
     57.74 ±  9%      +2.4       60.09 ±  2%      -0.1       57.62 ± 10%  perf-profile.calltrace.cycles-pp.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe.__poll
     27.16 ± 10%      +4.3       31.51 ±  2%      +3.0       30.21 ± 10%  perf-profile.calltrace.cycles-pp.__fget_light.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
     22.91 ± 10%      +4.4       27.34 ±  2%      +3.3       26.25 ± 10%  perf-profile.calltrace.cycles-pp.__fget_files.__fget_light.do_sys_poll.__x64_sys_poll.do_syscall_64
     27.83 ± 23%      -1.9       25.94 ±  5%      +1.4       29.20 ± 26%  perf-profile.children.cycles-pp.start_secondary
     28.80 ± 23%      -1.9       26.92 ±  6%      +1.2       29.98 ± 25%  perf-profile.children.cycles-pp.cpuidle_enter
     28.80 ± 23%      -1.9       26.92 ±  6%      +1.2       29.98 ± 25%  perf-profile.children.cycles-pp.cpuidle_enter_state
     28.81 ± 23%      -1.9       26.93 ±  6%      +1.2       29.98 ± 25%  perf-profile.children.cycles-pp.secondary_startup_64_no_verify
     28.81 ± 23%      -1.9       26.93 ±  6%      +1.2       29.98 ± 25%  perf-profile.children.cycles-pp.cpu_startup_entry
     28.81 ± 23%      -1.9       26.93 ±  6%      +1.2       29.98 ± 25%  perf-profile.children.cycles-pp.do_idle
     28.78 ± 23%      -1.9       26.91 ±  6%      +1.2       29.96 ± 25%  perf-profile.children.cycles-pp.intel_idle
     18.28 ±  9%      -1.7       16.59 ±  2%      -2.4       15.87 ± 11%  perf-profile.children.cycles-pp.fput_many
      2.90 ±  9%      -0.1        2.76 ±  2%      -0.3        2.64 ± 11%  perf-profile.children.cycles-pp.testcase
      3.29 ±  9%      -0.1        3.20 ±  2%      -0.2        3.07 ± 11%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      1.69 ±  9%      -0.1        1.63 ±  2%      -0.1        1.57 ± 10%  perf-profile.children.cycles-pp.__fdget
      1.70 ±  9%      -0.1        1.64 ±  2%      -0.1        1.57 ± 10%  perf-profile.children.cycles-pp.fput
      1.82 ±  9%      -0.1        1.76 ±  3%      -0.2        1.67 ± 11%  perf-profile.children.cycles-pp.__entry_text_start
      1.07 ±  9%      -0.1        1.02 ±  4%      -0.1        0.99 ± 10%  perf-profile.children.cycles-pp.__check_object_size
      1.90 ±  9%      -0.0        1.86 ±  2%      -0.1        1.77 ± 12%  perf-profile.children.cycles-pp.syscall_return_via_sysret
      1.50 ±  8%      -0.0        1.47 ±  4%      -0.1        1.39 ± 10%  perf-profile.children.cycles-pp._copy_from_user
      1.12 ±  9%      -0.0        1.09 ±  4%      -0.1        1.03 ± 10%  perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
      0.30 ±  8%      -0.0        0.26 ± 11%      -0.0        0.29 ± 20%  perf-profile.children.cycles-pp.__virt_addr_valid
      0.12 ±  8%      -0.0        0.09 ± 11%      -0.0        0.09 ± 12%  perf-profile.children.cycles-pp.poll@plt
      0.04 ± 71%      -0.0        0.01 ±223%      -0.0        0.03 ±100%  perf-profile.children.cycles-pp.seq_read_iter
      0.17 ± 26%      -0.0        0.14 ± 32%      -0.0        0.14 ± 23%  perf-profile.children.cycles-pp.cmd_sched
      0.17 ± 26%      -0.0        0.14 ± 32%      -0.0        0.14 ± 23%  perf-profile.children.cycles-pp.cmd_record
      0.04 ± 71%      -0.0        0.02 ±144%      -0.0        0.03 ±100%  perf-profile.children.cycles-pp.ksys_read
      0.04 ± 71%      -0.0        0.02 ±144%      -0.0        0.03 ±100%  perf-profile.children.cycles-pp.vfs_read
      0.61 ± 13%      -0.0        0.59 ±  9%      +0.1        0.66 ±  5%  perf-profile.children.cycles-pp.kfree
      0.17 ± 26%      -0.0        0.15 ± 30%      -0.0        0.15 ± 21%  perf-profile.children.cycles-pp.__libc_start_main
      0.17 ± 26%      -0.0        0.15 ± 30%      -0.0        0.15 ± 21%  perf-profile.children.cycles-pp.main
      0.17 ± 26%      -0.0        0.15 ± 30%      -0.0        0.15 ± 21%  perf-profile.children.cycles-pp.run_builtin
      0.02 ±142%      -0.0        0.00            -0.0        0.00        perf-profile.children.cycles-pp.new_sync_write
      0.02 ±142%      -0.0        0.00            -0.0        0.01 ±223%  perf-profile.children.cycles-pp.ksys_write
      0.02 ±142%      -0.0        0.00            -0.0        0.01 ±223%  perf-profile.children.cycles-pp.vfs_write
      0.04 ± 45%      -0.0        0.02 ± 99%      -0.0        0.02 ± 99%  perf-profile.children.cycles-pp.__x86_indirect_thunk_rax
      0.13 ± 10%      -0.0        0.12 ±  6%      -0.0        0.12 ± 15%  perf-profile.children.cycles-pp.kmalloc_slab
      0.02 ± 99%      -0.0        0.01 ±223%      +0.0        0.02 ± 99%  perf-profile.children.cycles-pp.seq_read
      0.02 ± 99%      -0.0        0.01 ±223%      +0.0        0.02 ± 99%  perf-profile.children.cycles-pp.__libc_read
      0.15 ± 26%      -0.0        0.14 ± 31%      -0.0        0.13 ± 23%  perf-profile.children.cycles-pp.record__finish_output
      0.15 ± 26%      -0.0        0.14 ± 31%      -0.0        0.13 ± 23%  perf-profile.children.cycles-pp.perf_session__process_events
      0.06 ± 46%      -0.0        0.04 ± 73%      -0.0        0.04 ± 73%  perf-profile.children.cycles-pp.machines__deliver_event
      0.14 ± 27%      -0.0        0.12 ± 32%      -0.0        0.12 ± 19%  perf-profile.children.cycles-pp.process_simple
      0.09 ± 30%      -0.0        0.08 ± 34%      -0.0        0.08 ± 17%  perf-profile.children.cycles-pp.perf_session__process_user_event
      0.05 ± 47%      -0.0        0.04 ± 71%      -0.0        0.02 ±142%  perf-profile.children.cycles-pp.exc_page_fault
      0.10 ± 24%      -0.0        0.08 ± 30%      -0.0        0.08 ± 17%  perf-profile.children.cycles-pp.perf_session__deliver_event
      0.02 ±142%      -0.0        0.01 ±223%      -0.0        0.01 ±223%  perf-profile.children.cycles-pp.poll_select_set_timeout
      0.09 ± 24%      -0.0        0.08 ± 28%      -0.0        0.08 ± 19%  perf-profile.children.cycles-pp.__ordered_events__flush
      0.02 ±141%      -0.0        0.01 ±223%      -0.0        0.00        perf-profile.children.cycles-pp.proc_reg_read
      0.02 ±141%      -0.0        0.01 ±223%      -0.0        0.01 ±223%  perf-profile.children.cycles-pp.perf_output_sample
      0.01 ±223%      -0.0        0.00            +0.0        0.01 ±223%  perf-profile.children.cycles-pp.do_user_addr_fault
      0.04 ± 45%      -0.0        0.04 ± 71%      -0.0        0.00        perf-profile.children.cycles-pp.perf_callchain_user
      0.29 ±  9%      -0.0        0.28 ±  4%      -0.0        0.28 ± 10%  perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
      0.11 ± 12%      -0.0        0.10 ±  8%      -0.0        0.10 ± 10%  perf-profile.children.cycles-pp.__might_sleep
      0.05 ± 49%      -0.0        0.05 ± 52%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.check_stack_object
      0.16 ± 20%      -0.0        0.16 ±  9%      -0.0        0.12 ± 11%  perf-profile.children.cycles-pp.perf_callchain_kernel
      0.12 ± 20%      -0.0        0.11 ±  8%      -0.0        0.10 ± 11%  perf-profile.children.cycles-pp.unwind_next_frame
      0.29 ± 11%      -0.0        0.28 ±  5%      -0.0        0.26 ±  9%  perf-profile.children.cycles-pp.__check_heap_object
      0.43 ± 19%      +0.0        0.43 ± 10%      -0.1        0.37 ±  9%  perf-profile.children.cycles-pp.update_process_times
      0.23 ± 20%      +0.0        0.23 ± 11%      -0.0        0.18 ±  9%  perf-profile.children.cycles-pp.perf_prepare_sample
      0.21 ± 20%      +0.0        0.21 ± 10%      -0.0        0.17 ± 10%  perf-profile.children.cycles-pp.perf_callchain
      0.02 ±142%      +0.0        0.02 ±142%      +0.0        0.02 ±142%  perf-profile.children.cycles-pp.io_serial_in
      0.02 ±141%      +0.0        0.02 ±141%      +0.0        0.02 ±141%  perf-profile.children.cycles-pp.ordered_events__queue
      0.07 ± 12%      +0.0        0.07 ±  6%      -0.0        0.07 ± 10%  perf-profile.children.cycles-pp.uart_console_write
      0.07 ± 12%      +0.0        0.08 ±  6%      -0.0        0.07 ± 10%  perf-profile.children.cycles-pp.serial8250_console_write
      0.21 ± 21%      +0.0        0.21 ± 10%      -0.0        0.17 ± 10%  perf-profile.children.cycles-pp.get_perf_callchain
      0.07 ± 12%      +0.0        0.07 ±  6%      -0.0        0.07 ± 10%  perf-profile.children.cycles-pp.wait_for_xmitr
      0.08 ± 12%      +0.0        0.08 ±  6%      -0.0        0.07 ± 12%  perf-profile.children.cycles-pp.vprintk_emit
      0.08 ± 12%      +0.0        0.08 ±  6%      -0.0        0.07 ± 12%  perf-profile.children.cycles-pp.console_unlock
      0.48 ± 17%      +0.0        0.48 ±  9%      -0.1        0.42 ± 10%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.32 ± 21%      +0.0        0.32 ± 11%      -0.1        0.26 ± 11%  perf-profile.children.cycles-pp.update_curr
      0.61 ± 16%      +0.0        0.61 ±  7%      -0.1        0.55 ± 10%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.30 ± 21%      +0.0        0.30 ± 10%      -0.1        0.24 ± 10%  perf-profile.children.cycles-pp.perf_trace_sched_stat_runtime
      0.28 ± 20%      +0.0        0.29 ± 10%      -0.1        0.23 ± 11%  perf-profile.children.cycles-pp.__perf_event_overflow
      0.28 ± 19%      +0.0        0.28 ± 11%      -0.1        0.23 ± 10%  perf-profile.children.cycles-pp.perf_event_output_forward
      0.29 ± 10%      +0.0        0.29 ±  9%      -0.0        0.28 ± 10%  perf-profile.children.cycles-pp.__might_fault
      0.08 ± 12%      +0.0        0.08 ± 22%      -0.0        0.07 ± 14%  perf-profile.children.cycles-pp.syscall_enter_from_user_mode
      0.05 ±  8%      +0.0        0.06 ± 13%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.worker_thread
      0.05 ±  8%      +0.0        0.06 ± 13%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.process_one_work
      0.05 ±  8%      +0.0        0.06 ± 13%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.drm_fb_helper_damage_work
      0.05 ±  8%      +0.0        0.06 ± 13%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.drm_atomic_helper_dirtyfb
      0.05 ±  8%      +0.0        0.06 ± 13%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.drm_atomic_helper_commit
      0.05 ±  8%      +0.0        0.06 ± 13%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.commit_tail
      0.05 ±  8%      +0.0        0.06 ± 13%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.drm_atomic_helper_commit_tail
      0.05 ±  8%      +0.0        0.06 ± 13%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.drm_atomic_helper_commit_planes
      0.05 ±  8%      +0.0        0.06 ± 13%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.mgag200_simple_display_pipe_update
      0.05 ±  8%      +0.0        0.06 ± 13%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.drm_fb_memcpy_dstclip
      0.05 ±  8%      +0.0        0.06 ± 13%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.memcpy_toio
      0.07 ±  8%      +0.0        0.07 ±  6%      -0.0        0.07 ± 10%  perf-profile.children.cycles-pp.serial8250_console_putchar
      0.43 ± 19%      +0.0        0.44 ±  9%      -0.1        0.38 ±  9%  perf-profile.children.cycles-pp.tick_sched_handle
      0.57 ± 16%      +0.0        0.58 ±  7%      -0.1        0.52 ±  9%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.05 ± 47%      +0.0        0.06 ±  8%      -0.0        0.04 ± 71%  perf-profile.children.cycles-pp.__unwind_start
      0.28 ± 20%      +0.0        0.29 ± 10%      -0.1        0.23 ± 11%  perf-profile.children.cycles-pp.perf_swevent_overflow
      0.07 ±  9%      +0.0        0.08 ±  6%      -0.0        0.07 ± 11%  perf-profile.children.cycles-pp.asm_sysvec_irq_work
      0.07 ±  9%      +0.0        0.08 ±  6%      -0.0        0.07 ± 11%  perf-profile.children.cycles-pp.sysvec_irq_work
      0.07 ±  9%      +0.0        0.08 ±  6%      -0.0        0.07 ± 11%  perf-profile.children.cycles-pp.__sysvec_irq_work
      0.07 ±  9%      +0.0        0.08 ±  6%      -0.0        0.07 ± 11%  perf-profile.children.cycles-pp.irq_work_run
      0.07 ±  9%      +0.0        0.08 ±  6%      -0.0        0.07 ± 11%  perf-profile.children.cycles-pp.irq_work_single
      0.07 ±  9%      +0.0        0.08 ±  6%      -0.0        0.07 ± 11%  perf-profile.children.cycles-pp._printk
      0.44 ± 19%      +0.0        0.45 ±  9%      -0.1        0.38 ±  9%  perf-profile.children.cycles-pp.tick_sched_timer
      0.54 ± 17%      +0.0        0.55 ±  8%      -0.1        0.49 ± 10%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.54 ± 17%      +0.0        0.54 ±  8%      -0.1        0.48 ± 10%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.37 ± 19%      +0.0        0.37 ± 10%      -0.1        0.31 ±  9%  perf-profile.children.cycles-pp.task_tick_fair
      0.06 ± 47%      +0.0        0.07 ± 18%      -0.0        0.06 ± 21%  perf-profile.children.cycles-pp.asm_exc_page_fault
      0.07 ±  9%      +0.0        0.08 ±  8%      -0.0        0.07 ± 11%  perf-profile.children.cycles-pp.irq_work_run_list
      0.98 ± 47%      +0.0        0.99 ± 38%      -0.2        0.78 ± 45%  perf-profile.children.cycles-pp.start_kernel
      0.29 ± 21%      +0.0        0.30 ± 10%      -0.1        0.24 ± 10%  perf-profile.children.cycles-pp.perf_tp_event
      0.01 ±223%      +0.0        0.02 ±141%      +0.0        0.01 ±223%  perf-profile.children.cycles-pp.queue_event
      0.00            +0.0        0.01 ±223%      +0.0        0.01 ±223%  perf-profile.children.cycles-pp.build_id__mark_dso_hit
      0.01 ±223%      +0.0        0.02 ±141%      +0.0        0.02 ± 99%  perf-profile.children.cycles-pp.__cond_resched
      0.00            +0.0        0.01 ±223%      +0.0        0.02 ± 99%  perf-profile.children.cycles-pp.poll_freewait
      0.39 ± 19%      +0.0        0.40 ±  9%      -0.1        0.34 ±  9%  perf-profile.children.cycles-pp.scheduler_tick
      0.04 ± 71%      +0.0        0.04 ± 45%      -0.0        0.02 ±141%  perf-profile.children.cycles-pp.exit_to_user_mode_prepare
      0.06 ±  8%      +0.0        0.07 ± 11%      +0.0        0.06 ± 13%  perf-profile.children.cycles-pp.ret_from_fork
      0.06 ±  8%      +0.0        0.07 ± 11%      +0.0        0.06 ± 13%  perf-profile.children.cycles-pp.kthread
      0.19 ± 10%      +0.0        0.20 ±  9%      +0.0        0.19 ± 10%  perf-profile.children.cycles-pp.__might_resched
      0.02 ±141%      +0.0        0.03 ± 70%      -0.0        0.00        perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
      0.01 ±223%      +0.0        0.03 ± 70%      -0.0        0.00        perf-profile.children.cycles-pp.__get_user_nocheck_8
      0.50 ± 12%      +0.0        0.53 ±  9%      -0.0        0.48 ± 11%  perf-profile.children.cycles-pp.__kmalloc
     67.89 ±  9%      +2.0       69.92 ±  2%      -0.9       67.00 ± 10%  perf-profile.children.cycles-pp.__poll
     63.84 ±  9%      +2.1       65.97 ±  2%      -0.6       63.26 ± 10%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     63.66 ±  9%      +2.1       65.80 ±  2%      -0.6       63.09 ± 10%  perf-profile.children.cycles-pp.do_syscall_64
     59.86 ±  9%      +2.3       62.12 ±  2%      -0.3       59.57 ± 10%  perf-profile.children.cycles-pp.__x64_sys_poll
     59.47 ±  9%      +2.3       61.75 ±  2%      -0.2       59.22 ± 10%  perf-profile.children.cycles-pp.do_sys_poll
     26.33 ± 10%      +4.4       30.70 ±  2%      +3.1       29.43 ± 10%  perf-profile.children.cycles-pp.__fget_light
     22.92 ± 10%      +4.4       27.35 ±  2%      +3.3       26.26 ± 10%  perf-profile.children.cycles-pp.__fget_files
     28.78 ± 23%      -1.9       26.91 ±  6%      +1.2       29.96 ± 25%  perf-profile.self.cycles-pp.intel_idle
     17.29 ±  9%      -1.6       15.64 ±  2%      -2.3       14.96 ± 11%  perf-profile.self.cycles-pp.fput_many
     11.06 ±  9%      -0.3       10.74 ±  2%      -0.8       10.29 ± 10%  perf-profile.self.cycles-pp.do_sys_poll
      2.87 ±  9%      -0.1        2.74 ±  2%      -0.2        2.62 ± 11%  perf-profile.self.cycles-pp.testcase
      3.21 ±  9%      -0.1        3.12 ±  2%      -0.2        3.00 ± 11%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      1.60 ±  9%      -0.1        1.54 ±  4%      -0.1        1.47 ± 10%  perf-profile.self.cycles-pp.__entry_text_start
      1.90 ±  8%      -0.0        1.86 ±  3%      -0.1        1.76 ± 12%  perf-profile.self.cycles-pp.syscall_return_via_sysret
      2.55 ±  9%      -0.0        2.51 ±  4%      -0.2        2.36 ± 12%  perf-profile.self.cycles-pp.__fget_light
      0.84 ±  8%      -0.0        0.81 ±  2%      -0.1        0.78 ± 10%  perf-profile.self.cycles-pp.fput
      0.29 ±  8%      -0.0        0.26 ± 11%      -0.0        0.28 ± 20%  perf-profile.self.cycles-pp.__virt_addr_valid
      1.10 ±  9%      -0.0        1.07 ±  4%      -0.1        1.01 ± 10%  perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
      0.28 ±  8%      -0.0        0.26 ±  5%      -0.0        0.26 ± 11%  perf-profile.self.cycles-pp.do_syscall_64
      0.84 ±  9%      -0.0        0.82 ±  3%      -0.0        0.79 ± 10%  perf-profile.self.cycles-pp.__fdget
      0.10 ±  6%      -0.0        0.08 ± 10%      -0.0        0.08 ± 14%  perf-profile.self.cycles-pp.poll@plt
      0.60 ± 12%      -0.0        0.59 ±  9%      +0.1        0.66 ±  6%  perf-profile.self.cycles-pp.kfree
      0.11 ± 12%      -0.0        0.09 ±  7%      -0.0        0.10 ± 13%  perf-profile.self.cycles-pp._copy_from_user
      0.44 ±  9%      -0.0        0.43 ±  3%      -0.0        0.40 ± 15%  perf-profile.self.cycles-pp.__check_object_size
      0.31 ± 10%      -0.0        0.30 ±  3%      -0.0        0.28 ± 15%  perf-profile.self.cycles-pp.__x64_sys_poll
      0.18 ±  8%      -0.0        0.17 ±  5%      -0.0        0.16 ± 11%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.02 ±142%      -0.0        0.01 ±223%      -0.0        0.01 ±223%  perf-profile.self.cycles-pp.poll_select_set_timeout
      0.04 ± 71%      -0.0        0.02 ± 99%      -0.0        0.02 ± 99%  perf-profile.self.cycles-pp.__x86_indirect_thunk_rax
      0.12 ± 10%      -0.0        0.12 ±  6%      -0.0        0.12 ± 17%  perf-profile.self.cycles-pp.kmalloc_slab
      0.01 ±223%      -0.0        0.00            -0.0        0.00        perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
      0.06 ± 11%      -0.0        0.05 ±  8%      -0.0        0.04 ± 71%  perf-profile.self.cycles-pp.syscall_enter_from_user_mode
      0.03 ±100%      -0.0        0.02 ±142%      -0.0        0.02 ±141%  perf-profile.self.cycles-pp.check_stack_object
      0.28 ± 11%      -0.0        0.27 ±  5%      -0.0        0.25 ± 11%  perf-profile.self.cycles-pp.__check_heap_object
      0.09 ± 13%      -0.0        0.09 ±  7%      -0.0        0.09 ± 10%  perf-profile.self.cycles-pp.__might_sleep
      0.08 ± 17%      -0.0        0.08 ± 25%      -0.0        0.08 ± 11%  perf-profile.self.cycles-pp.__might_fault
      0.03 ±100%      -0.0        0.02 ± 99%      -0.0        0.02 ±141%  perf-profile.self.cycles-pp.unwind_next_frame
      0.01 ±223%      +0.0        0.01 ±223%      -0.0        0.00        perf-profile.self.cycles-pp.exit_to_user_mode_prepare
      0.02 ±142%      +0.0        0.02 ±142%      +0.0        0.02 ±142%  perf-profile.self.cycles-pp.io_serial_in
      0.11 ±  9%      +0.0        0.12 ±  8%      -0.0        0.11 ± 11%  perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
      0.44 ±  8%      +0.0        0.45 ±  4%      -0.0        0.42 ± 12%  perf-profile.self.cycles-pp.__poll
      0.01 ±223%      +0.0        0.02 ±141%      +0.0        0.01 ±223%  perf-profile.self.cycles-pp.queue_event
      0.18 ± 11%      +0.0        0.19 ±  8%      -0.0        0.18 ± 10%  perf-profile.self.cycles-pp.__might_resched
      0.26 ± 13%      +0.0        0.30 ± 17%      -0.0        0.25 ± 13%  perf-profile.self.cycles-pp.__kmalloc
     22.70 ± 10%      +4.4       27.11 ±  2%      +3.3       26.04 ± 10%  perf-profile.self.cycles-pp.__fget_files

> 
> And it would perhaps be interesting to get an actual instruction-level
> profile of that __fget_files() thing for that benchmark, if that
> pinpoints exactly what is going on and in case that would be easy to
> get on that machine.
> 
> Because it might just be truly horrendously bad luck, with the 32-byte
> stack frame meaning that the kernel stack goes one more page down
> (just jhandwaving from the dTLB number spike), and this all being just
> random bad luck on that particular benchmark.
> 
> Of course, the thing about poll() is that for that case, we *don't*
> really need the "re-check the file descriptor" code at all, since the
> resulting fd isn't going to be installed as a new fd, and it doesn't
> matter for the socket garbage collector logic.
> 
> So maybe it was a mistake to put that re-check in the generic fdget()
> code - yes, it should be cheap, but it's also some of the most hot
> code in the kernel on some loads.
> 
> But if we move it elsewhere, we'd need to come up with some list of
> "these cases need it". Some are obvious: dup, dup2, unix domain file
> passing. It's the non-obvious ones I'd worry about.
> 
> Anybody?
> 
>               Linus

>  fs/file.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/file.c b/fs/file.c
> index ad4a8bf3cf10..f802360e240d 100644
> --- a/fs/file.c
> +++ b/fs/file.c
> @@ -858,7 +858,7 @@ static struct file *__fget_files(struct files_struct *files, unsigned int fd,
>  			file = NULL;
>  		else if (!get_file_rcu_many(file, refs))
>  			goto loop;
> -		else if (files_lookup_fd_raw(files, fd) != file) {
> +		else if (unlikely(files_lookup_fd_raw(files, fd) != file)) {
>  			fput_many(file, refs);
>  			goto loop;
>  		}

> _______________________________________________
> LKP mailing list -- lkp@...ts.01.org
> To unsubscribe send an email to lkp-leave@...ts.01.org

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ