linux-kernel - Re: [PATCH] lib/strscpy: remove word-at-a-time optimization.

Message-ID: <e6de0ec1-c59c-fea4-0335-4c5609e21656@prevas.dk>
Date:   Wed, 24 Jan 2018 09:54:09 +0100
From:   Rasmus Villemoes <rasmus.villemoes@...vas.dk>
To:     Andrey Ryabinin <aryabinin@...tuozzo.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>
CC:     <linux-kernel@...r.kernel.org>, Kees Cook <keescook@...omium.org>,
        Eryu Guan <eguan@...hat.com>,
        Alexander Potapenko <glider@...gle.com>,
        Chris Metcalf <metcalf@...m.mit.edu>,
        David Laight <David.Laight@...LAB.COM>,
        Dmitry Vyukov <dvyukov@...gle.com>, <stable@...r.kernel.org>
Subject: Re: [PATCH] lib/strscpy: remove word-at-a-time optimization.

On 2018-01-09 17:47, Andrey Ryabinin wrote:
> Attached user space program I used to see the difference.
> Usage:
> 	gcc -O2 -o strscpy strscpy_test.c
> 	./strscpy {b|w} src_str_len count
> 
> src_str_len - length of source string, between 1 and 4096
> count - how many strscpy() to execute.
>  
> Also I've noticed something strange. I'm not sure why, but certain
> src_len values (e.g. 30) drive the branch predictor crazy, causing
> worse-than-usual results for byte-at-a-time copy:

I see something similar, but at the 30->31 transition, and the
branch-misses remain at 1-3% for higher values, until 42 where it drops
back to 0%. Anyway, I highly doubt we do a lot of string copies of
strings longer than 32.
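
For readers without the attachment, here is a minimal user-space sketch of
what a byte-at-a-time strscpy() loop looks like. It is not the attached
strscpy_test.c, and the name strscpy_bytewise is made up for illustration;
it only assumes the usual strscpy() contract (return the number of
characters copied, or -E2BIG on truncation). Each iteration has a NUL test
and a loop-count test, which is the branch pattern the predictor has to
learn for a given source length.

```c
#include <errno.h>      /* E2BIG */
#include <stddef.h>     /* size_t */
#include <sys/types.h>  /* ssize_t */

/*
 * Hypothetical user-space stand-in for a byte-at-a-time strscpy().
 * Copies at most count bytes, including the terminating NUL.
 * Returns the number of characters copied (not counting the NUL),
 * or -E2BIG if count is 0 or the source had to be truncated.
 */
static ssize_t strscpy_bytewise(char *dest, const char *src, size_t count)
{
	size_t res = 0;

	if (count == 0)
		return -E2BIG;

	while (count) {
		char c = src[res];

		dest[res] = c;
		if (!c)			/* data-dependent branch: end of string */
			return (ssize_t)res;
		res++;
		count--;
	}

	/* Ran out of room before seeing the NUL: force termination. */
	dest[res - 1] = '\0';
	return -E2BIG;
}
```

With an 8-byte destination, copying "abc" returns 3, while copying a
10-character string returns -E2BIG and leaves the first 7 characters plus
a NUL in the buffer.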

$ perf stat ./strscpy_test b 30 10000000

 Performance counter stats for './strscpy_test b 30 10000000':

        156,777082      task-clock (msec)         #    0,999 CPUs utilized
                 0      context-switches          #    0,000 K/sec
                 0      cpu-migrations            #    0,000 K/sec
                48      page-faults               #    0,306 K/sec
       584.646.177      cycles                    #    3,729 GHz
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
     2.580.599.614      instructions              #    4,41  insns per cycle
       660.114.283      branches                  # 4210,528 M/sec
             4.891      branch-misses             #    0,00% of all branches

       0,156970910 seconds time elapsed

$ perf stat ./strscpy_test b 31 10000000

 Performance counter stats for './strscpy_test b 31 10000000':

        258,533250      task-clock (msec)         #    0,999 CPUs utilized
                 0      context-switches          #    0,000 K/sec
                 0      cpu-migrations            #    0,000 K/sec
                50      page-faults               #    0,193 K/sec
       965.505.138      cycles                    #    3,735 GHz
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
     2.660.773.463      instructions              #    2,76  insns per cycle
       680.141.051      branches                  # 2630,768 M/sec
        19.150.367      branch-misses             #    2,82% of all branches

       0,258725192 seconds time elapsed
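
To put the two runs on a per-call footing, the totals can be divided by
the count argument (10000000). The sketch below does just that. The struct
and helper names are made up for illustration, the figures are copied from
the perf output above ("." is a thousands separator in this locale), and
the totals include process startup, so the per-call values are only
approximate.

```c
/* Totals taken from one perf stat run; calls is the count argument. */
struct perf_run {
	double cycles;
	double branch_misses;
	double calls;
};

/* Figures copied from the two runs above (src_str_len 30 and 31). */
static const struct perf_run len30 = { 584646177.0, 4891.0, 10000000.0 };
static const struct perf_run len31 = { 965505138.0, 19150367.0, 10000000.0 };

/* Average of a counter total over the number of strscpy() calls. */
static double per_call(double total, double calls)
{
	return total / calls;
}

/* Extra cycles in run b relative to run a, per extra branch miss. */
static double cycles_per_extra_miss(const struct perf_run *a,
				    const struct perf_run *b)
{
	return (b->cycles - a->cycles) /
	       (b->branch_misses - a->branch_misses);
}
```

That works out to roughly 58.5 cycles per call at length 30 against
roughly 96.6 at length 31, with about 1.9 branch misses per call in the
latter case, i.e. on the order of 20 extra cycles per extra miss.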


Rasmus