Message-ID: <aad1eb3a2e9c3b733a18284d8da933e3bbf3f491.camel@themaw.net>
Date:   Mon, 07 Jun 2021 18:31:24 +0800
From:   Ian Kent <raven@...maw.net>
To:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Tejun Heo <tj@...nel.org>
Cc:     Eric Sandeen <sandeen@...deen.net>, Fox Chen <foxhlchen@...il.com>,
        Brice Goglin <brice.goglin@...il.com>,
        Al Viro <viro@...IV.linux.org.uk>,
        Rick Lindsley <ricklind@...ux.vnet.ibm.com>,
        David Howells <dhowells@...hat.com>,
        Miklos Szeredi <miklos@...redi.hu>,
        Marcelo Tosatti <mtosatti@...hat.com>,
        "Eric W. Biederman" <ebiederm@...ssion.com>,
        Carlos Maiolino <cmaiolino@...hat.com>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v5 0/6] kernfs: proposed locking and concurrency
 improvement (the missing perf attachments)


And I thought I should add these attachments first, and sure enough ...

On Mon, 2021-06-07 at 18:24 +0800, Ian Kent wrote:
> On Mon, 2021-06-07 at 18:11 +0800, Ian Kent wrote:
> > There have been a few instances of contention on the kernfs_mutex
> > during path walks: a case on very large IBM systems seen by myself,
> > a report by Brice Goglin followed up by Fox Chen, and I've since
> > seen a couple of other reports from CoreOS users.
> 
> The contention problems that I've seen show large numbers of
> processes blocking in the functions ->d_revalidate() and
> ->permission() when a lot of path walks are being done concurrently.
> 
> I've also seen contention on d_alloc_parallel() when there are a lot
> of path walks for a file path that doesn't exist. For this latter
> case I saw a much smaller proportion of processes blocking, but
> enough to get my attention.
> 
> I used Fox Chen's benchmark repo (slightly modified) to perform tests
> to demonstrate the contention. The program basically runs a number of
> pthread threads (equal to the number of cpus) and each thread opens,
> reads and then closes a file or files (depending on the test case) in
> a loop, here 10000 times.
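
(For reference, a minimal sketch of the kind of open/read/close loop
described above; this is not the actual benchmark code, and the sysfs
path, thread count, and lack of per-thread cpu pinning are assumptions
made purely for illustration. Build with gcc -pthread.)

#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define ITERATIONS 10000
#define NTHREADS   16   /* assumed: one thread per cpu */

/* assumed path; the real tests use distinct (or missing) sysfs files */
static const char *path = "/sys/kernel/vmcoreinfo";

static void *worker(void *arg)
{
        char buf[4096];
        struct timespec t0, t1;
        long id = (long)arg;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < ITERATIONS; i++) {
                int fd = open(path, O_RDONLY);

                if (fd >= 0) {
                        /* read result intentionally ignored */
                        read(fd, buf, sizeof(buf));
                        close(fd);
                }
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double total_ms = (t1.tv_sec - t0.tv_sec) * 1e3 +
                          (t1.tv_nsec - t0.tv_nsec) / 1e6;
        printf("concur: total %.6fms per %.6fus  thread %ld\n",
               total_ms, total_ms * 1000.0 / ITERATIONS, id);
        return NULL;
}

int main(void)
{
        pthread_t tids[NTHREADS];

        for (long i = 0; i < NTHREADS; i++)
                pthread_create(&tids[i], NULL, worker, (void *)i);
        for (int i = 0; i < NTHREADS; i++)
                pthread_join(tids[i], NULL);
        return 0;
}

(The "single" figures in the results below presumably come from an
initial single-threaded pass, and the per-CPU labels suggest each
thread is pinned to its own cpu; neither is shown in this sketch.)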
> 
> I performed two tests: one with distinct file paths for half the
> number of cpus, i.e. 8 files for a 16 cpu machine, and another using
> the same non-existent file path to simulate many lookups for a path
> that doesn't exist.
> 
> The former test compares general path lookup behaviour between the
> kernfs mutex and the rwsem, while the latter compares lookup
> behaviour when VFS negative dentry caching is used.
> 
> I looked at the case of having only the negative dentry patches
> applied but it was uninteresting, as the contention moved from
> d_alloc_parallel() to ->d_revalidate() and ->permission(). So the
> rwsem patches are needed for the negative dentry patches to be
> useful.
> 
> I captured perf call graphs for each of the test runs that show where
> contention is occurring. For the missing file case d_alloc_parallel()
> dominates the call graph, and for the files case ->d_revalidate() and
> ->permission() dominate the call graph.
> 
> 1) Run with 8 distinct sysfs file paths on a 16 cpu machine, perf
> graphs in base-files-cpu-16-perf.txt and
> patched-files-cpu-16-perf.txt.
> 
> Base (5.8.18-100.fc31, unpatched)
> ---------------------------------
> single: total 37.802725ms per 3.780272us
> concur: total 888.934870ms per 88.893487us  CPU 5
> concur: total 893.396079ms per 89.339608us  CPU 3
> concur: total 897.952652ms per 89.795265us  CPU 8
> concur: total 908.120647ms per 90.812065us  CPU 13
> concur: total 909.288507ms per 90.928851us  CPU 2
> concur: total 936.016942ms per 93.601694us  CPU 10
> concur: total 937.143611ms per 93.714361us  CPU 15
> concur: total 946.569582ms per 94.656958us  CPU 6
> concur: total 946.859803ms per 94.685980us  CPU 11
> concur: total 951.888699ms per 95.188870us  CPU 12
> concur: total 952.930595ms per 95.293059us  CPU 14
> concur: total 953.105172ms per 95.310517us  CPU 9
> concur: total 953.983792ms per 95.398379us  CPU 1
> concur: total 954.019331ms per 95.401933us  CPU 7
> concur: total 954.314661ms per 95.431466us  CPU 4
> concur: total 953.315950ms per 95.331595us  CPU 0
> times: 10000 threads: 16 cpus: 16
> 
> patched (5.12.2)
> ----------------
> single: total 44.351311ms per 4.435131us
> concur: total 205.454229ms per 20.545423us  CPU 10
> concur: total 206.481337ms per 20.648134us  CPU 2
> concur: total 209.061697ms per 20.906170us  CPU 9
> concur: total 209.081926ms per 20.908193us  CPU 7
> concur: total 209.813371ms per 20.981337us  CPU 15
> concur: total 210.762667ms per 21.076267us  CPU 8
> concur: total 211.073960ms per 21.107396us  CPU 5
> concur: total 211.788792ms per 21.178879us  CPU 3
> concur: total 212.029698ms per 21.202970us  CPU 14
> concur: total 212.951390ms per 21.295139us  CPU 6
> concur: total 212.994193ms per 21.299419us  CPU 0
> concur: total 205.059406ms per 20.505941us  CPU 1
> concur: total 214.761330ms per 21.476133us  CPU 4
> concur: total 210.278140ms per 21.027814us  CPU 13
> concur: total 215.120326ms per 21.512033us  CPU 12
> concur: total 215.308288ms per 21.530829us  CPU 11
> times: 10000 threads: 16 cpus: 16
> 
> 2) Run with a single non-existent sysfs file path on a 16 cpu
> machine, perf graphs in base-missing-cpu-16-perf.txt and
> patched-missing-cpu-16-perf.txt.
> 
> Base (5.8.18-100.fc31, unpatched)
> ---------------------------------
> single: total 23.870708ms per 2.387071us
> concur: total 796.504874ms per 79.650487us  CPU 3
> concur: total 806.306131ms per 80.630613us  CPU 11
> concur: total 808.494954ms per 80.849495us  CPU 6
> concur: total 813.103969ms per 81.310397us  CPU 1
> concur: total 813.407996ms per 81.340800us  CPU 8
> concur: total 813.427143ms per 81.342714us  CPU 9
> concur: total 815.892622ms per 81.589262us  CPU 2
> concur: total 816.133378ms per 81.613338us  CPU 4
> concur: total 817.189601ms per 81.718960us  CPU 14
> concur: total 818.323855ms per 81.832386us  CPU 13
> concur: total 820.115479ms per 82.011548us  CPU 15
> concur: total 821.024798ms per 82.102480us  CPU 7
> concur: total 826.135994ms per 82.613599us  CPU 12
> concur: total 826.315963ms per 82.631596us  CPU 0
> concur: total 829.141106ms per 82.914111us  CPU 10
> concur: total 830.058310ms per 83.005831us  CPU 5
> times: 10000 threads: 16 cpus: 16
> 
> patched (5.12.2)
> ----------------
> single: total 21.414203ms per 2.141420us
> concur: total 231.474574ms per 23.147457us  CPU 15
> concur: total 233.030232ms per 23.303023us  CPU 11
> concur: total 235.226442ms per 23.522644us  CPU 5
> concur: total 236.084628ms per 23.608463us  CPU 9
> concur: total 236.635558ms per 23.663556us  CPU 10
> concur: total 237.156850ms per 23.715685us  CPU 2
> concur: total 237.260609ms per 23.726061us  CPU 3
> concur: total 237.577515ms per 23.757752us  CPU 12
> concur: total 237.605650ms per 23.760565us  CPU 1
> concur: total 237.746644ms per 23.774664us  CPU 8
> concur: total 238.417997ms per 23.841800us  CPU 0
> concur: total 238.725191ms per 23.872519us  CPU 4
> concur: total 240.301641ms per 24.030164us  CPU 14
> concur: total 240.570763ms per 24.057076us  CPU 13
> concur: total 240.758979ms per 24.075898us  CPU 6
> concur: total 241.211006ms per 24.121101us  CPU 7
> times: 10000 threads: 16 cpus: 16
> 
> 3) Run with 24 distinct sysfs file paths on a 48 cpu machine, perf
> graphs in base-files-cpu-48-perf.txt and
> patched-files-cpu-48-perf.txt.
> 
> Base (5.12.2, unpatched)
> ------------------------
> single: total 122.827400ms per 12.282740us
> concur: total 5306.902134ms per 530.690213us  CPU 35
> concur: total 5630.720717ms per 563.072072us  CPU 46
> concur: total 5638.448405ms per 563.844841us  CPU 42
> concur: total 5642.860083ms per 564.286008us  CPU 34
> concur: total 5651.030648ms per 565.103065us  CPU 20
> concur: total 5657.526181ms per 565.752618us  CPU 31
> concur: total 5658.140447ms per 565.814045us  CPU 23
> concur: total 5659.691758ms per 565.969176us  CPU 19
> concur: total 5668.248013ms per 566.824801us  CPU 21
> concur: total 5669.774274ms per 566.977427us  CPU 22
> concur: total 5685.258360ms per 568.525836us  CPU 30
> concur: total 5685.799738ms per 568.579974us  CPU 32
> concur: total 5689.631849ms per 568.963185us  CPU 18
> concur: total 5696.818593ms per 569.681859us  CPU 44
> concur: total 5698.618608ms per 569.861861us  CPU 33
> concur: total 5698.794859ms per 569.879486us  CPU 45
> concur: total 5770.686184ms per 577.068618us  CPU 28
> concur: total 5778.892695ms per 577.889270us  CPU 27
> concur: total 5784.709119ms per 578.470912us  CPU 29
> concur: total 5788.893840ms per 578.889384us  CPU 24
> concur: total 5789.576181ms per 578.957618us  CPU 25
> concur: total 5798.722220ms per 579.872222us  CPU 26
> concur: total 5822.426684ms per 582.242668us  CPU 36
> concur: total 5826.460510ms per 582.646051us  CPU 38
> concur: total 5831.715090ms per 583.171509us  CPU 14
> concur: total 5831.966863ms per 583.196686us  CPU 41
> concur: total 5833.488179ms per 583.348818us  CPU 37
> concur: total 5835.039815ms per 583.503982us  CPU 40
> concur: total 5837.073842ms per 583.707384us  CPU 39
> concur: total 5838.603686ms per 583.860369us  CPU 16
> concur: total 5841.427760ms per 584.142776us  CPU 13
> concur: total 5844.173463ms per 584.417346us  CPU 17
> concur: total 5844.526500ms per 584.452650us  CPU 12
> concur: total 5844.543912ms per 584.454391us  CPU 15
> concur: total 5856.646296ms per 585.664630us  CPU 43
> concur: total 5882.959009ms per 588.295901us  CPU 4
> concur: total 5885.522053ms per 588.552205us  CPU 47
> concur: total 5886.485513ms per 588.648551us  CPU 9
> concur: total 5889.596333ms per 588.959633us  CPU 7
> concur: total 5891.098216ms per 589.109822us  CPU 8
> concur: total 5893.823953ms per 589.382395us  CPU 6
> concur: total 5894.175035ms per 589.417504us  CPU 10
> concur: total 5894.333983ms per 589.433398us  CPU 5
> concur: total 5894.339733ms per 589.433973us  CPU 11
> concur: total 5894.780552ms per 589.478055us  CPU 2
> concur: total 5894.902495ms per 589.490250us  CPU 3
> concur: total 5895.138875ms per 589.513888us  CPU 1
> concur: total 5895.751332ms per 589.575133us  CPU 0
> times: 10000 threads: 48 cpus: 48
> 
> patched (5.12.2)
> ----------------
> single: total 113.291645ms per 11.329165us
> concur: total 1593.959049ms per 159.395905us  CPU 13
> concur: total 1597.518495ms per 159.751850us  CPU 3
> concur: total 1597.658208ms per 159.765821us  CPU 6
> concur: total 1600.019094ms per 160.001909us  CPU 25
> concur: total 1601.089351ms per 160.108935us  CPU 23
> concur: total 1601.469009ms per 160.146901us  CPU 26
> concur: total 1602.896466ms per 160.289647us  CPU 30
> concur: total 1603.235130ms per 160.323513us  CPU 1
> concur: total 1603.366164ms per 160.336616us  CPU 28
> concur: total 1604.441214ms per 160.444121us  CPU 2
> concur: total 1604.688351ms per 160.468835us  CPU 36
> concur: total 1605.739458ms per 160.573946us  CPU 8
> concur: total 1606.069951ms per 160.606995us  CPU 31
> concur: total 1606.332397ms per 160.633240us  CPU 22
> concur: total 1608.634998ms per 160.863500us  CPU 11
> concur: total 1608.698868ms per 160.869887us  CPU 5
> concur: total 1609.072888ms per 160.907289us  CPU 43
> concur: total 1609.780952ms per 160.978095us  CPU 41
> concur: total 1610.214802ms per 161.021480us  CPU 12
> concur: total 1610.618660ms per 161.061866us  CPU 16
> concur: total 1610.885785ms per 161.088578us  CPU 27
> concur: total 1611.576231ms per 161.157623us  CPU 10
> concur: total 1612.083975ms per 161.208398us  CPU 38
> concur: total 1612.677333ms per 161.267733us  CPU 45
> concur: total 1612.698645ms per 161.269865us  CPU 44
> concur: total 1612.887981ms per 161.288798us  CPU 18
> concur: total 1612.808693ms per 161.280869us  CPU 4
> concur: total 1612.844263ms per 161.284426us  CPU 35
> concur: total 1612.760745ms per 161.276075us  CPU 40
> concur: total 1613.220738ms per 161.322074us  CPU 17
> concur: total 1613.249031ms per 161.324903us  CPU 29
> concur: total 1613.270812ms per 161.327081us  CPU 20
> concur: total 1613.325711ms per 161.332571us  CPU 24
> concur: total 1613.499246ms per 161.349925us  CPU 21
> concur: total 1613.347917ms per 161.334792us  CPU 42
> concur: total 1613.416651ms per 161.341665us  CPU 15
> concur: total 1613.742291ms per 161.374229us  CPU 46
> concur: total 1613.809087ms per 161.380909us  CPU 32
> concur: total 1613.329478ms per 161.332948us  CPU 19
> concur: total 1613.783009ms per 161.378301us  CPU 9
> concur: total 1613.626390ms per 161.362639us  CPU 39
> concur: total 1614.077897ms per 161.407790us  CPU 34
> concur: total 1614.094290ms per 161.409429us  CPU 7
> concur: total 1614.754743ms per 161.475474us  CPU 14
> concur: total 1614.958943ms per 161.495894us  CPU 0
> concur: total 1616.025304ms per 161.602530us  CPU 37
> concur: total 1617.808550ms per 161.780855us  CPU 47
> concur: total 1630.682246ms per 163.068225us  CPU 33
> times: 10000 threads: 48 cpus: 48
> 
> 4) Run with a single non-existent sysfs file path on a 48 cpu
> machine, perf graphs in base-missing-cpu-48-perf.txt and
> patched-missing-cpu-48-perf.txt.
> 
> Base (5.12.2, unpatched)
> ------------------------
> single: total 87.107970ms per 8.710797us
> concur: total 15072.702249ms per 1507.270225us  CPU 24
> concur: total 15184.463418ms per 1518.446342us  CPU 26
> concur: total 15263.917735ms per 1526.391773us  CPU 28
> concur: total 15617.042833ms per 1561.704283us  CPU 25
> concur: total 15660.599769ms per 1566.059977us  CPU 27
> concur: total 16134.873816ms per 1613.487382us  CPU 29
> concur: total 16195.713672ms per 1619.571367us  CPU 11
> concur: total 17182.571407ms per 1718.257141us  CPU 10
> concur: total 17462.398666ms per 1746.239867us  CPU 9
> concur: total 17813.014094ms per 1781.301409us  CPU 8
> concur: total 18436.488514ms per 1843.648851us  CPU 6
> concur: total 18996.550399ms per 1899.655040us  CPU 7
> concur: total 21721.021674ms per 2172.102167us  CPU 41
> concur: total 21986.614285ms per 2198.661429us  CPU 17
> concur: total 22216.364478ms per 2221.636448us  CPU 23
> concur: total 22369.110429ms per 2236.911043us  CPU 5
> concur: total 22526.643861ms per 2252.664386us  CPU 35
> concur: total 22540.326825ms per 2254.032682us  CPU 40
> concur: total 22560.761109ms per 2256.076111us  CPU 30
> concur: total 22774.376673ms per 2277.437667us  CPU 33
> concur: total 22779.411375ms per 2277.941137us  CPU 31
> concur: total 22844.223722ms per 2284.422372us  CPU 16
> concur: total 22868.684174ms per 2286.868417us  CPU 34
> concur: total 22926.039600ms per 2292.603960us  CPU 32
> concur: total 22956.189714ms per 2295.618971us  CPU 38
> concur: total 23002.988812ms per 2300.298881us  CPU 22
> concur: total 23010.128228ms per 2301.012823us  CPU 36
> concur: total 23013.737650ms per 2301.373765us  CPU 4
> concur: total 23023.545614ms per 2302.354561us  CPU 39
> concur: total 23120.483176ms per 2312.048318us  CPU 15
> concur: total 23150.576516ms per 2315.057652us  CPU 37
> concur: total 23240.196530ms per 2324.019653us  CPU 14
> concur: total 23255.002167ms per 2325.500217us  CPU 21
> concur: total 23255.595018ms per 2325.559502us  CPU 0
> concur: total 23258.182221ms per 2325.818222us  CPU 3
> concur: total 23264.494553ms per 2326.449455us  CPU 12
> concur: total 23281.848036ms per 2328.184804us  CPU 13
> concur: total 23307.939070ms per 2330.793907us  CPU 47
> concur: total 23315.311150ms per 2331.531115us  CPU 46
> concur: total 23328.394731ms per 2332.839473us  CPU 2
> concur: total 23329.879007ms per 2332.987901us  CPU 20
> concur: total 23351.592451ms per 2335.159245us  CPU 19
> concur: total 23350.752868ms per 2335.075287us  CPU 1
> concur: total 23356.438116ms per 2335.643812us  CPU 45
> concur: total 23356.853217ms per 2335.685322us  CPU 42
> concur: total 23357.738390ms per 2335.773839us  CPU 44
> concur: total 23360.540952ms per 2336.054095us  CPU 43
> concur: total 23360.577828ms per 2336.057783us  CPU 18
> times: 10000 threads: 48 cpus: 48
> 
> patched (5.12.2)
> ----------------
> 
> single: total 115.004971ms per 11.500497us
> concur: total 1534.106287ms per 153.410629us  CPU 15
> concur: total 1584.741497ms per 158.474150us  CPU 34
> concur: total 1588.227774ms per 158.822777us  CPU 3
> concur: total 1590.944855ms per 159.094485us  CPU 27
> concur: total 1593.252406ms per 159.325241us  CPU 21
> concur: total 1594.347841ms per 159.434784us  CPU 44
> concur: total 1594.519690ms per 159.451969us  CPU 43
> concur: total 1594.651516ms per 159.465152us  CPU 11
> concur: total 1595.516558ms per 159.551656us  CPU 12
> concur: total 1596.826634ms per 159.682663us  CPU 22
> concur: total 1598.825527ms per 159.882553us  CPU 6
> concur: total 1598.914890ms per 159.891489us  CPU 33
> concur: total 1599.541434ms per 159.954143us  CPU 28
> concur: total 1600.537643ms per 160.053764us  CPU 38
> concur: total 1602.424304ms per 160.242430us  CPU 47
> concur: total 1602.725873ms per 160.272587us  CPU 30
> concur: total 1602.759128ms per 160.275913us  CPU 14
> concur: total 1603.849343ms per 160.384934us  CPU 29
> concur: total 1605.117369ms per 160.511737us  CPU 35
> concur: total 1605.473411ms per 160.547341us  CPU 19
> concur: total 1606.013413ms per 160.601341us  CPU 13
> concur: total 1606.068654ms per 160.606865us  CPU 4
> concur: total 1606.209860ms per 160.620986us  CPU 23
> concur: total 1606.923183ms per 160.692318us  CPU 1
> concur: total 1607.064867ms per 160.706487us  CPU 40
> concur: total 1607.121558ms per 160.712156us  CPU 20
> concur: total 1610.107603ms per 161.010760us  CPU 10
> concur: total 1610.140915ms per 161.014092us  CPU 5
> concur: total 1610.636352ms per 161.063635us  CPU 24
> concur: total 1612.699753ms per 161.269975us  CPU 46
> concur: total 1612.879734ms per 161.287973us  CPU 42
> concur: total 1613.176326ms per 161.317633us  CPU 2
> concur: total 1613.415669ms per 161.341567us  CPU 8
> concur: total 1613.811312ms per 161.381131us  CPU 25
> concur: total 1613.923411ms per 161.392341us  CPU 41
> concur: total 1613.966209ms per 161.396621us  CPU 31
> concur: total 1614.947228ms per 161.494723us  CPU 17
> concur: total 1615.337781ms per 161.533778us  CPU 37
> concur: total 1615.835025ms per 161.583502us  CPU 32
> concur: total 1615.982666ms per 161.598267us  CPU 39
> concur: total 1616.335216ms per 161.633522us  CPU 45
> concur: total 1616.340457ms per 161.634046us  CPU 36
> concur: total 1616.387235ms per 161.638723us  CPU 16
> concur: total 1617.248832ms per 161.724883us  CPU 9
> concur: total 1617.354503ms per 161.735450us  CPU 0
> concur: total 1617.455505ms per 161.745550us  CPU 18
> concur: total 1618.290721ms per 161.829072us  CPU 26
> concur: total 1630.338637ms per 163.033864us  CPU 7
> times: 10000 threads: 48 cpus: 48


View attachment "base-files-cpu-16-perf.txt" of type "text/plain" (144493 bytes)

View attachment "base-files-cpu-48-perf.txt" of type "text/plain" (404085 bytes)

View attachment "base-missing-cpu-16-perf.txt" of type "text/plain" (168787 bytes)

View attachment "base-missing-cpus-48-perf.txt" of type "text/plain" (465031 bytes)

View attachment "patched-files-cpu-16-perf.txt" of type "text/plain" (258699 bytes)

View attachment "patched-files-cpu-48-perf.txt" of type "text/plain" (362889 bytes)

View attachment "patched-missing-cpu-16-perf.txt" of type "text/plain" (171268 bytes)

View attachment "patched-missing-cpu-48-perf.txt" of type "text/plain" (306978 bytes)
