Message-ID: <ZMGddjINDt10BSvf@kbusch-mbp.dhcp.thefacebook.com>
Date:   Wed, 26 Jul 2023 16:25:58 -0600
From:   Keith Busch <kbusch@...nel.org>
To:     Pratyush Yadav <ptyadav@...zon.de>
Cc:     Christoph Hellwig <hch@....de>, Sagi Grimberg <sagi@...mberg.me>,
        Jens Axboe <axboe@...nel.dk>, linux-nvme@...ts.infradead.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] nvme-pci: do not set the NUMA node of device if it has
 none

On Wed, Jul 26, 2023 at 09:32:33PM +0200, Pratyush Yadav wrote:
> On Wed, Jul 26 2023, Keith Busch wrote:
> > Could you send the output of:
> >
> >   numactl --hardware
> 
> $ numactl --hardware
> available: 2 nodes (0-1)
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
> node 0 size: 245847 MB
> node 0 free: 245211 MB
> node 1 cpus: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
> node 1 size: 245932 MB
> node 1 free: 245328 MB
> node distances:
> node   0   1
>   0:  10  21
>   1:  21  10
> 
> >
> > and then with and without your patch:
> >
> >   for i in $(cat /proc/interrupts | grep nvme0 | sed "s/^ *//g" | cut -d":" -f 1); do \
> >     cat /proc/irq/$i/{smp,effective}_affinity_list; \
> >   done
> 
> Without my patch:
> 
>     $   for i in $(cat /proc/interrupts | grep nvme0 | sed "s/^ *//g" | cut -d":" -f 1); do \
>     >     cat /proc/irq/$i/{smp,effective}_affinity_list; \
>     >   done

Hm, I wonder if there's something wrong with my script. All the CPUs
should be accounted for in the smp_affinity_list output, assuming it
captured all the vectors of the nvme device, but both examples are missing
half the CPUs. It looks like you have 32 vectors. Does that sound right?
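
Untested, but something along these lines should make the output easier to
account for, printing the vector count and tagging each line with its IRQ
number (still assuming the controller is nvme0):

  echo "nvme0 vectors: $(grep -c nvme0 /proc/interrupts)"
  for i in $(grep nvme0 /proc/interrupts | cut -d":" -f1 | tr -d " "); do
    echo "irq $i: smp=$(cat /proc/irq/$i/smp_affinity_list)" \
         "effective=$(cat /proc/irq/$i/effective_affinity_list)"
  done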

This does show the effective affinity is indeed always on node 0 without
your patch. I don't see why, though: the "group_cpus_evenly()" function
that spreads the interrupts doesn't know anything about the device the
resource is being grouped for, so it shouldn't even take its NUMA node
into consideration. It's just supposed to ensure every CPU has a shared
resource, preferring not to share across NUMA nodes.
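
As a quick cross-check (untested, and I'm assuming the cpuN/nodeX sysfs
symlink is present on your kernel), something like this should print which
node each vector's effective affinity lands on:

  for i in $(grep nvme0 /proc/interrupts | cut -d":" -f1 | tr -d " "); do
    # effective affinity is normally a single CPU; take the first if not
    c=$(cut -d"," -f1 /proc/irq/$i/effective_affinity_list | cut -d"-" -f1)
    n=$(basename "$(ls -d /sys/devices/system/cpu/cpu$c/node[0-9]*)")
    echo "irq $i -> cpu $c -> $n"
  done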

I'll emulate a similar CPU topology with a similar nvme vector count and
see if I can find anything suspicious. I'm a little concerned we may have
the same problem for devices that do have an associated NUMA node, which
your patch isn't addressing.
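
For reference, this is roughly the qemu setup I have in mind (untested as
written; the nvme queue-count property name is from memory, so treat it as
a sketch). Without a pxb bridge the emulated controller should get no NUMA
node assigned, which ought to match your case:

  # two sockets, one per node, 64 CPUs total; nvme with 32 I/O queue pairs
  qemu-system-x86_64 -machine q35,accel=kvm -m 8G \
    -smp 64,sockets=2,cores=16,threads=2 \
    -object memory-backend-ram,id=m0,size=4G \
    -object memory-backend-ram,id=m1,size=4G \
    -numa node,nodeid=0,memdev=m0 -numa cpu,node-id=0,socket-id=0 \
    -numa node,nodeid=1,memdev=m1 -numa cpu,node-id=1,socket-id=1 \
    -drive file=nvme.img,if=none,id=nvm,format=raw \
    -device nvme,serial=test,drive=nvm,max_ioqpairs=32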

>     41
>     40
>     33
>     33
>     44
>     44
>     9
>     9
>     32
>     32
>     2
>     2
>     6
>     6
>     11
>     11
>     1
>     1
>     35
>     35
>     39
>     39
>     13
>     13
>     42
>     42
>     46
>     46
>     41
>     41
>     46
>     46
>     15
>     15
>     5
>     5
>     43
>     43
>     0
>     0
>     14
>     14
>     8
>     8
>     12
>     12
>     7
>     7
>     10
>     10
>     47
>     47
>     38
>     38
>     36
>     36
>     3
>     3
>     34
>     34
>     45
>     45
>     5
>     5
> 
> With my patch:
> 
>     $   for i in $(cat /proc/interrupts | grep nvme0 | sed "s/^ *//g" | cut -d":" -f 1); do \
>     >     cat /proc/irq/$i/{smp,effective}_affinity_list; \
>     >   done
>     9
>     9
>     15
>     15
>     5
>     5
>     23
>     23
>     38
>     38
>     52
>     52
>     21
>     21
>     36
>     36
>     13
>     13
>     56
>     56
>     44
>     44
>     42
>     42
>     31
>     31
>     48
>     48
>     5
>     5
>     3
>     3
>     1
>     1
>     11
>     11
>     28
>     28
>     18
>     18
>     34
>     34
>     29
>     29
>     58
>     58
>     46
>     46
>     54
>     54
>     59
>     59
>     32
>     32
>     7
>     7
>     56
>     56
>     62
>     62
>     49
>     49
>     57
>     57
