Message-ID: <ZMQYURrKPqIyTkG7@kbusch-mbp.dhcp.thefacebook.com>
Date:   Fri, 28 Jul 2023 13:34:41 -0600
From:   Keith Busch <kbusch@...nel.org>
To:     Pratyush Yadav <ptyadav@...zon.de>
Cc:     Christoph Hellwig <hch@....de>, Sagi Grimberg <sagi@...mberg.me>,
        Jens Axboe <axboe@...nel.dk>, linux-nvme@...ts.infradead.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] nvme-pci: do not set the NUMA node of device if it has none

On Fri, Jul 28, 2023 at 08:09:32PM +0200, Pratyush Yadav wrote:
> 
> I am guessing you are looking at irq_create_affinity_masks(). Yeah, it
> does not take into account the NUMA information. In fact, even if it
> did, the NUMA node associated with the IRQ is NUMA_NO_NODE
> (/proc/$irq/node == -1).
> 
> I did some more digging over the week to figure out what is going on. It
> seems like the kernel _does_ in fact allow all CPUs in the affinity. I
> added some prints in set_affinity_irq() in
> drivers/xen/events/events_base.c (since that is the irqchip for the
> interrupt). I see it being called with mask: ffffffff,ffffffff.
> 
> But I later see the function being called again with a different mask:
> 00000000,00008000. The stack trace shows the call is coming from
> ksys_write(). The process doing the write is irqbalance.
> 
> So I think your earlier statement was incorrect. irqbalance does in fact
> balance these interrupts and it probably looks at the NUMA information
> of the device to make that decision. My original reasoning holds and
> irqbalance is the one picking the affinity.
> 
> With this explanation, do you think the patch is good to go?
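
For reference, the instrumentation you describe in set_affinity_irq() in
drivers/xen/events/events_base.c would look roughly like this (a sketch of
the idea only, not your actual debug patch):

static int set_affinity_irq(struct irq_data *data,
                            const struct cpumask *dest, bool force)
{
        /* Debug: report the requested mask and dump the calling path. */
        pr_info("xen: irq %u affinity request: %*pb\n",
                data->irq, cpumask_pr_args(dest));
        dump_stack();

        /* ... the existing event-channel rebind logic is unchanged ... */
        return 0;       /* placeholder; the real function returns the rebind status */
}

With a print like that you would see both the initial ffffffff,ffffffff
request and the later 00000000,00008000 one, plus the ksys_write() trace
for the second.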

irqbalance still writes to /proc/<irq>/smp_affinity to change it, right?
That just gets I/O errors on my machines because it fails
irq_can_set_affinity_usr() for nvme's kernel-managed interrupts (except
the first vector, but that one is not used for I/O). Is there another
path irqbalance is using that somehow gets past the appropriate checks?
Or is your xen irq_chip perhaps bypassing the managed-irq property
somehow?
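
To spell out the check I mean: the proc write handler rejects userspace
affinity changes for managed interrupts, which is where the -EIO comes
from. Roughly, paraphrasing kernel/irq/proc.c and kernel/irq/manage.c (a
sketch; details vary a bit between kernel versions):

/* write handler for /proc/irq/<N>/smp_affinity (sketch) */
static ssize_t write_irq_affinity(int type, struct file *file,
                                  const char __user *buffer,
                                  size_t count, loff_t *pos)
{
        unsigned int irq = (unsigned int)(long)pde_data(file_inode(file));

        /* Managed interrupts fail this check, so userspace gets -EIO. */
        if (!irq_can_set_affinity_usr(irq) || no_irq_affinity)
                return -EIO;

        /* ... parse the mask and call irq_set_affinity() ... */
        return count;   /* placeholder for the rest of the handler */
}

/* kernel/irq/manage.c (sketch) */
bool irq_can_set_affinity_usr(unsigned int irq)
{
        struct irq_desc *desc = irq_to_desc(irq);

        return __irq_can_set_affinity(desc) &&
               !irqd_affinity_is_managed(&desc->irq_data);
}

So if the nvme vectors on your setup somehow do not carry the managed
flag, that check passes and irqbalance's write goes straight through to
your xen irq_chip.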
