Message-ID: <20250117163108.3f817d4d.alex.williamson@redhat.com>
Date: Fri, 17 Jan 2025 16:31:08 -0500
From: Alex Williamson <alex.williamson@...hat.com>
To: Ankit Agrawal <ankita@...dia.com>
Cc: Jason Gunthorpe <jgg@...dia.com>, Yishai Hadas <yishaih@...dia.com>,
 "shameerali.kolothum.thodi@...wei.com"
 <shameerali.kolothum.thodi@...wei.com>, "kevin.tian@...el.com"
 <kevin.tian@...el.com>, Zhi Wang <zhiw@...dia.com>, Aniket Agashe
 <aniketa@...dia.com>, Neo Jia <cjia@...dia.com>, Kirti Wankhede
 <kwankhede@...dia.com>, "Tarun Gupta (SW-GPU)" <targupta@...dia.com>,
 Vikram Sethi <vsethi@...dia.com>, Andy Currid <acurrid@...dia.com>,
 Alistair Popple <apopple@...dia.com>, John Hubbard <jhubbard@...dia.com>,
 Dan Williams <danw@...dia.com>, "Anuj Aggarwal (SW-GPU)"
 <anuaggarwal@...dia.com>, Matt Ochs <mochs@...dia.com>,
 "kvm@...r.kernel.org" <kvm@...r.kernel.org>, "linux-kernel@...r.kernel.org"
 <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v3 3/3] vfio/nvgrace-gpu: Check the HBM training and C2C
 link status

On Fri, 17 Jan 2025 21:13:52 +0000
Ankit Agrawal <ankita@...dia.com> wrote:

> >> > We're accessing device memory here but afaict the memory enable bit of
> >> > the command register is in an indeterminate state.  What happens if you
> >> > use setpci to clear the memory enable bit or 'echo 0 > enable' before
> >> > binding the driver?  Thanks,
> >> >
> >> > Alex  
> >>
> >> Hi Alex, sorry I didn't understand how we are accessing device memory here if
> >> the C2C_LINK_BAR0_OFFSET and HBM_TRAINING_BAR0_OFFSET are BAR0 regs.
> >> But anyways, I tried 'echo 0 > <sysfs_path>/enable' before device bind. I am not
> >> observing any issue and the bind goes through.
> >>
> >> Or am I missing something?  
> >
> > BAR0 is what I'm referring to as device memory.  We cannot access
> > registers in BAR0 unless the memory space enable bit of the command
> > register is set.  The nvgrace-gpu driver makes no effort to enable this
> > and I don't think the PCI core does before probe either.  Disabling
> > through sysfs will only disable if it was previously enabled, so
> > possibly that test was invalid.  Please try with setpci:
> >
> > # Read command register
> > $ setpci -s xxxx:xx:xx.x COMMAND
> > # Clear memory enable
> > $ setpci -s xxxx:xx:xx.x COMMAND=0:2
> > # Re-read command register
> > $ setpci -s xxxx:xx:xx.x COMMAND
> >
> > Probe driver here now that the memory enable bit should read back as
> > unset.  Thanks,
> >
> > Alex  
> 
> Ok, yeah. I tried to disable through setpci, and the probe is failing with ETIME.
> Should we check if disabled and return -EIO for such situation to differentiate
> from timeout?

No, the driver needs to enable memory on the device around the iomap
rather than assuming the initial state.  Thanks,

Alex
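
For readers following along, here is a minimal userspace sketch of the
two ideas in this exchange: the value:mask form used by
"setpci ... COMMAND=0:2", and enabling memory decode around a BAR0
access instead of assuming its initial state. It is not the
nvgrace-gpu code. struct fake_dev, masked_write(), bar0_read(),
read_with_mem_enabled() and example() are hypothetical names invented
for illustration, and the all-ones result for a read with decode
disabled is a modelling assumption (a common outcome, not a guaranteed
one). Only the position of the memory space enable bit (0x2 in the
command register) is taken as fact. A real driver would go through the
kernel's PCI config accessors such as pci_read_config_word() and
pci_write_config_word() rather than a struct field.

```c
#include <assert.h>
#include <stdint.h>

/* Memory space enable: bit 1 of the PCI command register. */
#define CMD_MEMORY_ENABLE 0x2u

/* Hypothetical stand-in for a device: command register plus one BAR0 register. */
struct fake_dev {
	uint16_t command;
	uint32_t bar0_reg;
};

/* setpci "REG=value:mask" semantics: only the bits set in mask change. */
uint16_t masked_write(uint16_t old, uint16_t value, uint16_t mask)
{
	return (uint16_t)((old & (uint16_t)~mask) | (value & mask));
}

/*
 * Model of a BAR0 read. Assumption: with memory decode disabled the
 * read does not reach the device and comes back as all ones.
 */
uint32_t bar0_read(const struct fake_dev *d)
{
	if (d->command & CMD_MEMORY_ENABLE)
		return d->bar0_reg;
	return 0xffffffffu;
}

/*
 * Enable memory decode around the access, then restore whatever state
 * the command register had before, rather than assuming it was set.
 */
uint32_t read_with_mem_enabled(struct fake_dev *d)
{
	uint16_t saved = d->command;
	uint32_t val;

	d->command = (uint16_t)(saved | CMD_MEMORY_ENABLE);
	val = bar0_read(d);
	d->command = saved;

	return val;
}

/* Walk through the scenario from the thread on the fake device. */
void example(void)
{
	struct fake_dev d = { .command = 0x0006, .bar0_reg = 0x1 };

	/* Equivalent of "COMMAND=0:2": clear bit 1, leave the rest alone. */
	d.command = masked_write(d.command, 0x0, 0x2);
	assert(d.command == 0x0004);

	/* An unguarded read never sees the real register value. */
	assert(bar0_read(&d) == 0xffffffffu);

	/* The guarded read does, and leaves the command register as found. */
	assert(read_with_mem_enabled(&d) == 0x1);
	assert(d.command == 0x0004);
}
```

Calling example() from a main() exercises all three steps. In this
model, a status poll on a register that only ever reads back as all
ones would never see the expected value, which is consistent with the
ETIME result reported above.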

