lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20130316213512.2974.17303.stgit@amt.stowe>
Date:	Sat, 16 Mar 2013 15:35:12 -0600
From:	Myron Stowe <myron.stowe@...hat.com>
To:	kay@...y.org
Cc:	linux-hotplug@...r.kernel.org, greg@...ah.com,
	alex.williamson@...hat.com, linux-pci@...r.kernel.org,
	yuxiangl@...vell.com, yxlraid@...il.com,
	linux-kernel@...r.kernel.org
Subject: [PATCH] udevadm-info: Don't access sysfs entries backing device I/O
	port space


I've been working on identifying the root cause of an issue exposed by
'udevadm' that was first exposed on the linux-pci mail list [1] and
believe that there is now enough of an understanding to propose a fix.

What was originally witnessed was the platform hanging after "udevadm info
--attribute-walk --path=/sys/devices/pci0000:00/<...>/block/sda" is ran.
Xiangliang was able to isolate the failure to accesses involving a Marvell
9125 device's I/O BARs, or more specifically, accesses to the I/O port
space backing the device's I/O BARs (a.k.a. the device's I/O port
resources, or regions).  With this knowledge he was able to reproduce the
hang targeting the device's sysfs 'resource<N>' entries, where N
represents an I/O BARs, with "cat /sys/devices/<...>/resource<N>".

In my research, looking for possible solutions, I noticed that kernel
commit 8633328 introduced sysfs based reading and writing of I/O port
related 'resource<N>' entries as part of adding virtualization based
device assignment functionality [2].  Note that these regions directly map
to the device's control and status registers [3].

Putting together these pieces of information we now understand that:
  o  udevadm based attribute walking initiates read accesses of all the
     entries in a device's sysfs directory [4],
  o  sysfs 'resource<N>' entries correspond to the device's internal
     status and control registers used for driving the device,
  o  If the 'resource<N>' entry corresponds to a device's I/O BAR, the
     device's control and status registers are directly accessible by
     userspace.

Allowing userspace access to a device's registers introduces potential
simultaneous interaction with the device by a second, competing, entity;
There is the device's driver, which believes it exclusively owns the
device, and an unknown, potential second entity, which can effect control
and status changes to the device asynchronously.

Device status and control registers being accessed from an entity that has
no idea what is being read or written is just asking for trouble.  Even
just reading can have consequences as the register may be a "read once to
clear" or some similar type.  I think we have just been lucky, or
blissfully ignorant, concerning problems that may have, and still could
be, occurring due to this situation.


There is an aspect at play here that I still do not understand (likely
something obvious that I'm just not seeing).  The sysfs read routine for
accessing I/O port 'resource<N>' entries only supports 1, 2, and 4 byte
reads (which respectively correspond to inb/outb, inw/outw, and inl/outl
I/O port accessors).  When "udevadm ..." executes, the udev internals
attempt reads of 4K byte chunks.

  "udevadm info --attribute-walk --path=<pci_device_path>"

  print_device_chain()
    print_all_attributes()
      ...
      udev_device_get_sysattr_value()
        char value[4096];
        ...
        size = read(fd, value, sizeof(value));
        ...

  -- ^ userspace ^ -- v kernel v --

  pci_read_resource_io(..., count)	# sysfs read setup in pci_create_attr()
    pci_resource_io(..., count, ...)
      ...
      if (port + count - 1 > pci_resource_end(pdev, i))
        return -EINVAL;
      ...

What I don't understand is how the device's I/O port space is successfully
getting read.  It looks to me like 'pci_resource_io()' would fail the
access size check and return '-EINVAL' having never attempted the read's
access to I/O port space causing the system to hang.

I'm keep looking into this but I do *not* have access to a platform with a
Marvell 9125 device.


Reference(s)/Foot Note(s):
  [1] https://lkml.org/lkml/2013/3/7/242
  [2] Note that due to the implementation specifics only the 'resource<N>'
      entries representing I/O BARs can be read or written via sysfs.
      Sysfs' 'resource<N>' entries representing MMIO do not have sysfs
      based read/write routines as only mmap'ing of these entries is
      exposed (./drivers/pci/pci-sysfs.c::pci_create_attr()).
  [3] The kernel's sysfs documentation states: "Attributes should be ASCII
      text files..." (./Documentation/filesystems/sysfs.txt).  I wonder if
      this is just out-of-date infomation as sysfs obviously supports
      creating binary files (./fs/sysfs/bin.c::sysfs_create_bin_file()).
  [4] Note that udevadm-info does skip a specifically named set of entries
      (./src/udevadm-info.c::skip_attribute()).
---

Myron Stowe (1):
      udevadm-info: Don't access sysfs 'resource<N>' files


 src/udevadm-info.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ