[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACT4Y+YvJ-KT4UxtOXpou1xpT3wBASDkt135zr2BwrRi0gv4Tw@mail.gmail.com>
Date: Thu, 11 Oct 2018 15:27:41 +0200
From: Dmitry Vyukov <dvyukov@...gle.com>
To: Dominique Martinet <asmadeus@...ewreck.org>
Cc: Leon Romanovsky <leon@...nel.org>,
syzbot <syzbot+2222c34dc40b515f30dc@...kaller.appspotmail.com>,
David Miller <davem@...emloft.net>,
Eric Van Hensbergen <ericvh@...il.com>,
LKML <linux-kernel@...r.kernel.org>,
Latchesar Ionkov <lucho@...kov.net>,
netdev <netdev@...r.kernel.org>,
Ron Minnich <rminnich@...dia.gov>,
syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
v9fs-developer@...ts.sourceforge.net
Subject: Re: BUG: corrupted list in p9_read_work
On Thu, Oct 11, 2018 at 3:10 PM, Dominique Martinet
<asmadeus@...ewreck.org> wrote:
> Dmitry Vyukov wrote on Thu, Oct 11, 2018:
>> > That's still the tricky part, I'm afraid... Making a separate server
>> > would have been easy because I could have reused some of my junk for the
>> > actual connection handling (some rdma helper library I wrote ages
>> > ago[1]), but if you're going to just embed C code you'll probably want
>> > something lower level? I've never seen syzkaller use any library call
>> > but I'm not even sure I would know how to create a qp without
>> > libibverbs, would standard stuff be OK ?
>>
>> Raw syscalls preferably.
>> What does 'rxe_cfg start ens3' do on syscall level? Some netlink?
>
> modprobe rdma_rxe (and a bunch of other rdma modules before that) then
> writes the interface name in /sys/module/rdma_rxe/parameters/add
> apparently; then checks it worked.
> this part could be done in C directly without too much trouble, but as
> long as the proper kernel configuration/modules are available
Now we are talking!
We generally assume that all modules are simply compiled into kernel.
At least that's we have on syzbot. If somebody can't compile them in,
we can suggest to add modprobe into init.
So this boils down to just writing to /sys/module/rdma_rxe/parameters/add.
>> Any libraries and utilities are hell pain in linux world. Will it work
>> in Android userspace? gVisor? Who will explain all syzkaller users
>> where they get this for their who-knows-what distro, which is 10 years
>> old because of corp policies, and debug how their version of the
>> library has a slightly incompatible version?
>> For example, after figuring out that rxe_cfg actually comes from
>> rdma-core (which is a separate delight on linux), my debian
>> destribution failed to install it because of some conflicts around
>> /etc/modprobe.d/mlx4.conf, and my ubuntu distro does not know about
>> such package. And we've just started :)
>
> The rdma ecosystem is a pain, I'll easily agree with that...
>
>> Syscalls tend to be simpler and more reliable. If it gives ENOSUPP,
>> ok, that's it. If it works, great, we can use it.
>
> I'll have to look into it a bit more; libibverbs abstracts a lot of
> stuff into per-nic userspace drivers (the files I cited in a previous
> mail) and basically with the mellanox cards I'm familiar with the whole
> user session looks like this:
> * common libibverbs/rdmacm code opens /dev/infiniband/rdma_cm and
> /dev/infiniband/uverbs0 (plus a bunch of files to figure out abi
> version, what user driver to load etc)
> * it and the userspace driver issue "commands" over these two files' fd
> to setup the connection ; some commands are standard but some are
> specific to the interface and defined in the driver.
But we will use some kind of virtual/stub driver, right? We don't have
real hardware. So all these commands should be fixed and known for the
virtual/stub driver.
> There are many facets to a connection in RDMA: a protection domain used
> to register memory with the nic, a queue pair that is the actual tx/rx
> connection, optionally a completion channel that will be another fd to
> listen on for events that tell you something happened and finally some
> memory regions to directly communicate with the nic from userspace
> depending on the specific driver.
> * then there's the actual usage, more commands through the uverbs0 char
> device to register the memory you'll use, and once that's done it's
> entierly up to the driver - for example the mellanox lib can do
> everything in userspace playing with the memory regions it registered,
> but I'd wager the rxe driver does more calls through the uverbs0 fd...
>
> Honestly I'm not keen on reimplementing all of this; the interface
> itself pretty much depends on your version of the kernel (there is a
> common ABI defined, but as far as specific nics are concerned if your
> kernel module doesn't match the user library version you can get some
> nasty surprises), and it's far from the black or white of a good ol'
> ENOSUPP error.
>
>
> I'll look if I can figure out if there is a common subset of verbs
> commands that are standard and sufficient to setup a listening
> connection and exchange data that should be supported for all devices
> and would let us reimplement just that, but while I hear your point
> about android and ten years in the future I think it's more likely than
> ten years in the future the verb abi will have changed but libibverbs
> will just have the new version implemented and hide the change :P
But again we don't need to support all of the available hardware.
For example, we are testing net stack from external side using tun.
tun is a very simple, virtual abstraction of a network card. It allows
us to test all of generic net stack starting from L2 without messing
with any real drivers and their differences entirely. I had impression
that we are talking about something similar here too. Or not?
Also I am a bit missing context about rdma<->9p interface. Do we need
to setup all these ring buffers to satisfy the parts that 9p needs? Is
it that 9p actually reads data directly from these ring buffers? Or
there is some higher-level rdma interface that 9p uses?
Powered by blists - more mailing lists