[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACT4Y+bM4DxysCzhWvhURgBVpnLR4uu+oxhLpdZ3x92cKwtAzA@mail.gmail.com>
Date: Thu, 11 Oct 2018 15:40:08 +0200
From: Dmitry Vyukov <dvyukov@...gle.com>
To: Dominique Martinet <asmadeus@...ewreck.org>
Cc: Leon Romanovsky <leon@...nel.org>,
syzbot <syzbot+2222c34dc40b515f30dc@...kaller.appspotmail.com>,
David Miller <davem@...emloft.net>,
Eric Van Hensbergen <ericvh@...il.com>,
LKML <linux-kernel@...r.kernel.org>,
Latchesar Ionkov <lucho@...kov.net>,
netdev <netdev@...r.kernel.org>,
Ron Minnich <rminnich@...dia.gov>,
syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
v9fs-developer@...ts.sourceforge.net
Subject: Re: BUG: corrupted list in p9_read_work
On Thu, Oct 11, 2018 at 3:27 PM, Dmitry Vyukov <dvyukov@...gle.com> wrote:
> On Thu, Oct 11, 2018 at 3:10 PM, Dominique Martinet
> <asmadeus@...ewreck.org> wrote:
>> Dmitry Vyukov wrote on Thu, Oct 11, 2018:
>>> > That's still the tricky part, I'm afraid... Making a separate server
>>> > would have been easy because I could have reused some of my junk for the
>>> > actual connection handling (some rdma helper library I wrote ages
>>> > ago[1]), but if you're going to just embed C code you'll probably want
>>> > something lower level? I've never seen syzkaller use any library call
>>> > but I'm not even sure I would know how to create a qp without
>>> > libibverbs, would standard stuff be OK ?
>>>
>>> Raw syscalls preferably.
>>> What does 'rxe_cfg start ens3' do on syscall level? Some netlink?
>>
>> modprobe rdma_rxe (and a bunch of other rdma modules before that) then
>> writes the interface name in /sys/module/rdma_rxe/parameters/add
>> apparently; then checks it worked.
>> this part could be done in C directly without too much trouble, but as
>> long as the proper kernel configuration/modules are available
>
> Now we are talking!
> We generally assume that all modules are simply compiled into kernel.
> At least that's we have on syzbot. If somebody can't compile them in,
> we can suggest to add modprobe into init.
> So this boils down to just writing to /sys/module/rdma_rxe/parameters/add.
This fails for me:
root@...kaller:~# echo -n syz1 > /sys/module/rdma_rxe/parameters/add
[20992.905406] rdma_rxe: interface syz1 not found
bash: echo: write error: Invalid argument
>>> Any libraries and utilities are hell pain in linux world. Will it work
>>> in Android userspace? gVisor? Who will explain all syzkaller users
>>> where they get this for their who-knows-what distro, which is 10 years
>>> old because of corp policies, and debug how their version of the
>>> library has a slightly incompatible version?
>>> For example, after figuring out that rxe_cfg actually comes from
>>> rdma-core (which is a separate delight on linux), my debian
>>> destribution failed to install it because of some conflicts around
>>> /etc/modprobe.d/mlx4.conf, and my ubuntu distro does not know about
>>> such package. And we've just started :)
>>
>> The rdma ecosystem is a pain, I'll easily agree with that...
>>
>>> Syscalls tend to be simpler and more reliable. If it gives ENOSUPP,
>>> ok, that's it. If it works, great, we can use it.
>>
>> I'll have to look into it a bit more; libibverbs abstracts a lot of
>> stuff into per-nic userspace drivers (the files I cited in a previous
>> mail) and basically with the mellanox cards I'm familiar with the whole
>> user session looks like this:
>> * common libibverbs/rdmacm code opens /dev/infiniband/rdma_cm and
>> /dev/infiniband/uverbs0 (plus a bunch of files to figure out abi
>> version, what user driver to load etc)
>> * it and the userspace driver issue "commands" over these two files' fd
>> to setup the connection ; some commands are standard but some are
>> specific to the interface and defined in the driver.
>
> But we will use some kind of virtual/stub driver, right? We don't have
> real hardware. So all these commands should be fixed and known for the
> virtual/stub driver.
>
>> There are many facets to a connection in RDMA: a protection domain used
>> to register memory with the nic, a queue pair that is the actual tx/rx
>> connection, optionally a completion channel that will be another fd to
>> listen on for events that tell you something happened and finally some
>> memory regions to directly communicate with the nic from userspace
>> depending on the specific driver.
>> * then there's the actual usage, more commands through the uverbs0 char
>> device to register the memory you'll use, and once that's done it's
>> entierly up to the driver - for example the mellanox lib can do
>> everything in userspace playing with the memory regions it registered,
>> but I'd wager the rxe driver does more calls through the uverbs0 fd...
>>
>> Honestly I'm not keen on reimplementing all of this; the interface
>> itself pretty much depends on your version of the kernel (there is a
>> common ABI defined, but as far as specific nics are concerned if your
>> kernel module doesn't match the user library version you can get some
>> nasty surprises), and it's far from the black or white of a good ol'
>> ENOSUPP error.
>>
>>
>> I'll look if I can figure out if there is a common subset of verbs
>> commands that are standard and sufficient to setup a listening
>> connection and exchange data that should be supported for all devices
>> and would let us reimplement just that, but while I hear your point
>> about android and ten years in the future I think it's more likely than
>> ten years in the future the verb abi will have changed but libibverbs
>> will just have the new version implemented and hide the change :P
>
> But again we don't need to support all of the available hardware.
> For example, we are testing net stack from external side using tun.
> tun is a very simple, virtual abstraction of a network card. It allows
> us to test all of generic net stack starting from L2 without messing
> with any real drivers and their differences entirely. I had impression
> that we are talking about something similar here too. Or not?
>
> Also I am a bit missing context about rdma<->9p interface. Do we need
> to setup all these ring buffers to satisfy the parts that 9p needs? Is
> it that 9p actually reads data directly from these ring buffers? Or
> there is some higher-level rdma interface that 9p uses?
Powered by blists - more mailing lists