This IPD describes a proposed new feature for illumos: a virtual GLDv3 network device through which a userland process can send and receive frames. Analogs for this device exist on other platforms: TAP (Linux, Solaris and some BSDs), feth (recent Darwins). Virtual devices of this nature are commonly used by virtual networking software such as OpenVPN, WireGuard or ZeroTier. The current working name of the device is feth for "faux ethernet". It is modelled in large part on the existing simnet device and shares a significant amount of code with simnet.
Current State
A working proof of concept is available and has been tested on SmartOS and OmniOS r50. Functionality works in the GZ, and with multiple devices present on a given system. If the appropriate privileges are granted, feth devices can be managed in an NGZ and are not visible outside the zone. A patch for ZeroTier (ZT) successfully permits an OmniOS node to join a ZT network, and simple tunnels can be written using, for example, python or node.
While at least partly functional, the code is not currently ready for integration. There are numerous cstyle compliance warnings and notes to the author, and significant bugs are likely to exist. The hope is that a lively discussion can occur in order to influence the design and "guide the code" toward its ultimate integration, should that be deemed worthwhile.
Description of use in practice
The link status of a feth interface is considered "UP" when the feth character device has been opened by a process and "DOWN" when no processes have the character device open. When the device is down, frames are dropped in order to prevent unnecessary loading of the STREAMS queues.
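The behaviour above can be sketched from the userland side. This is a minimal illustration rather than the proof-of-concept's actual interface: the device path /dev/feth/vpn0 is a hypothetical placeholder (this document does not fix the node name), and the loop assumes each read() returns exactly one frame, which depends on the read mode discussed under "User niceties".

```python
import os
import struct

ETH_HEADER_LEN = 14  # destination MAC (6) + source MAC (6) + EtherType (2)


def format_mac(raw: bytes) -> str:
    """Render six raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def parse_ethernet_header(frame: bytes):
    """Split a raw Ethernet frame into (dst, src, ethertype, payload)."""
    if len(frame) < ETH_HEADER_LEN:
        raise ValueError("frame shorter than an Ethernet header")
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:ETH_HEADER_LEN])
    return format_mac(dst), format_mac(src), ethertype, frame[ETH_HEADER_LEN:]


def dump_frames(path="/dev/feth/vpn0", max_frame=65536):
    """Hold the character device open and print one line per frame.

    ASSUMPTION: `path` is a placeholder, not a name defined by this proposal.
    While this descriptor is open the link is UP; once no process has the
    device open the link goes DOWN and frames are dropped by the driver.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        while True:
            frame = os.read(fd, max_frame)
            if not frame:
                break
            dst, src, ethertype, payload = parse_ethernet_header(frame)
            print(f"{src} -> {dst} type=0x{ethertype:04x} len={len(payload)}")
    finally:
        os.close(fd)
```

Only parse_ethernet_header() is independent of the device; dump_frames() is not invoked here and would need a real feth node to run against.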
Components impacted
Management of feth devices
The following subcommands will be added to dladm for managing feth devices. These correspond almost exactly to the simnet subcommands, with the exception of the -s and -u options to create-feth.
create-feth [-t] [-s] [-R <root-dir>] [-u <mac address>] <feth-link>
-t: create a temporary device that will not survive system restart (temporary)
-u: specify a (unicast) mac address for the interface
-R: operate on a particular system root directory
delete-feth [-R <root-dir>] <feth-link>
-R: operate on a particular system root directory
show-feth [-p] [-o <field>,...] [-P] [<feth-link>]
-p: generate machine parsable output (must be used in conjunction with -o)
-o: specify the fields to include in output: link, macaddress
-P: show persistent devices, whether active or not
up-feth <feth-link>
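As an illustration of how a tunnel program might drive these subcommands, the sketch below builds the argument vectors from the synopses above. The subcommands are proposed, not part of any released dladm, so this is only a picture of the intended usage. The helper names and the link name vpn0 are hypothetical, and the -s option is left out because its meaning is not described here.

```python
import subprocess


def create_feth_argv(link, temporary=False, mac=None, root_dir=None):
    """Build argv for the proposed `dladm create-feth` subcommand."""
    argv = ["dladm", "create-feth"]
    if temporary:
        argv.append("-t")            # device will not survive a restart
    if root_dir is not None:
        argv += ["-R", root_dir]     # operate on an alternate root
    if mac is not None:
        argv += ["-u", mac]          # unicast MAC address for the link
    argv.append(link)
    return argv


def delete_feth_argv(link, root_dir=None):
    """Build argv for the proposed `dladm delete-feth` subcommand."""
    argv = ["dladm", "delete-feth"]
    if root_dir is not None:
        argv += ["-R", root_dir]
    argv.append(link)
    return argv


def show_feth_argv(link=None, fields=("link", "macaddress")):
    """Build argv for parsable `dladm show-feth` output (-p requires -o)."""
    argv = ["dladm", "show-feth", "-p", "-o", ",".join(fields)]
    if link is not None:
        argv.append(link)
    return argv


def run(argv):
    """Run a dladm command and return its standard output."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout
```

For example, run(create_feth_argv("vpn0", temporary=True, mac="2:0:0:0:0:1")) would create a temporary link on a system where the proposed subcommand exists.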
Known open questions and issues
Promiscuous mode: some applications require promiscuous mode to be enabled. This can effectively be enabled within the driver, for example using an ioctl(); however, the state of the interface cannot be reported up. It appears that DLPI would need to be used by an application in order to propagate the promiscuous status. What's the best solution here?
Flow control: it seems logical that flow control on the character device should be implemented in order to prevent a slow character device reader from causing memory usage to balloon. The logical approach would be to use canputnext() and return the message blocks that aren't able to be passed along. However, this seems to cause a precipitous drop (several orders of magnitude) in performance regardless of the settings of the device water marks or mechanism used to re-enable data transfer (i.e., even immediately calling mac_tx_update()). Is the source of this behavior sub-optimal watermark settings between the character device and the stream head?
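The back-pressure scheme under discussion can be pictured with a toy model. This is not STREAMS code and makes no claim about the cause of the performance drop. It is a deliberately simplified byte-counted queue with a high and low water mark, in which canput() stands in for canputnext() and the "reenabled" result marks the point where a driver would call mac_tx_update(). All class and function names are hypothetical.

```python
from collections import deque


class BoundedQueue:
    """Toy byte-counted queue with high/low water marks (not real STREAMS)."""

    def __init__(self, hiwat, lowat):
        if not 0 <= lowat < hiwat:
            raise ValueError("need 0 <= lowat < hiwat")
        self.hiwat = hiwat
        self.lowat = lowat
        self.msgs = deque()
        self.count = 0      # bytes currently queued
        self.full = False   # set at hiwat, cleared only at lowat

    def canput(self):
        """Stand-in for canputnext(): may the sender queue another frame?"""
        return not self.full

    def put(self, msg):
        self.msgs.append(msg)
        self.count += len(msg)
        if self.count >= self.hiwat:
            self.full = True

    def get(self):
        """Dequeue one message; return (msg, reenabled)."""
        msg = self.msgs.popleft()
        self.count -= len(msg)
        reenabled = False
        if self.full and self.count <= self.lowat:
            self.full = False
            reenabled = True  # a driver would call mac_tx_update() here
        return msg, reenabled


def transmit(queue, frames):
    """Queue frames until flow-controlled; return the frames handed back."""
    for i, frame in enumerate(frames):
        if not queue.canput():
            return frames[i:]
        queue.put(frame)
    return []
```

The gap between the two marks determines how long the sender stays blocked after the reader starts draining, which is one reason water mark settings are a natural suspect.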
Driver "double-duty": the feth driver has two personalities, acting as either a mac device or a character device depending on the node being operated on. Because a device driver can supply only one set of callback functions, it is up to the driver to decide which set of functionality to invoke. The current proof-of-concept code sets a magic cookie value at the beginning of the data passed to the device callbacks. The callbacks look for the presence of this value and invoke the mac callbacks when it is not present. This approach has worked in testing and seems unlikely to cause a problem, as 1) the minimum length of the struct passed to the callback by mac is well over the length of the magic cookie and 2) it seems highly unlikely that a mac device's struct would inadvertently collide. However, it seems ugly and perhaps a better approach can be found.
Remaining Tasks
In addition to the open questions listed above, the following tasks have been identified as being critical to a successful effort:
Zone Hooks
Some additional work with respect to tooling will probably be required for an initial release: currently if a feth device is created in a zone, upon shutdown/restart of the zone, a fault occurs because the feth device still exists. On SmartOS, restarting a zone with a simnet or feth device causes a kernel panic due to a failed assertion. This work will most likely be distribution specific.
Device filesystem integration
In order to properly display devices associated with zones, an sdev plugin will need to be written so that the proper view of feth devices can be given depending on whether the context is the global zone or a non-global zone.
User niceties
By default, a read() from the feth character device may return more or less than a full frame. Since other implementations explicitly indicate that a single full frame is returned per read(), we should set the RMSGN (message-nondiscard) read mode via the stream options.
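The difference between the default byte-stream read mode and message-nondiscard mode can be shown with a small model. This is a toy simulation of the documented stream head read semantics, not driver code, and the StreamHead class is hypothetical. In byte-stream mode a read() may span message boundaries. In message-nondiscard mode a read() stops at the end of a message, and any part of a message that did not fit in the buffer stays queued for the next read().

```python
class StreamHead:
    """Toy model of a stream head holding discrete messages (frames)."""

    def __init__(self, messages, message_nondiscard):
        self.messages = [bytes(m) for m in messages]
        self.message_nondiscard = message_nondiscard

    def read(self, nbytes):
        """Return up to nbytes according to the configured read mode."""
        out = b""
        while self.messages and len(out) < nbytes:
            head = self.messages[0]
            take = head[: nbytes - len(out)]
            out += take
            rest = head[len(take):]
            if rest:
                self.messages[0] = rest  # leftover stays queued
            else:
                self.messages.pop(0)
            if self.message_nondiscard:
                break                    # never cross a message boundary
        return out


frames = [b"AAAA", b"BB"]

byte_stream = StreamHead(frames, message_nondiscard=False)
first_default = byte_stream.read(5)      # spans both frames

nondiscard = StreamHead(frames, message_nondiscard=True)
first_framed = nondiscard.read(5)        # stops at the end of frame one
```

With a buffer at least as large as the largest frame, message-nondiscard mode gives the one-frame-per-read() behaviour that applications written for other platforms expect.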
What this proposal does not attempt to do
API compatibility
The TAP driver bundled with some illumos distributions is based on code written circa 2000 by Maxim Krasnyansky and subsequently enhanced by Kazuyoshi Aizawa. This driver has also been ported to Linux and BSD. This driver is not based on the GLDv3 (mac) framework. The driver does not include a utility for creating persistent interfaces, though a GPL licensed utility exists. A brief review of code suggests that the details of interacting with the driver are different in each environment. For example: on illumos, the device created is configured by pushing a series of STREAMS modules. Some applications may require using DLPI to configure the interface for things such as promiscuous mode.
The proposed API will be much closer to the Linux and BSD interfaces in terms of frame interchange. The configuration and teardown of a feth interface will differ from both the existing TAP driver and the drivers on other platforms. Overall, however, creating and using a feth device should be greatly simplified compared to the current situation, and this proposal attempts to hew as closely to "illumos style" as practicable.
"Single-shot" mode
The existing TAP driver operates as a clone device. An inherent feature of this is that unless the STREAMS configuration is permanently linked, it will be destroyed automatically if the major device (/dev/tap) is closed. This is a useful behavior should, for example, a program creating a TAP device abort without a clean shutdown. In practice, vanity device names should make this a less critical omission, as each feth would presumably have a descriptive name indicating where it originated. Additionally, subsequent executions of the program that created the feth would most likely result in a no-op with respect to interface creation.
The proposal seeks to use DLS (data-link services), and it does not appear that a clean interface destruction from the driver side is possible: the device can be destroyed but DLS will not be aware of this fact.
TUN Driver
The functionality provided by the proposed replacement for the TAP driver is frequently discussed in conjunction with that of the TUN driver, which is essentially an analogue of the TAP driver operating strictly in terms of IP. That is, a TUN device exchanges layer 3 IP packets with the userland process rather than layer 2 Ethernet frames.
The TUN driver provided with some illumos distributions is generated from the TAP driver source by switching a small amount of code using preprocessor defines.
Providing a replacement for the TUN driver is not currently in-scope for this proposal. Should analysis reveal that a replacement for the TUN portion of the TAP/TUN driver is desirable by following the TAP/TUN approach, this proposal could provide the bulk of the code required. It should be noted that there currently exists an IP tunnel driver within illumos that behaves in a different way and solves a different problem. At a minimum, this name clash would need to be addressed.
WIP Code