A new filesystem for Clouds using Device Mapping

Remote DMA (Direct Memory Access) provides a fast link-layer transfer in clusters, allowing large data transfers to bypass the network and transport layers as well as the user-kernel space boundary. In combination with modern OS design, however, these capabilities allow something more. Since interrupts are memory mapped and virtualizable between nearly arbitrary devices, devices such as network cards, disk drives and GPUs can be mapped into remote computers as if they were locally attached hot-pluggable devices.

In this thesis, you will explore whether this mapping capability can be used to implement a new kind of cluster filesystem that does not require user-space processes to transfer data, but re-maps disks instead.

The thesis will make use of PCIe cards from Dolphin Interconnect Solutions. The Dolphin Express architecture uses PCI Express over cable to connect multiple computers in a switched network. Through the device lending paradigm, we are able to expose remote hardware on local machines underneath the operating system. For all intents and purposes, this hardware becomes local hardware.

[Figure: Dolphin IX PCI Express Adapter]

The Dolphin Express cards are based on a PCI Express non-transparent bridge, with application end-to-end latencies as low as 0.74 microseconds. Data transfers can be done either with Direct Memory Access (DMA) or Programmed IO (PIO). There are several APIs for using these cards. One of them is Dolphin Smart IO, which is based on the Device Lending idea. Device Lending exploits the direct PCIe connection between computers and makes it possible to let hardware that is physically located in one computer appear to another computer as locally attached.

This opens up entirely new possibilities for remote operations.

One of these is a new concept for shared file systems. Rather than deploying a file server on the computer to which the disks are physically attached, and a network file system client on all other computers, we can use the Device Lending idea to let the disks appear locally attached on several machines for a period of time.

The protocol overhead is then shifted from a cost paid for every single command to a one-time initial cost for virtually attaching the physically remote disk.
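The trade-off can be illustrated with a toy cost model. The sketch below is only an illustration: the function names and the example figures (a 1000 microsecond attach cost and a 10 microsecond per-command protocol cost) are hypothetical assumptions, not measurements of Dolphin hardware or of any network file system.

```python
import math


def per_command_total_us(num_commands, per_command_overhead_us):
    """Total protocol overhead when every command pays the protocol cost."""
    return num_commands * per_command_overhead_us


def attach_once_total_us(attach_overhead_us):
    """Total protocol overhead when only the initial virtual attach is paid.

    Assumes (for this sketch) that commands issued after the attach carry
    no additional protocol overhead.
    """
    return attach_overhead_us


def break_even_commands(attach_overhead_us, per_command_overhead_us):
    """Smallest number of commands at which attaching once is no more
    expensive than paying the per-command overhead each time."""
    return math.ceil(attach_overhead_us / per_command_overhead_us)


if __name__ == "__main__":
    # Hypothetical figures, chosen only to make the arithmetic visible.
    attach_us = 1000.0
    per_cmd_us = 10.0
    n = break_even_commands(attach_us, per_cmd_us)
    print("break-even after", n, "commands")
```

Under these assumed numbers, the one-time attach pays off once a machine issues at least as many commands as the break-even count; the actual ratio would have to be measured.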

This thesis will look at two issues:

  • the complication of storing metadata for the file system on all computers that map a disk, and the synchronisation and cache coherency demands between these file systems
  • a file system that arranges disk blocks in a manner that is sensible for sharing across PCIe links in the proposed manner
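The first issue can be made concrete with a small simulation. The sketch below is not a proposed design; the classes `SharedDisk` and `Host` and the explicit `invalidate` call are hypothetical and exist only to show how a machine that caches metadata locally can read stale data after another machine updates the same block on the shared disk.

```python
class SharedDisk:
    """Stands in for a disk that several hosts have mapped at once."""

    def __init__(self):
        self.blocks = {}

    def read(self, block_no):
        return self.blocks.get(block_no)

    def write(self, block_no, data):
        self.blocks[block_no] = data


class Host:
    """A machine with its own, uncoordinated metadata cache."""

    def __init__(self, name, disk):
        self.name = name
        self.disk = disk
        self.cache = {}

    def read_metadata(self, block_no):
        # Serve from the local cache if present, otherwise fetch from disk.
        if block_no not in self.cache:
            self.cache[block_no] = self.disk.read(block_no)
        return self.cache[block_no]

    def write_metadata(self, block_no, data):
        # Write-through: update the local cache and the shared disk.
        self.cache[block_no] = data
        self.disk.write(block_no, data)

    def invalidate(self, block_no):
        # Hypothetical coherency action: drop the cached copy.
        self.cache.pop(block_no, None)


def demo():
    disk = SharedDisk()
    disk.write(0, "size=1")
    a = Host("a", disk)
    b = Host("b", disk)

    first = b.read_metadata(0)     # b caches the block
    a.write_metadata(0, "size=2")  # a updates the block on the shared disk
    stale = b.read_metadata(0)     # b still sees its cached copy
    b.invalidate(0)
    fresh = b.read_metadata(0)     # b re-reads from the shared disk
    return first, stale, fresh


if __name__ == "__main__":
    print(demo())
```

How and when such invalidations are communicated between machines is exactly the kind of synchronisation question the thesis is meant to investigate.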

To remove some complications, we will look exclusively at NVMe disks, which are SSDs that are directly connected to the PCIe bus.

Related courses:

  • The operating systems course INF3151 is a mandatory prerequisite for this thesis.

  • Knowledge outcome: Optimization, performance analysis, virtualization systems
  • Knowledge required: Low-level programming (C/C++), operating systems
Published 16 Sep. 2019 15:19 - Last modified 16 Sep. 2019 15:19