About DIO


What is DIO?


DIO is a generic tool for observing and diagnosing applications' storage I/O. It is designed to help application developers and users understand how applications interact with storage systems. By combining system call tracing with a customizable data analysis and visualization pipeline, DIO provides non-intrusive and comprehensive I/O diagnosis for applications using in-kernel POSIX storage backends (e.g., ext4, Linux block device).


How DIO works


DIO's architecture and flow of events.


  • Tracer

  • The tracer component relies on eBPF to intercept the system calls made by applications in a non-intrusive way. Briefly, it comprises a set of eBPF programs that, at the initialization phase (1), are attached to system call tracepoints. These eBPF programs collect the relevant information about each system call (in kernel) and place it in a ring buffer (2) to be accessed from user space. In user space, the tracer continuously polls events from the ring buffer (3) and sends them to the backend (4) for storage.

  • Backend

  • The backend component persists and indexes events (5), and allows users to query and summarize (e.g., aggregate) stored information (6). It uses the Elasticsearch distributed engine for storing and processing large volumes of data. By providing an interface for searching, querying, and updating data, the backend allows users to develop and integrate customized data correlation algorithms.

  • Visualizer

  • The visualizer component provides near real-time visualization of the traced events by querying the backend (7). It uses Kibana, the data visualization dashboard software for Elasticsearch, which offers a web interface for data exploration and analysis. Moreover, it allows users to select specific types of data (e.g., system call type, arguments) to build different and customized representations.
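
The flow from the ring buffer to the backend (steps 2 to 6) can be illustrated with a small simulation. This is only a sketch and not DIO's implementation: it uses no eBPF and no Elasticsearch client, and the names `SyscallEvent`, `RingBuffer`, `InMemoryBackend`, `drain`, and the `syscall` field are hypothetical. It also assumes that a full buffer drops new events rather than blocking the producer. The `terms_aggregation_query` helper shows the shape of an Elasticsearch terms aggregation body that a real backend query could use for the same per-syscall count.

```python
from collections import Counter, deque
from dataclasses import dataclass, asdict


@dataclass
class SyscallEvent:
    """Hypothetical event record; DIO's real event schema may differ."""
    pid: int
    syscall: str
    ret: int


class RingBuffer:
    """Bounded FIFO standing in for the kernel-to-user-space ring buffer (2)."""

    def __init__(self, capacity: int) -> None:
        self._events = deque()
        self._capacity = capacity
        self.dropped = 0

    def push(self, event: SyscallEvent) -> None:
        # Assumption of this sketch: a full buffer drops the new event
        # instead of blocking the producer.
        if len(self._events) >= self._capacity:
            self.dropped += 1
            return
        self._events.append(event)

    def poll(self, max_events: int) -> list:
        # Drain up to max_events in arrival order (3).
        batch = []
        while self._events and len(batch) < max_events:
            batch.append(self._events.popleft())
        return batch


class InMemoryBackend:
    """Stand-in for the backend: stores documents and summarizes them."""

    def __init__(self) -> None:
        self.docs = []

    def bulk_index(self, events) -> None:
        # Persist each event as a plain dict document (4)-(5).
        self.docs.extend(asdict(e) for e in events)

    def count_by(self, field: str) -> dict:
        # Count documents per distinct value of `field` (6).
        return dict(Counter(doc[field] for doc in self.docs))


def terms_aggregation_query(field: str) -> dict:
    """Elasticsearch query body that counts documents per value of `field`."""
    return {"size": 0, "aggs": {"by_value": {"terms": {"field": field}}}}


def drain(ring: RingBuffer, backend: InMemoryBackend, batch_size: int) -> int:
    """Poll the ring buffer until it is empty, forwarding each batch."""
    sent = 0
    while True:
        batch = ring.poll(batch_size)
        if not batch:
            return sent
        backend.bulk_index(batch)
        sent += len(batch)


if __name__ == "__main__":
    ring = RingBuffer(capacity=4)
    for name in ["open", "read", "read", "close", "write"]:
        ring.push(SyscallEvent(pid=1234, syscall=name, ret=0))
    backend = InMemoryBackend()
    print(drain(ring, backend, batch_size=2), ring.dropped)
    print(backend.count_by("syscall"))
```

Sending events in batches rather than one at a time keeps the number of backend requests low. The `dropped` counter makes event loss visible when the consumer cannot keep up with the producer.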