Elasticsearch use case

This use case shows the exploratory capabilities of DIO to analyze the Elasticsearch application. Namely, we used DIO to understand how Elasticsearch accesses the storage system. Moreover, by following a top-down approach, we show how users can observe inefficient file access patterns.
The instructions to reproduce the use case are available at how to reproduce, while an extendend set of visualizations provided by DIO are available at portfolio.

System overview

Using DIO we observed that Elasticsearch generated about 1 million storage-related system calls, 99% of them targeting regular files and the remaining ones targeting directories. Moreover, Elasticsearch uses mainly data-related operations (88%), most of them being write (70.97%), pread64 (6.63%) and read (5.17%). Further, it spawns a total of 42 processes and 118 threads, while accessing almost 4000 files.

Figure 1. System overview of Elasticsearch execution

File access patterns

By observing the file access patterns, DIO shows that some files exhibit a constant access pattern, even in the absence of client requests. Namely, every 30 seconds Elasticsearch submits at least 2 system calls to the node.lock file (see Figure 2 - blue line). Moreover, every 2 minutes, 9 system calls to .es_temp_file file (see Figure 2 - pink line).

Figure 2. Sample of Elasticsearch file accesses

File access pattern for .es_temp_file

By analyzing in more detail the file access pattern for .es_temp_file, DIO shows that Elasticsearch performs always the same sequecence of system calls: lstat→openat→write→openat→fsync→close→close→lstat→unlink (see Figure 3).

Figure 3. File access pattern for .es_temp_file

Observation: this sequence of system calls shows an unexpected duplication of openat and close system calls.
Explanation: As show in Figure 4, Elasticsearch's Java source code used for accessing the .es_temp_file file resorts to high-level I/O libraries, which end up masking the underlying system calls and resulting in the duplication of system calls.

Figure 4. Mapping between system calls issued to .es_temp_file and corresponding code generating these.

  • The first openat is generated by the Files.newOutputStream method (1), which opens an output stream used for writing data to the file (2).
  • An IOUtils.fsync method is then invoked to flush dirty pages to disk (3).
  • However, rather than using the already opened file descriptor, created from the first openat call, it internally reopens and closes the file again.
  • Finally, upon the file's removal request (4), three system calls are issued: a close corresponding to the first openat, a lstat, and an unlink.