Fluent Bit use case

This use case considers the diagnosis of a data loss issue identified in the Fluent Bit application.
The instructions to reproduce the use case are available at how to reproduce, while an extended set of visualizations provided by DIO is available at portfolio.

Problem

Existing issues (#1875, #4895) report that log data is lost when using the tail input plugin, which fetches new content as it is appended to log files.

Diagnosis

Using DIO to analyze the application execution, we obtained the following information (see Figure 1):

  • The client program (app) starts by creating the app.log file, writing 26 bytes starting from offset 0, and closing the file (1).
  • Then, Fluent Bit (fluent-bit) detects that the file was modified, opens it, and reads 26 bytes from offset 0, which means that fluent-bit processes the full content previously written by app (2).
  • Later, app removes the file with the unlink system call (3).
  • app then creates a new file with the same name as the previous one (app.log) and writes 16 bytes to it (4).
  • fluent-bit opens the new log file for reading its content, but instead of reading from offset 0, as expected, it starts reading at offset 26 (5).
  • Because offset 26 lies past the end of the 16-byte file, the read system call returns zero bytes and the 16 bytes written by app are lost.

Figure 1. Fluent Bit erroneous access pattern leading to data loss.
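
The access pattern above can be replayed with a short script. This is only a minimal sketch: the helper names `read_from_offset` and `replay_access_pattern` are hypothetical, the payloads are placeholder bytes of the sizes reported by DIO, and the reader is ordinary Python file I/O rather than Fluent Bit itself. It shows that reading the 16-byte file from the stale offset 26 yields no data.

```python
import os
import tempfile


def read_from_offset(path, offset, size=4096):
    """Read up to `size` bytes from `path`, starting at `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)  # seeking past end-of-file is allowed
        return f.read(size)  # returns b"" when offset >= file size


def replay_access_pattern(directory):
    """Replay steps (1)-(5) with a stand-in reader (not Fluent Bit)."""
    path = os.path.join(directory, "app.log")

    with open(path, "wb") as f:  # (1) app writes 26 bytes
        f.write(b"a" * 26)
    consumed = read_from_offset(path, 0)  # (2) reader consumes them

    os.unlink(path)  # (3) app removes the file

    with open(path, "wb") as f:  # (4) app recreates it with 16 bytes
        f.write(b"b" * 16)

    # (5) the reader resumes from the old position instead of offset 0
    stale = read_from_offset(path, len(consumed))
    expected = read_from_offset(path, 0)
    return consumed, stale, expected


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        consumed, stale, expected = replay_access_pattern(d)
        print(len(consumed), len(stale), len(expected))
```

The read at the stale offset returns an empty result, while a read from offset 0 would have returned all 16 bytes.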

Explanation

When a file is removed, the operating system releases the associated inode number (12), which can later be assigned to a new file. In particular, the released inode number may be assigned to a newly created file with the same name. This is what happens in this case: the inode number of the original app.log file is reused for the new file created by app.
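
Whether an inode number is reused can be checked with a few lines of code. The sketch below uses a hypothetical helper, `recreate_and_compare_inodes`, and makes no claim about the outcome: reuse depends on the file system and its allocator, so the two numbers may or may not match on a given machine.

```python
import os
import tempfile


def recreate_and_compare_inodes(directory, name="app.log"):
    """Create, remove, and recreate a file; return both inode numbers."""
    path = os.path.join(directory, name)

    with open(path, "wb") as f:
        f.write(b"a" * 26)
    first_inode = os.stat(path).st_ino

    os.unlink(path)  # releases the inode once no references remain

    with open(path, "wb") as f:
        f.write(b"b" * 16)
    second_inode = os.stat(path).st_ino

    return first_inode, second_inode


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        first, second = recreate_and_compare_inodes(d)
        print("first:", first, "second:", second, "reused:", first == second)
```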

Before reading a file, Fluent Bit sets the file position to the number of bytes already processed. This value is kept in a database for each tracked file, identified by its name plus inode number. Erroneously, database entries are not deleted when files are removed from the file system. Therefore, since the newly created file has the same name (app.log) and the same inode number (12), fluent-bit wrongly assumes that the first 26 bytes of the new log file were already processed.
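
The effect of the stale database entry can be illustrated with a toy model. This is not Fluent Bit's code: `OffsetDatabase`, `tail_once`, and `simulate` are hypothetical names, the database is a plain dictionary, and purging the entry on removal is shown only as one possible remedy, not as the fix adopted upstream.

```python
class OffsetDatabase:
    """Toy position store keyed by (file name, inode number)."""

    def __init__(self):
        self._offsets = {}

    def get(self, name, inode):
        # Unknown files start at offset 0.
        return self._offsets.get((name, inode), 0)

    def set(self, name, inode, offset):
        self._offsets[(name, inode)] = offset

    def delete(self, name, inode):
        self._offsets.pop((name, inode), None)


def tail_once(db, name, inode, content):
    """Return the bytes a tailer would process on one pass over `content`."""
    start = db.get(name, inode)
    data = content[start:]  # empty if the stored offset is past the end
    db.set(name, inode, start + len(data))
    return data


def simulate(purge_on_delete):
    """Process app.log (inode 12), remove it, then process its replacement."""
    db = OffsetDatabase()
    first = tail_once(db, "app.log", 12, b"a" * 26)
    if purge_on_delete:
        db.delete("app.log", 12)  # drop the entry when the file is removed
    second = tail_once(db, "app.log", 12, b"b" * 16)
    return len(first), len(second)


if __name__ == "__main__":
    print("stale entry kept:  ", simulate(purge_on_delete=False))
    print("entry purged:      ", simulate(purge_on_delete=True))
```

With the stale entry kept, the second pass processes zero bytes; with the entry purged, it processes all 16 bytes of the new file.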