Inconsistent Mode Aggregation On All Unique Values
Introduction
InfluxDB is an open-source time-series database with its own query language for analyzing time-stamped data. This article examines an inconsistency in its Mode aggregation function that appears when every value being aggregated is unique.
The Issue
When every value in the input is unique, each value occurs exactly once, so all of them tie for most frequent. In that case the Mode aggregation function in InfluxDB returns the smallest value rather than the earliest one. This contradicts the documented behavior, which states that Mode returns the earliest value in case of a tie.
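One plausible way to arrive at this behavior is a mode that sorts by value and then scans for the longest run of equal values. The sketch below is an illustration under that assumption, not a verified copy of the InfluxDB source; the `point` type and the `modeSortScan` function are hypothetical names. The time-based tie check only runs when a run grows past its first element, so a run of length one can never replace the current winner, and the first run after sorting (the smallest value) is kept.

```go
package main

import (
	"fmt"
	"sort"
)

// point is a hypothetical stand-in for a time-stamped float sample.
type point struct {
	Time  int64
	Value float64
}

// modeSortScan is an illustrative sort-then-scan mode (an assumption about
// the mechanism, not InfluxDB's confirmed code). It returns 0 for empty input.
func modeSortScan(in []point) float64 {
	if len(in) == 0 {
		return 0
	}
	pts := append([]point(nil), in...) // copy so the caller's order is preserved
	sort.SliceStable(pts, func(i, j int) bool { return pts[i].Value < pts[j].Value })

	bestFreq, bestVal, bestTime := 0, pts[0].Value, pts[0].Time
	curFreq, curVal, curTime := 0, pts[0].Value, pts[0].Time
	for _, p := range pts {
		if p.Value != curVal {
			// A new run starts here. The tie check below is skipped,
			// so a run of length one never becomes the winner.
			curFreq, curVal, curTime = 1, p.Value, p.Time
			continue
		}
		curFreq++
		if bestFreq > curFreq || (bestFreq == curFreq && curTime > bestTime) {
			continue // the current winner is more frequent, or tied and earlier
		}
		bestFreq, bestVal, bestTime = curFreq, p.Value, p.Time
	}
	return bestVal
}

func main() {
	allUnique := []point{{0, 2}, {1, 1}, {2, 4}, {3, 6}, {4, 0}, {6, 9}, {7, 3}, {8, 5}, {9, 7}, {10, 8}}
	pairs := []point{{0, 2}, {1, 2}, {2, 4}, {3, 4}, {4, 0}, {6, 0}, {7, 3}, {8, 3}, {9, 7}, {10, 7}}
	fmt.Println(modeSortScan(allUnique)) // 0: the smallest value, not the earliest (2)
	fmt.Println(modeSortScan(pairs))     // 2: the earliest of the tied values
}
```

With the all-unique input, every point after the first starts a new run, which is why the tie check never gets a chance to prefer the earlier timestamp.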
Expected Behavior
The expected behavior is that the Mode aggregation function returns the earliest value in case of a tie, regardless of whether the values are all unique or not.
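As a reference point for the documented rule, here is a minimal sketch of a mode that breaks ties by earliest timestamp. The `sample` type and the `modeEarliest` function are hypothetical names introduced for illustration, and the sketch assumes that no two distinct values share the same earliest timestamp.

```go
package main

import "fmt"

// sample is a hypothetical stand-in for a time-stamped float sample.
type sample struct {
	Time  int64
	Value float64
}

// modeEarliest returns the most frequent value. When several values are
// equally frequent, the one whose first occurrence has the lowest timestamp
// wins. ok is false for empty input.
func modeEarliest(pts []sample) (mode float64, ok bool) {
	type stat struct {
		count int   // number of occurrences of the value
		first int64 // earliest timestamp at which the value was seen
	}
	stats := make(map[float64]*stat)
	for _, p := range pts {
		s, seen := stats[p.Value]
		if !seen {
			stats[p.Value] = &stat{count: 1, first: p.Time}
			continue
		}
		s.count++
		if p.Time < s.first {
			s.first = p.Time
		}
	}

	var best *stat
	for v, s := range stats {
		better := best == nil ||
			s.count > best.count ||
			(s.count == best.count && s.first < best.first)
		if better {
			best, mode = s, v
		}
	}
	return mode, best != nil
}

func main() {
	allUnique := []sample{{0, 2}, {1, 1}, {2, 4}, {3, 6}, {4, 0}, {6, 9}, {7, 3}, {8, 5}, {9, 7}, {10, 8}}
	pairs := []sample{{0, 2}, {1, 2}, {2, 4}, {3, 4}, {4, 0}, {6, 0}, {7, 3}, {8, 3}, {9, 7}, {10, 7}}
	fmt.Println(modeEarliest(allUnique)) // 2 true: all values tie, the earliest wins
	fmt.Println(modeEarliest(pairs))     // 2 true: five values tie, the earliest wins
}
```

Under this rule both inputs produce the same answer, because the tie-break depends only on timestamps and not on whether any value repeats.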
Unit Tests
To demonstrate the issue, here are two unit tests written in Go. The first, TestCallIterator_Mode_Float_All_Equal_Occurrences_All_Unique, runs the Mode aggregation function over ten points whose values are all unique. The second, TestCallIterator_Mode_Float_All_Equal_Occurrences_Not_Unique, runs it over ten points in which five values each occur twice, so the values still tie but are not unique. Both tests assert the behavior currently observed: the first expects the smallest value (0), the second expects the earliest value (2).
// All ten values are unique, so every value ties with a count of one.
// FloatIterator, ParseTags, MustParseExpr, and Iterators are helpers assumed
// to come from the surrounding InfluxDB query test package.
func TestCallIterator_Mode_Float_All_Equal_Occurrences_All_Unique(t *testing.T) {
	itr, _ := query.NewModeIterator(&FloatIterator{Points: []query.FloatPoint{
		{Time: 0, Value: 2, Tags: ParseTags("region=us-east,host=hostA")},
		{Time: 1, Value: 1, Tags: ParseTags("region=us-west,host=hostA")},
		{Time: 2, Value: 4, Tags: ParseTags("region=us-east,host=hostA")},
		{Time: 3, Value: 6, Tags: ParseTags("region=us-east,host=hostA")},
		{Time: 4, Value: 0, Tags: ParseTags("region=us-east,host=hostA")},
		{Time: 6, Value: 9, Tags: ParseTags("region=us-east,host=hostA")},
		{Time: 7, Value: 3, Tags: ParseTags("region=us-east,host=hostA")},
		{Time: 8, Value: 5, Tags: ParseTags("region=us-east,host=hostA")},
		{Time: 9, Value: 7, Tags: ParseTags("region=us-east,host=hostA")},
		{Time: 10, Value: 8, Tags: ParseTags("region=us-east,host=hostA")},
	}},
		query.IteratorOptions{
			Expr:       MustParseExpr(`mode("value")`),
			Dimensions: []string{"host"},
			Interval:   query.Interval{Duration: 11 * time.Nanosecond},
			Ordered:    true,
			Ascending:  true,
		},
	)
	if a, err := Iterators([]query.Iterator{itr}).ReadAll(); err != nil {
		t.Fatalf("unexpected error: %s", err)
	} else if diff := cmp.Diff(a, [][]query.Point{
		// The smallest value (0) is returned, not the earliest (2)
		{&query.FloatPoint{Time: 0, Value: 0, Tags: ParseTags("host=hostA"), Aggregated: 0}},
	}); diff != "" {
		t.Fatalf("unexpected points:\n%s", diff)
	}
}
func TestCallIterator_Mode_Float_All_Equal_Occurrences_Not_Unique(t *testing.T) {
	itr, _ := query.NewModeIterator(&FloatIterator{Points: []query.FloatPoint{
		{Time: 0, Value: 2, Tags: ParseTags("region=us-east,host=hostA")},
		{Time: 1, Value: 2, Tags: ParseTags("region=us-west,host=hostA")},
		{Time: 2, Value: 4, Tags: ParseTags("region=us-east,host=hostA")},
		{Time: 3, Value: 4, Tags: ParseTags("region=us-east,host=hostA")},
		{Time: 4, Value: 0, Tags: ParseTags("region=us-east,host=hostA")},
		{Time: 6, Value: 0, Tags: ParseTags("region=us-east,host=hostA")},
		{Time: 7, Value: 3, Tags: ParseTags("region=us-east,host=hostA")},
		{Time: 8, Value: 3, Tags: ParseTags("region=us-east,host=hostA")},
		{Time: 9, Value: 7, Tags: ParseTags("region=us-east,host=hostA")},
		{Time: 10, Value: 7, Tags: ParseTags("region=us-east,host=hostA")},
	}},
		query.IteratorOptions{
			Expr:       MustParseExpr(`mode("value")`),
			Dimensions: []string{"host"},
			Interval:   query.Interval{Duration: 11 * time.Nanosecond},
			Ordered:    true,
			Ascending:  true,
		},
	)
	if a, err := Iterators([]query.Iterator{itr}).ReadAll(); err != nil {
		t.Fatalf("unexpected error: %s", err)
	} else if diff := cmp.Diff(a, [][]query.Point{
		// The earliest value is returned
		{&query.FloatPoint{Time: 0, Value: 2, Tags: ParseTags("host=hostA"), Aggregated: 0}},
	}); diff != "" {
		t.Fatalf("unexpected points:\n%s", diff)
	}
}
Frequently Asked Questions
Q: What is the Mode aggregation function in InfluxDB?
A: The Mode aggregation function in InfluxDB is used to calculate the most frequently occurring value in a set of data.
Q: What is the expected behavior of the Mode aggregation function?
A: The expected behavior of the Mode aggregation function is to return the earliest value in case of a tie, regardless of whether the values are all unique or not.
Q: What is the actual behavior of the Mode aggregation function when aggregating over all unique values?
A: When every value is unique, all values tie with one occurrence each, and the Mode aggregation function returns the smallest value instead of the earliest one.
Q: Why is this behavior inconsistent with the documented behavior?
A: The documentation states that the Mode aggregation function returns the earliest value in case of a tie. When all values are unique, every value occurs once and all of them tie, so the earliest value should be returned. Instead, the smallest value is returned.
Q: What are the implications of this inconsistency?
A: Users may get unexpected results from the Mode aggregation function when aggregating over all unique values: the result follows value order rather than time order, so, as the two unit tests show, tie-breaking differs depending on whether values repeat.
Q: How can I reproduce this issue?
A: You can reproduce this issue by running the two unit tests provided in the article, TestCallIterator_Mode_Float_All_Equal_Occurrences_All_Unique and TestCallIterator_Mode_Float_All_Equal_Occurrences_Not_Unique.
Q: What can I do to work around this issue?
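A: If you know that every value in the aggregation window is unique, you can use the first aggregation function instead of the mode aggregation function; it returns the earliest value, which is what mode is documented to return in that case. The min aggregation function does not help here, because it returns the smallest value, which is the result mode already produces.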
Q: Is this issue specific to InfluxDB version 1.8.10?
A: No, this issue is not specific to InfluxDB version 1.8.10. It is a general issue with the Mode aggregation function in InfluxDB.
Q: Has this issue been reported to the InfluxDB community?
A: Yes, this issue has been reported to the InfluxDB community and is being tracked in the InfluxDB issue tracker.
Q: What is the status of the issue being fixed?
A: The issue is under investigation and a fix is being developed.
Q: How can I stay up-to-date with the latest developments on this issue?
A: You can stay up-to-date with the latest developments on this issue by following the InfluxDB issue tracker and the InfluxDB community forums.