Here are the highlights of what’s new and improved in Skedler Reports 4.19.0 & Alerts 4.9.0. For detailed information about this release, check the release notes.
Indexing speed improvement
Elasticsearch 7.10 improves indexing speed by up to 20%. It reduces the coordination needed to add entries to the transaction log, which allows for more concurrency, and it increases the transaction log buffer size from 8KB to 1MB. Gains are smaller for full-text search and other analysis-intensive use cases: the heavier the indexing chain, the lower the gains, so indexing chains that involve many fields, ingest pipelines, or full-text indexing will see smaller improvements. Skedler v4.19.0 lets you take advantage of these improvements.
More space-efficient indices
Elasticsearch 7.10 depends on Apache Lucene 8.7, which introduces higher compression of stored fields, the part of the index that notably stores the _source. On the various data sets Elastic benchmarks against, space reductions ranged between 0% and 10%. This change especially helps data sets with lots of redundant data across documents, which is typically the case for documents produced by Elastic's Observability solutions, since they repeat metadata about the originating host on every document.
Elasticsearch lets you configure the index.codec setting to control how aggressively stored fields are compressed. Both supported values, default and best_compression, benefit from this improved compression.
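As a minimal sketch, assuming you prepare request bodies in Python before sending them to Elasticsearch, the hypothetical helper create_index_body below builds the body of a create-index request (PUT /<index-name>) that sets index.codec. Because index.codec is a static setting, it is normally chosen when the index is created.

```python
import json

# The two codec values named above; this sketch rejects anything else.
SUPPORTED_CODECS = ("default", "best_compression")


def create_index_body(codec: str = "default") -> dict:
    """Build a create-index request body (hypothetical helper) that sets index.codec."""
    if codec not in SUPPORTED_CODECS:
        raise ValueError(f"unsupported codec: {codec!r}")
    return {"settings": {"index": {"codec": codec}}}


if __name__ == "__main__":
    # Body for PUT /<index-name>; prints {"settings": {"index": {"codec": "best_compression"}}}
    print(json.dumps(create_index_body("best_compression")))
```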
Data tiers
Elasticsearch 7.10 introduces the concept of formalized data tiers. Data tiers are a simple, integrated approach that gives users control over optimizing for cost, performance, and breadth/depth of data. Before this formalization, many users configured their own tier topology using custom node attributes and used ILM to manage the lifecycle and location of data within a cluster.
With this formalization, data tiers (content, hot, warm, and cold) can be explicitly configured using node roles, and indices can be configured to be allocated within a specific tier using index-level data tier allocation filtering. ILM will make use of these tiers to automatically migrate data between nodes as an index goes through the phases of its lifecycle.
Newly created indices abstracted by a data stream will be allocated to the data_hot tier automatically, while standalone indices will be allocated to the data_content tier automatically. Nodes with the pre-existing data role are considered to be part of all tiers.
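The default placement rules above can be modeled in a few lines. This is a hypothetical sketch of the described behavior, not Elasticsearch code: the function names and node names are made up, and on a real cluster a node's roles come from its node.roles setting.

```python
from typing import Dict, List, Set


def default_tier(backs_data_stream: bool) -> str:
    """Tier a newly created index lands in by default, per the rule above."""
    return "data_hot" if backs_data_stream else "data_content"


def node_in_tier(node_roles: Set[str], tier: str) -> bool:
    """A node belongs to a tier if it has that tier's role or the generic data role."""
    return tier in node_roles or "data" in node_roles


def eligible_nodes(nodes: Dict[str, Set[str]], tier: str) -> List[str]:
    """Names of the nodes that could hold shards allocated to the given tier."""
    return sorted(name for name, roles in nodes.items() if node_in_tier(roles, tier))


if __name__ == "__main__":
    cluster = {
        "node-1": {"data_hot"},
        "node-2": {"data_warm"},
        "node-3": {"data"},  # pre-existing data role: part of all tiers
        "node-4": {"data_content"},
    }
    # Data stream backing index -> data_hot -> ['node-1', 'node-3']
    print(eligible_nodes(cluster, default_tier(backs_data_stream=True)))
    # Standalone index -> data_content -> ['node-3', 'node-4']
    print(eligible_nodes(cluster, default_tier(backs_data_stream=False)))
```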
AUC ROC evaluation metrics for classification analysis
The area under the curve of the receiver operating characteristic (AUC ROC) is an evaluation metric that has been available for outlier detection since 7.3 and is now available for classification analysis. AUC ROC represents the performance of the classification process at different predicted probability thresholds. To build the curve for a specific class, its true positive rate is plotted against its false positive rate at each threshold level, treating all the other classes combined as the negative class.
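To make the one-class-versus-all-others idea concrete, here is a small pure-Python sketch; it is an illustration of the metric, not how Elasticsearch computes it internally. It assumes each score is the model's predicted probability for the class of interest, sweeps every distinct score as a threshold, records (false positive rate, true positive rate) pairs, and integrates them with the trapezoidal rule.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[str], positive_class: str) -> List[Tuple[float, float]]:
    """(false positive rate, true positive rate) pairs for one class vs. all others.

    scores[i] is the predicted probability that document i belongs to positive_class.
    """
    is_pos = [label == positive_class for label in labels]
    n_pos = sum(is_pos)
    n_neg = len(is_pos) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one document in and one outside positive_class")
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, p in zip(scores, is_pos) if s >= threshold and p)
        fp = sum(1 for s, p in zip(scores, is_pos) if s >= threshold and not p)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_roc(scores: Sequence[float], labels: Sequence[str], positive_class: str) -> float:
    """Area under the ROC curve, integrated with the trapezoidal rule."""
    points = roc_points(scores, labels, positive_class)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    labels = ["cat", "dog", "cat", "dog"]
    p_cat = [0.9, 0.4, 0.6, 0.2]  # every cat scores above every dog
    print(auc_roc(p_cat, labels, "cat"))  # 1.0
```

A perfect ranking gives 1.0, a constant score gives 0.5 (no better than chance), and a fully inverted ranking gives 0.0.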
Custom feature processors in data frame analytics
Feature processors enable you to extract and process features from document fields. You can use these features in model training and model deployment. Custom feature processors provide a mechanism to create features that can be used at search and ingest time without taking up space in the index. This approach more tightly couples feature generation with the resulting model, which simplifies model management: both the features and the model can follow the same life cycle.
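The sketch below illustrates the idea in plain Python rather than with Elasticsearch's own processor configuration. A hypothetical one_hot processor derives features from a document field on the fly, so nothing extra is stored with the document, and the same featurize function (also hypothetical, as are the price and category fields) is applied both when building training rows and when scoring new documents.

```python
from typing import Dict, Mapping


def one_hot(doc: Mapping[str, object], field: str, category_to_feature: Mapping[str, str]) -> Dict[str, int]:
    """Derive one 0/1 feature per known category of `field` (hypothetical processor)."""
    value = doc.get(field)
    return {feature: int(value == category) for category, feature in category_to_feature.items()}


def featurize(doc: Mapping[str, object]) -> Dict[str, object]:
    """Combine a raw field with processed features; the source document is not modified."""
    features: Dict[str, object] = {"price": doc.get("price")}
    features.update(one_hot(doc, "category", {"book": "is_book", "toy": "is_toy"}))
    return features


if __name__ == "__main__":
    # The same function runs at training time and at ingest/search time.
    training_rows = [featurize(d) for d in ({"price": 12.0, "category": "book"},
                                            {"price": 30.0, "category": "toy"})]
    incoming = featurize({"price": 8.5, "category": "game"})  # unseen category -> all zeros
    print(training_rows)
    print(incoming)
```

Keeping the feature logic in one place is what lets the features and the model share a life cycle: changing the processor changes both training and inference together.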
Points in time (PITs) for search
In 7.10, Elasticsearch introduces points in time (PITs), a lightweight way to preserve index state over searches. PITs improve the end-user experience by making UIs more responsive, and they are supported in Skedler v4.19.0.
By default, a search request waits for complete results before returning a response. For example, a search that retrieves top hits and aggregations returns a response only after both top hits and aggregations are computed. However, aggregations are usually slower and more expensive to compute than top hits. Instead of sending a combined request, you can send two separate requests: one for top hits and another one for aggregations. With separate search requests, a UI can display top hits as soon as they’re available and display aggregation data after the slower aggregation request completes. You can use a PIT to ensure both search requests run on the same data and index state.
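A minimal sketch of the two-request pattern, assuming the bodies are built in Python: a PIT is opened with POST /<index>/_pit?keep_alive=1m, both searches go to POST /_search with a pit clause (no index in the path), and the PIT is released with DELETE /_pit. The helper function names, the example query, and the host.name aggregation are hypothetical, and the PIT id is a placeholder.

```python
import json
from typing import Any, Dict


def pit_clause(pit_id: str, keep_alive: str = "1m") -> Dict[str, Any]:
    """Reference the PIT and extend its time to live with each request."""
    return {"id": pit_id, "keep_alive": keep_alive}


def top_hits_request(pit_id: str, query: Dict[str, Any], size: int = 10) -> Dict[str, Any]:
    """Fast request: return the top hits only, no aggregations."""
    return {"size": size, "query": query, "pit": pit_clause(pit_id)}


def aggregations_request(pit_id: str, query: Dict[str, Any], aggs: Dict[str, Any]) -> Dict[str, Any]:
    """Slower request: aggregations only, so size is 0 and no hits are returned."""
    return {"size": 0, "query": query, "aggs": aggs, "pit": pit_clause(pit_id)}


if __name__ == "__main__":
    pit_id = "<id returned by the open-PIT request>"  # placeholder, not a real id
    query = {"match": {"message": "error"}}
    aggs = {"by_host": {"terms": {"field": "host.name"}}}
    # Send both bodies to POST /_search; the UI can render hits before the aggs arrive.
    print(json.dumps(top_hits_request(pit_id, query)))
    print(json.dumps(aggregations_request(pit_id, query, aggs)))
    # When done, release resources with DELETE /_pit and the body {"id": pit_id}.
```

Because both requests name the same PIT id, the hits and the aggregation results are computed against the same index state, even if documents are indexed between the two calls.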
New thread pools for system indices
Elasticsearch 7.10 adds two new thread pools for system indices: system_read and system_write. These thread pools keep system indices that are critical to the Elastic Stack, such as those used by security or Kibana, responsive when a cluster is under heavy query or indexing load.
system_read is a fixed thread pool used to manage resources for reading operations targeting system indices. Similarly, system_write is a fixed thread pool used to manage resources for write operations targeting system indices. Both have a maximum number of threads equal to 5 or half of the available processors, whichever is smaller.
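As a quick worked example of that sizing rule, the sketch below computes the maximum pool size for a few processor counts. The function name is hypothetical, and the handling of odd processor counts (rounded up) and the one-thread floor are assumptions of this sketch, not details stated above.

```python
def system_pool_max_threads(allocated_processors: int) -> int:
    """Max threads for system_read / system_write: min(5, half the processors).

    Assumption: odd processor counts round up and the pool keeps at least one thread.
    """
    if allocated_processors < 1:
        raise ValueError("allocated_processors must be at least 1")
    half = (allocated_processors + 1) // 2  # ceiling of n / 2
    return max(1, min(5, half))


if __name__ == "__main__":
    for n in (2, 4, 8, 16, 32):
        print(n, "processors ->", system_pool_max_threads(n), "threads")
```

Under these assumptions a 4-processor node gets 2 threads per pool and an 8-processor node gets 4, while a 16-processor node is capped at 5.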