Potential Performance Improvements In `cargo-semver-checks`
As a developer, it's essential to continually evaluate and improve the performance of our tools to ensure they remain efficient and effective. In this article, we'll explore potential performance improvements in `cargo-semver-checks` (`csc`), a crucial tool for ensuring the semantic versioning of Rust crates.
A Typical Run
To understand the performance of `csc`, let's examine a typical run on the `aws-sdk-datazone` (`datazone`) example documentation file. This file has 119 external crates, 11,756 paths, and 80,074 elements in the index. We'll use cached Rustdocs as the baseline, as this is the most common scenario for both CI and local runs.
A typical hot cache run of `csc` on `datazone` yields the following results:

As expected, time is dominated by Rustc generating Rustdoc, with Cargo performing some initial work before falling off, and `csc` operating at the end. Zooming in on the results reveals more interesting insights:
The execution falls into five main phases:
- Deserialise the current JSON: parse the current version's Rustdoc JSON file into memory.
- Request crates.io for the baseline version of the JSON: fetch the baseline release's Rustdoc JSON over the network.
- Deserialise the baseline JSON: parse the baseline file the same way.
- Run lints: execute the semver lints against both versions.
- Drop memory: deallocate everything before the program exits.
Ideas for Improvement
Parallelise the Web Request
One potential area for improvement is parallelizing the web request. The CPU chart shows that very little CPU work is done during this phase, and nearly 100ms are spent waiting for a response. The request could therefore be overlapped with deserializing the current JSON.
Deserialising JSON
Another area for improvement is deserializing JSON. This process is currently done at least twice and may be done a dozen times in a single execution if `csc` supports cross-crate checking. To profile this, we created a new package with minimal code:
```rust
use json_bench::read_data;
use rustdoc_types::Crate;

fn main() {
    // read_data() is a helper function for outputting paths in the test_data directory
    for p in read_data() {
        let file_data = std::fs::read_to_string(p).unwrap();
        let _v: Crate = serde_json::from_str(&file_data).unwrap();
    }
}
```
Ignoring time spent on IO (4.6%) and dropping memory (30%), the truncated execution flamegraph shows that the timeline for execution matches up with the time we got when running `csc`.
Switching to `simd-json` (#885)

Unfortunately, `simd-json` is about 15% slower than `serde_json` when benchmarked on the code above, so replacing `serde_json` with `simd-json` may not be the best solution.
Removing Hashmap Resizing
A large (~15%) amount of time is spent resizing the hashmap used for the index. This is because of the size of `Item`: copying large amounts of memory takes a while. Replacing the `HashMap` with a `BTreeMap` doesn't increase performance. `Box`-ing `Item` does, but at the potential cost of future memory accesses being slower and possibly making it more difficult for `rayon` to shard the hashmap. Replacing the `HashMap` with an `IndexMap` gains the same deserialization performance as `Box` without changing the API too much.
Searching for Format Version Efficiently
Currently, to get the format version, we deserialize the JSON file looking for one specific line and then deserialize it again into a `Crate` object. Unfortunately, `format_version` is the very last element in the file. If the search is truncated to just what is necessary, this time can be eliminated.
Linter Execution Order
Some threads spend nearly half the execution time running while others finish rapidly. Before deciding how to continue, we need to investigate whether this is a real effect of the lints differing in cost, in which case altering execution order may help, or the result of poor work sharing from `rayon`.
Dropping Memory
150ms are spent deallocating memory before program execution ends. One possible approach is to leak memory at the end of the program using `std::mem::forget`. This is not desirable for obvious reasons. Ideally, we would take the mold approach and fork the process.
Reproducing Results
Flamegraphs were obtained on a Windows machine running `samply record` in `aws-sdk-rust/sdk/datazone/`.
TL;DR The Path Forward
There's a lot of improvement potential in `csc` as it stands now. If everything goes well (it won't), there are easy 20-40% wins from forking the process, replacing `HashMap` with an `IndexMap`, and truncating the format version search. Further time may be saved by deserializing the current JSON while requesting the baseline JSON and by improving linter parallelization, among many others.
Conclusion
In conclusion, there are several potential areas for improvement in `cargo-semver-checks`. By parallelizing the web request, deserializing JSON more efficiently, eliminating hashmap resizing, searching for the format version efficiently, improving linter execution order, and dropping memory more effectively, we can significantly improve the performance of `csc` and make it a more efficient and effective tool for ensuring the semantic versioning of Rust crates.
Q&A: Potential Performance Improvements in cargo-semver-checks
In the article above, we explored potential performance improvements in `cargo-semver-checks` (`csc`). Here, we'll answer some frequently asked questions about the potential improvements and provide more information on the topics discussed.
Q: What is the current performance of `csc`?

A: The current performance of `csc` is dominated by Rustc generating Rustdoc, with Cargo performing some initial work before falling off, and `csc` operating at the end. The execution falls into five main phases: deserializing the current JSON, requesting crates.io for the baseline version of the JSON, deserializing the baseline JSON, running lints, and dropping memory.
Q: What are the potential areas for improvement in `csc`?

A: The potential areas for improvement in `csc` include:
- Parallelizing the web request
- Deserializing JSON more efficiently
- Removing hashmap resizing
- Searching for format version efficiently
- Improving linter execution order
- Dropping memory more effectively
Q: How can I parallelize the web request?

A: Issue the request on a separate thread (for example with `std::thread::spawn`) so the ~100ms spent waiting on crates.io overlaps with deserializing the current JSON on the main thread. `rayon` is a poor fit here: its thread pool is designed for CPU-bound work, and blocking one of its workers on the network can starve other tasks.
Q: What is the best way to deserialize JSON efficiently?

A: `serde_json` is already well optimized for this workload; in our benchmark, `simd-json` was actually about 15% slower, so swapping parsers is unlikely to help. The larger wins come from deserializing each file only once and from making index construction cheaper.
Q: How can I remove hashmap resizing?

A: `IndexMap` stores its entries in a contiguous array and keeps only small indices in the hash table, so a resize moves those indices rather than the large `Item` values. Replacing the `HashMap` with an `IndexMap` is close to a drop-in change and recovers the deserialization performance of `Box`-ing `Item` without altering the API much. When the element count is known up front, pre-sizing a `HashMap` with `with_capacity` avoids resizes entirely.
Q: How can I search for the format version efficiently?

A: Avoid fully deserializing the file just to read one field. Since `format_version` is the last element in the file, it is enough to scan only the final bytes for the key, or to use a streaming parser that stops as soon as the field is found, and only then deserialize the whole document into a `Crate`.
Q: How can I improve linter execution order?

A: First determine whether the imbalance between threads comes from lints genuinely differing in cost or from poor work sharing in `rayon`. If some lints really are much more expensive, scheduling them first improves load balancing; if the scheduler is at fault, tuning how work is split across `rayon`'s pool may help.
Q: How can I drop memory more effectively?

A: Destructor work can be skipped by deliberately leaking values with `std::mem::forget` (a standard-library function, not a separate crate) or by exiting the process before drops run, letting the OS reclaim the memory. The cleaner variant is the mold approach: fork the process and let the child perform the deallocation while the parent exits immediately.
Q: What are the benefits of improving the performance of `csc`?

A: The benefits of improving the performance of `csc` include:
- Faster execution times
- Improved user experience
- Increased productivity
- Better support for large projects
Q: How can I get started with improving the performance of `csc`?

A: To get started with improving the performance of `csc`, you can:

- Review the current performance of `csc`
- Identify potential areas for improvement
- Research and implement solutions for each area
- Test and evaluate the performance of `csc` after each improvement
By following these steps, you can improve the performance of `csc` and make it a more efficient and effective tool for ensuring the semantic versioning of Rust crates.