Potential Performance Improvements In `cargo-semver-checks`
As a developer, it's essential to continually evaluate and improve the performance of our tools to ensure they remain efficient and effective. In this article, we'll explore potential performance improvements in `cargo-semver-checks` (`csc`), a crucial tool for ensuring the semantic versioning of Rust crates.
A Typical Run
To understand the performance of `csc`, let's examine a typical run on the `aws-sdk-datazone` (`datazone`) example documentation file. This file has 119 external crates, 11,756 paths, and 80,074 elements in the index. We'll use cached Rustdocs as the baseline, as this is the most common scenario for both CI and local runs.
A typical hot cache run of `csc` on `datazone` yields the following results:

As expected, time is dominated by Rustc generating Rustdoc, with Cargo performing some initial work before falling off, and `csc` operating at the end. Zooming in on the results reveals more interesting insights:
The execution falls into five main phases:
- Deserialise the current JSON: parse the current version's Rustdoc JSON file into memory.
- Request crates.io for the baseline version of the JSON: fetch the baseline release's Rustdoc JSON over the network.
- Deserialise the baseline JSON: parse the baseline file the same way.
- Run lints: execute the semver lints against both versions.
- Drop memory: deallocate everything before the program exits.
Ideas for Improvement
Parallelise the Web Request
One potential area for improvement is parallelizing the web request. The CPU chart shows that very little CPU work is done during this phase, and nearly 100ms are spent waiting for a response. The request could therefore be overlapped with deserializing the current JSON.
Deserialising JSON
Another area for improvement is deserializing JSON. This process is currently done at least twice and may be done a dozen times in a single execution if `csc` supports cross-crate checking. To profile this, we created a new package with minimal code:
```rust
use json_bench::read_data;
use rustdoc_types::Crate;

fn main() {
    // read_data() is a helper function for outputting paths in the test_data directory
    for p in read_data() {
        let file_data = std::fs::read_to_string(p).unwrap();
        let _v: Crate = serde_json::from_str(&file_data).unwrap();
    }
}
```
Ignoring time spent on IO (4.6%) and dropping memory (30%), the truncated execution flamegraph shows that the timeline for execution matches up with the time we got when running `csc`.
Switching to `simd-json` (#885)

Unfortunately, `simd-json` is about 15% slower than `serde_json` when benchmarked on the code above, so replacing `serde_json` with `simd-json` may not be the best solution.
Removing Hashmap Resizing
A large (~15%) amount of time is spent resizing the hashmap used for the index. This is because of the size of `Item`: copying large amounts of memory takes a while. Replacing the `HashMap` with a `BTreeMap` doesn't increase performance. `Box`-ing `Item` does, but at the potential cost of future memory accesses being slower and possibly making it more difficult for `rayon` to shard the hashmap. Replacing the `HashMap` with an `IndexMap` gains the same deserialization performance as `Box` without changing the API too much.
Searching for Format Version Efficiently
Currently, to get the format version, we deserialize the JSON file looking for one specific line and then deserialize it again into a `Crate` object. Unfortunately, `format_version` is the very last element in the file. If the search is truncated to just what is necessary, this time can be eliminated.
Linter Execution Order
Some threads spend nearly half the execution time running while others finish rapidly. Before deciding how to continue, we need to investigate whether this is a real effect of the lints differing in cost, in which case altering execution order may help, or the result of poor work sharing from `rayon`.
Dropping Memory
150ms are spent deallocating memory before program execution ends. One possible approach is to leak memory at the end of the program using `std::mem::forget`. This is not desirable for obvious reasons. Ideally, we would take the mold approach and fork the process.
Reproducing Results
Flamegraphs were obtained on a Windows machine running `samply record` in `aws-sdk-rust/sdk/datazone/`.
TL;DR The Path Forward
There's a lot of improvement potential in `csc` as it stands now. If everything goes well (it won't), there are easy 20-40% wins from forking the process, replacing `HashMap` with an `IndexMap`, and truncating the format version search. Further time may be saved by deserializing the current JSON while requesting the baseline JSON and by improving linter parallelization, among many others.
Conclusion
In conclusion, there are several potential areas for improvement in `cargo-semver-checks`. By parallelizing the web request, deserializing JSON more efficiently, eliminating hashmap resizing, searching for the format version efficiently, improving linter execution order, and dropping memory more effectively, we can significantly improve the performance of `csc` and make it a more efficient and effective tool for ensuring the semantic versioning of Rust crates.
Q&A: Potential Performance Improvements in cargo-semver-checks
In the article above, we explored potential performance improvements in `cargo-semver-checks` (`csc`). Here, we'll answer some frequently asked questions about the potential improvements and provide more information on the topics discussed.
Q: What is the current performance of `csc`?

A: The current performance of `csc` is dominated by Rustc generating Rustdoc, with Cargo performing some initial work before falling off, and `csc` operating at the end. The execution falls into five main phases: deserializing the current JSON, requesting crates.io for the baseline version of the JSON, deserializing the baseline JSON, running lints, and dropping memory.
Q: What are the potential areas for improvement in `csc`?

A: The potential areas for improvement in `csc` include:
- Parallelizing the web request
- Deserializing JSON more efficiently
- Removing hashmap resizing
- Searching for format version efficiently
- Improving linter execution order
- Dropping memory more effectively
Q: How can I parallelize the web request?

A: Issue the request on a separate thread (for example with `std::thread::spawn`) so the ~100ms spent waiting on crates.io overlaps with deserializing the current JSON on the main thread. `rayon` is a poor fit here: its thread pool is designed for CPU-bound work, and blocking one of its workers on the network can starve other tasks.
Q: What is the best way to deserialize JSON efficiently?

A: `serde_json` is already well optimized for this workload; in our benchmark, `simd-json` was actually about 15% slower, so swapping parsers is unlikely to help. The larger wins come from deserializing each file only once and from making index construction cheaper.
Q: How can I remove hashmap resizing?

A: `IndexMap` stores its entries in a contiguous array and keeps only small indices in the hash table, so a resize moves those indices rather than the large `Item` values. Replacing the `HashMap` with an `IndexMap` is close to a drop-in change and recovers the deserialization performance of `Box`-ing `Item` without altering the API much. When the element count is known up front, pre-sizing a `HashMap` with `with_capacity` avoids resizes entirely.
Q: How can I search for the format version efficiently?

A: Avoid fully deserializing the file just to read one field. Since `format_version` is the last element in the file, it is enough to scan only the final bytes for the key, or to use a streaming parser that stops as soon as the field is found, and only then deserialize the whole document into a `Crate`.
Q: How can I improve linter execution order?

A: First determine whether the imbalance between threads comes from lints genuinely differing in cost or from poor work sharing in `rayon`. If some lints really are much more expensive, scheduling them first improves load balancing; if the scheduler is at fault, tuning how work is split across `rayon`'s pool may help.
Q: How can I drop memory more effectively?

A: Destructor work can be skipped by deliberately leaking values with `std::mem::forget` (a standard-library function, not a separate crate) or by exiting the process before drops run, letting the OS reclaim the memory. The cleaner variant is the mold approach: fork the process and let the child perform the deallocation while the parent exits immediately.
Q: What are the benefits of improving the performance of `csc`?

A: The benefits of improving the performance of `csc` include:
- Faster execution times
- Improved user experience
- Increased productivity
- Better support for large projects
Q: How can I get started with improving the performance of `csc`?

A: To get started with improving the performance of `csc`, you can:

- Review the current performance of `csc`
- Identify potential areas for improvement
- Research and implement solutions for each area
- Test and evaluate the performance of `csc` after each improvement
By following these steps, you can improve the performance of `csc` and make it a more efficient and effective tool for ensuring the semantic versioning of Rust crates.