This page notes issues regarding using svk on large trees.

In general, svk performs well on large trees: we have had this scenario in mind from early on and have kept stress-testing svk against it.

Space on checkout copy

Subversion keeps a lot of redundant data in the .svn control directories. Besides the pristine copies in text-base, which the model naturally requires so that at least diff works offline, each directory's entries file records the URL and revision. This makes it possible to copy any subdirectory out as a fully usable Subversion working copy, at the cost that every command that crawls the working copy has to walk through all directories to model the tree before it invokes the tree-delta function. Together with the XML format of the entries file, this is the weakest part of Subversion on large trees, as all operations are slow.

In svk, the state of checkout copies is maintained in a simple data structure built on Data::Hierarchy and serialized in ~/.svk. A fresh checkout consumes only a few bytes more than the actual files under version control. The state file grows when you have mixed-revision checkout copies, but svk tries hard to optimize it and keep the state as small as possible.
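
The idea of a path-keyed state that only grows with mixed revisions can be sketched as follows. This is a toy model in Python, not the Data::Hierarchy API: the class name HierState, its methods and the "revision" property are all hypothetical, and the real on-disk format in ~/.svk is not shown.

```python
class HierState:
    """Toy path-keyed store: a property set on a directory applies to everything below it."""

    def __init__(self):
        self.entries = {}  # path -> {property: value}; hypothetical layout

    def get(self, path):
        # Merge properties from the root down to the path; deeper entries win.
        merged = dict(self.entries.get("/", {}))
        current = ""
        for part in (p for p in path.split("/") if p):
            current += "/" + part
            merged.update(self.entries.get(current, {}))
        return merged

    def store(self, path, **props):
        # Record only the properties that differ from what the path already sees.
        visible = self.get(path)
        changed = {k: v for k, v in props.items() if visible.get(k) != v}
        if changed:
            self.entries.setdefault(path, {}).update(changed)


state = HierState()
state.store("/co", revision=10)              # fresh checkout: one entry for the whole tree
state.store("/co/src/main.c", revision=10)   # same revision as the parent: nothing stored
state.store("/co/doc", revision=12)          # mixed revision: one extra entry
```

In this sketch a uniform checkout costs a single entry no matter how many files it holds, and only the paths that diverge from their parent add to the state.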

A post-1.0 task is to optionally allow the state to be split out and stored in .svk at the top level of each checkout, to isolate the effect of a large state on individual checkout copies. For now that effect seems negligible.

Space in depot

Thanks to the fancy skip-delta scheme implemented in libsvn_fs, storage in the repository is small, while it is still efficient to build the fulltext of any revision in O(lgN) time.

A 130000-revision depot of selected branches of the FreeBSD repository takes about 2G of disk space in an fsfs repository.

One problem with libsvn_fs's cheap copy is that it is not as cheap once you start modifying files on branches. If you copy a 1G tree to a branch, that is fine, and the size of the repository doesn't grow. But if you change 1 byte in every file (think updating the copyright year), every node on the branch magically gets a fulltext entry, which grows your repository by 1G. That is to say, it doesn't reuse deltas along the line of change across copies. But once this starts to cause problems for Subversion adopters, they will definitely fix it.

Checkout signatures

To avoid doing full-text comparison to check if a file is modified when doing checkout-crawling operations, svk uses a lightweight inode/size/mtime signature mechanism, optimized for Perl.

This will mask a change made to a file so quickly that its size and mtime stay the same, such as replacing one byte in it. The svk admin rmcache command comes to the rescue in this case; it clears ~/.svk/cache for you.
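
A minimal sketch of such a signature check, written in Python for illustration. svk itself is Perl, and the function names and the in-memory dictionary used as the cache here are assumptions, not svk's actual code or cache format.

```python
import os


def signature(path):
    # inode, size and whole-second mtime: cheap to read, no file contents involved.
    st = os.stat(path)
    return (st.st_ino, st.st_size, int(st.st_mtime))


def remember(path, cache):
    # Call this after the file is known to match the pristine text.
    cache[path] = signature(path)


def looks_unchanged(path, cache):
    # True means "skip the full-text comparison"; False means "compare for real".
    return cache.get(path) == signature(path)
```

An in-place one-byte edit that leaves the inode, the size and the mtime second untouched produces the same tuple, so looks_unchanged keeps returning True. Emptying the cache forces the full comparison again, which is the role svk admin rmcache plays above.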

Constructing fulltext

Over time, as a file accumulates many revisions, constructing the fulltext can take longer in systems like tla, so they suggest that users split the repository by year.

svk uses libsvn_fs, which constructs fulltext of any revision in O(lgN) time, where N is the number of revisions on the node in the same location.
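
To see where a logarithmic bound can come from, here is a minimal sketch assuming a skip-delta rule in which revision n of a node is stored as a delta against revision n with its lowest set bit cleared. The function names are hypothetical and the actual libsvn_fs layout may differ in detail.

```python
def delta_base(n):
    # Assumed rule: clear the lowest set bit of n to find the delta base.
    return n & (n - 1)


def chain(n):
    """Node revisions visited when rebuilding the fulltext of node revision n."""
    steps = [n]
    while n > 0:
        n = delta_base(n)
        steps.append(n)
    return steps


print(chain(13))  # [13, 12, 8, 0]
```

Under this rule the number of deltas to apply equals the number of set bits in n, which never exceeds the bit length of n, so the work grows with lg N rather than with N.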

Finding merge history

This is where a large number of revisions can cause performance problems for svk. svk uses merge tickets to record merge history, which can be retrieved in O(1) time. However, svk also needs to trace back the line of history to discover where a node was copied from. This is an expensive operation because Subversion lacks an appropriate API; see also Subversion issue #1970.

Since svk 0.29, some caches are stored as revision properties to make this copy-history tracing reasonably fast for 30000-revision trees.