Maintaining a Subversion repository can be a daunting task, mostly due to the complexities inherent in systems which have a database backend. Doing the task well is all about knowing the tools—what they are, when to use them, and how to use them. This section will introduce you to the repository administration tools provided by Subversion, and how to wield them to accomplish tasks such as repository migrations, upgrades, backups and cleanups.
Subversion provides a handful of utilities useful for creating, inspecting, modifying and repairing your repository. Let's look more closely at each of those tools. Afterward, we'll briefly examine some of the utilities included in the Berkeley DB distribution that provide functionality specific to your repository's database backend not otherwise provided by Subversion's own tools.
svnlook is a tool provided by Subversion for examining the various revisions and transactions in a repository. No part of this program attempts to change the repository—it's a “read-only” tool. svnlook is typically used by the repository hooks for reporting the changes that are about to be committed (in the case of the pre-commit hook) or that were just committed (in the case of the post-commit hook) to the repository. A repository administrator may use this tool for diagnostic purposes.
svnlook has a straightforward syntax:
$ svnlook help general usage: svnlook SUBCOMMAND REPOS_PATH [ARGS & OPTIONS ...] Note: any subcommand which takes the '--revision' and '--transaction' options will, if invoked without one of those options, act on the repository's youngest revision. Type "svnlook help <subcommand>" for help on a specific subcommand. …
Nearly every one of svnlook's
subcommands can operate on either a revision or a
transaction tree, printing information about the tree
itself, or how it differs from the previous revision of the
repository. You use the --revision
and
--transaction
options to specify which
revision or transaction, respectively, to examine. Note
that while revision numbers appear as natural numbers,
transaction names are alphanumeric strings. Keep in mind
that the filesystem only allows browsing of uncommitted
transactions (transactions that have not resulted in a new
revision). Most repositories will have no such
transactions, because transactions are usually either
committed (which disqualifies them from viewing) or aborted
and removed.
In the absence of both the --revision
and --transaction
options,
svnlook will examine the youngest (or
“HEAD”) revision in the repository. So the
following two commands do exactly the same thing when 19 is
the youngest revision in the repository located at
/path/to/repos
:
$ svnlook info /path/to/repos $ svnlook info /path/to/repos --revision 19
The only exception to these rules about subcommands is
the svnlook youngest subcommand, which
takes no options, and simply prints out the
HEAD
revision number.
$ svnlook youngest /path/to/repos 19
Output from svnlook is designed to be
both human- and machine-parsable. Take as an example the output
of the info
subcommand:
$ svnlook info /path/to/repos sally 2002-11-04 09:29:13 -0600 (Mon, 04 Nov 2002) 27 Added the usual Greek tree.
The output of the info
subcommand is
defined as:
The author, followed by a newline.
The date, followed by a newline.
The number of characters in the log message, followed by a newline.
The log message itself, followed by a newline.
This output is human-readable, meaning items like the datestamp are displayed using a textual representation instead of something more obscure (such as the number of nanoseconds since the Tasty Freeze guy drove by). But this output is also machine-parsable—because the log message can contain multiple lines and be unbounded in length, svnlook provides the length of that message before the message itself. This allows scripts and other wrappers around this command to make intelligent decisions about the log message, such as how much memory to allocate for the message, or at least how many bytes to skip in the event that this output is not the last bit of data in the stream.
Another common use of svnlook is to
actually view the contents of a revision or transaction
tree. The svnlook tree command displays
the directories and files in the requested tree. If you
supply the --show-ids
option, it will also
show the filesystem node revision IDs for each of those
paths (which is generally of more use to developers than to
users).
$ svnlook tree /path/to/repos --show-ids / <0.0.1> A/ <2.0.1> B/ <4.0.1> lambda <5.0.1> E/ <6.0.1> alpha <7.0.1> beta <8.0.1> F/ <9.0.1> mu <3.0.1> C/ <a.0.1> D/ <b.0.1> gamma <c.0.1> G/ <d.0.1> pi <e.0.1> rho <f.0.1> tau <g.0.1> H/ <h.0.1> chi <i.0.1> omega <k.0.1> psi <j.0.1> iota <1.0.1>
Once you've seen the layout of directories and files in your tree, you can use commands like svnlook cat, svnlook propget, and svnlook proplist to dig into the details of those files and directories.
svnlook can perform a variety of other queries, displaying subsets of bits of information we've mentioned previously, reporting which paths were modified in a given revision or transaction, showing textual and property differences made to files and directories, and so on. The following is a brief description of the current list of subcommands accepted by svnlook, and the output of those subcommands:
author
Print the tree's author.
cat
Print the contents of a file in the tree.
changed
List all files and directories that changed in the tree.
date
Print the tree's datestamp.
diff
Print unified diffs of changed files.
dirs-changed
List the directories in the tree that were themselves changed, or whose file children were changed.
history
Display interesting points in the history of a versioned path (places where modifications or copies occurred).
info
Print the tree's author, datestamp, log message character count, and log message.
lock
If a path is locked, describe the lock attributes.
log
Print the tree's log message.
propget
Print the value of a property on a path in the tree.
proplist
Print the names and values of properties set on paths in the tree.
tree
Print the tree listing, optionally revealing the filesystem node revision IDs associated with each path.
uuid
Print the repository's UUID— Universal Unique IDentifier.
youngest
Print the youngest revision number.
The svnadmin program is the repository administrator's best friend. Besides providing the ability to create Subversion repositories, this program allows you to perform several maintenance operations on those repositories. The syntax of svnadmin is similar to that of svnlook:
$ svnadmin help general usage: svnadmin SUBCOMMAND REPOS_PATH [ARGS & OPTIONS ...] Type "svnadmin help <subcommand>" for help on a specific subcommand. Available subcommands: create deltify dump help (?, h) …
We've already mentioned svnadmin's
create
subcommand (see the section called “Repository Creation and Configuration”). Most of the others we will
cover in more detail later in this chapter. For now, let's
just take a quick glance at what each of the available
subcommands offers.
create
Create a new Subversion repository.
deltify
Run over a specified revision range, performing
predecessor deltification on the paths changed in
those revisions. If no revisions are specified, this
command will simply deltify the
HEAD
revision.
dump
Dump the contents of the repository, bounded by a given set of revisions, using a portable dump format.
hotcopy
Make a hot copy of a repository. You can run this command at any time and make a safe copy of the repository, regardless if other processes are using the repository.
list-dblogs
(Berkeley DB repositories only.) List the paths of Berkeley DB log files associated with the repository. This list includes all log files—those still in use by Subversion, as well as those no longer in use.
list-unused-dblogs
(Berkeley DB repositories only.) List the paths of Berkeley DB log files associated with, but no longer used by, the repository. You may safely remove these log files from the repository layout, possibly archiving them for use in the event that you ever need to perform a catastrophic recovery of the repository.
load
Load a set of revisions into a repository from a
stream of data that uses the same portable dump format
generated by the dump
subcommand.
lslocks
List and describe any locks that exist in the repository.
lstxns
List the names of uncommitted Subversion transactions that currently exist in the repository.
recover
Perform recovery steps on a repository that is in need of such, generally after a fatal error has occurred that prevented a process from cleanly shutting down its communication with the repository.
rmlocks
Unconditionally remove locks from listed paths.
rmtxns
Cleanly remove Subversion transactions from the
repository (conveniently fed by output from the
lstxns
subcommand).
setlog
Replace the current value of the
svn:log
(commit log message)
property on a given revision in the repository with a
new value.
verify
Verify the contents of the repository. This includes, among other things, checksum comparisons of the versioned data stored in the repository.
Since Subversion stores everything in an opaque database system, attempting manual tweaks is unwise, if not quite difficult. And once data has been stored in your repository, Subversion generally doesn't provide an easy way to remove that data. [15] But inevitably, there will be times when you would like to manipulate the history of your repository. You might need to strip out all instances of a file that was accidentally added to the repository (and shouldn't be there for whatever reason). Or, perhaps you have multiple projects sharing a single repository, and you decide to split them up into their own repositories. To accomplish tasks like this, administrators need a more manageable and malleable representation of the data in their repositories—the Subversion repository dump format.
The Subversion repository dump format is a human-readable representation of the changes that you've made to your versioned data over time. You use the svnadmin dump command to generate the dump data, and svnadmin load to populate a new repository with it (see the section called “Migrating a Repository”). The great thing about the human-readability aspect of the dump format is that, if you aren't careless about it, you can manually inspect and modify it. Of course, the downside is that if you have two years' worth of repository activity encapsulated in what is likely to be a very large dump file, it could take you a long, long time to manually inspect and modify it.
While it won't be the most commonly used tool at the administrator's disposal, svndumpfilter provides a very particular brand of useful functionality—the ability to quickly and easily modify that dump data by acting as a path-based filter. Simply give it either a list of paths you wish to keep, or a list of paths you wish to not keep, then pipe your repository dump data through this filter. The result will be a modified stream of dump data that contains only the versioned paths you (explicitly or implicitly) requested.
The syntax of svndumpfilter is as follows:
$ svndumpfilter help general usage: svndumpfilter SUBCOMMAND [ARGS & OPTIONS ...] Type "svndumpfilter help <subcommand>" for help on a specific subcommand. Available subcommands: exclude include help (?, h)
There are only two interesting subcommands. They allow you to make the choice between explicit or implicit inclusion of paths in the stream:
exclude
Filter out a set of paths from the dump data stream.
include
Allow only the requested set of paths to pass through the dump data stream.
Let's look a realistic example of how you might use this program. We discuss elsewhere (see the section called “Choosing a Repository Layout”) the process of deciding how to choose a layout for the data in your repositories—using one repository per project or combining them, arranging stuff within your repository, and so on. But sometimes after new revisions start flying in, you rethink your layout and would like to make some changes. A common change is the decision to move multiple projects which are sharing a single repository into separate repositories for each project.
Our imaginary repository contains three projects:
calc
, calendar
, and
spreadsheet
. They have been living
side-by-side in a layout like this:
/ calc/ trunk/ branches/ tags/ calendar/ trunk/ branches/ tags/ spreadsheet/ trunk/ branches/ tags/
To get these three projects into their own repositories, we first dump the whole repository:
$ svnadmin dump /path/to/repos > repos-dumpfile * Dumped revision 0. * Dumped revision 1. * Dumped revision 2. * Dumped revision 3. … $
Next, run that dump file through the filter, each time including only one of our top-level directories, and resulting in three new dump files:
$ cat repos-dumpfile | svndumpfilter include calc > calc-dumpfile … $ cat repos-dumpfile | svndumpfilter include calendar > cal-dumpfile … $ cat repos-dumpfile | svndumpfilter include spreadsheet > ss-dumpfile … $
At this point, you have to make a decision. Each of
your dump files will create a valid repository,
but will preserve the paths exactly as they were in the
original repository. This means that even though you would
have a repository solely for your calc
project, that repository would still have a top-level
directory named calc
. If you want
your trunk
, tags
,
and branches
directories to live in the
root of your repository, you might wish to edit your
dump files, tweaking the Node-path
and
Node-copyfrom-path
headers to no longer have
that first calc/
path component. Also,
you'll want to remove the section of dump data that creates
the calc
directory. It will look
something like:
Node-path: calc Node-action: add Node-kind: dir Content-length: 0
If you do plan on manually editing the dump file to remove a top-level directory, make sure that your editor is not set to automatically convert end-lines to the native format (e.g. \r\n to \n) as the content will then not agree with the metadata and this will render the dump file useless.
All that remains now is to create your three new repositories, and load each dump file into the right repository:
$ svnadmin create calc; svnadmin load calc < calc-dumpfile <<< Started new transaction, based on original revision 1 * adding path : Makefile ... done. * adding path : button.c ... done. … $ svnadmin create calendar; svnadmin load calendar < cal-dumpfile <<< Started new transaction, based on original revision 1 * adding path : Makefile ... done. * adding path : cal.c ... done. … $ svnadmin create spreadsheet; svnadmin load spreadsheet < ss-dumpfile <<< Started new transaction, based on original revision 1 * adding path : Makefile ... done. * adding path : ss.c ... done. … $
Both of svndumpfilter's subcommands accept options for deciding how to deal with “empty” revisions. If a given revision contained only changes to paths that were filtered out, that now-empty revision could be considered uninteresting or even unwanted. So to give the user control over what to do with those revisions, svndumpfilter provides the following command-line options:
--drop-empty-revs
Do not generate empty revisions at all—just omit them.
--renumber-revs
If empty revisions are dropped (using the
--drop-empty-revs
option), change the
revision numbers of the remaining revisions so that
there are no gaps in the numeric sequence.
--preserve-revprops
If empty revisions are not dropped, preserve the revision properties (log message, author, date, custom properties, etc.) for those empty revisions. Otherwise, empty revisions will only contain the original datestamp, and a generated log message that indicates that this revision was emptied by svndumpfilter.
While svndumpfilter can be very
useful, and a huge timesaver, there are unfortunately a
couple of gotchas. First, this utility is overly sensitive
to path semantics. Pay attention to whether paths in your
dump file are specified with or without leading slashes.
You'll want to look at the Node-path
and
Node-copyfrom-path
headers.
… Node-path: spreadsheet/Makefile …
If the paths have leading slashes, you should include leading slashes in the paths you pass to svndumpfilter include and svndumpfilter exclude (and if they don't, you shouldn't). Further, if your dump file has an inconsistent usage of leading slashes for some reason, [16] you should probably normalize those paths so they all have, or lack, leading slashes.
Also, copied paths can give you some trouble. Subversion supports copy operations in the repository, where a new path is created by copying some already existing path. It is possible that at some point in the lifetime of your repository, you might have copied a file or directory from some location that svndumpfilter is excluding, to a location that it is including. In order to make the dump data self-sufficient, svndumpfilter needs to still show the addition of the new path—including the contents of any files created by the copy—and not represent that addition as a copy from a source that won't exist in your filtered dump data stream. But because the Subversion repository dump format only shows what was changed in each revision, the contents of the copy source might not be readily available. If you suspect that you have any copies of this sort in your repository, you might want to rethink your set of included/excluded paths.
If you're using a Berkeley DB repository, then all of
your versioned filesystem's structure and data live in a set
of database tables within the db
subdirectory of your repository. This subdirectory is a
regular Berkeley DB environment directory, and can therefore
be used in conjunction with any of the Berkeley database
tools (you can see the documentation for these tools at
Sleepycat's website,
http://www.sleepycat.com/).
For day-to-day Subversion use, these tools are unnecessary. Most of the functionality typically needed for Subversion repositories has been duplicated in the svnadmin tool. For example, svnadmin list-unused-dblogs and svnadmin list-dblogs perform a subset of what is provided by the Berkeley db_archive command, and svnadmin recover reflects the common use cases of the db_recover utility.
There are still a few Berkeley DB utilities that you might find useful. The db_dump and db_load programs write and read, respectively, a custom file format which describes the keys and values in a Berkeley DB database. Since Berkeley databases are not portable across machine architectures, this format is a useful way to transfer those databases from machine to machine, irrespective of architecture or operating system. Also, the db_stat utility can provide useful information about the status of your Berkeley DB environment, including detailed statistics about the locking and storage subsystems.
Your Subversion repository will generally require very little attention once it is configured to your liking. However, there are times when some manual assistance from an administrator might be in order. The svnadmin utility provides some helpful functionality to assist you in performing such tasks as:
modifying commit log messages,
removing dead transactions,
recovering “wedged” repositories, and
migrating repository contents to a different repository.
Perhaps the most commonly used of
svnadmin's subcommands is
setlog
. When a transaction is committed to
the repository and promoted to a revision, the descriptive log
message associated with that new revision (and provided by the
user) is stored as an unversioned property attached to the
revision itself. In other words, the repository remembers
only the latest value of the property, and discards previous
ones.
Sometimes a user will have an error in her log message (a
misspelling or some misinformation, perhaps). If the
repository is configured (using the
pre-revprop-change
and
post-revprop-change
hooks; see the section called “Hook Scripts”) to accept changes to this log
message after the commit is finished, then the user can
“fix” her log message remotely using the
svn program's propset
command (see Chapter 9, Subversion Complete Reference). However, because of
the potential to lose information forever, Subversion
repositories are not, by default, configured to allow changes
to unversioned properties—except by an administrator.
If a log message needs to be changed by an administrator,
this can be done using svnadmin setlog.
This command changes the log message (the
svn:log
property) on a given revision of a
repository, reading the new value from a provided file.
$ echo "Here is the new, correct log message" > newlog.txt $ svnadmin setlog myrepos newlog.txt -r 388
The svnadmin setlog command alone is
still bound by the same protections against modifying
unversioned properties as a remote client is—the
pre-
and
post-revprop-change
hooks are still
triggered, and therefore must be setup to accept changes of
this nature. But an administrator can get around these
protections by passing the --bypass-hooks
option to the svnadmin setlog command.
Remember, though, that by bypassing the hooks, you are likely avoiding such things as email notifications of property changes, backup systems which track unversioned property changes, and so on. In other words, be very careful about what you are changing, and how you change it.
Another common use of svnadmin is to query the repository for outstanding—possibly dead—Subversion transactions. In the event that a commit should fail, the transaction is usually cleaned up. That is, the transaction itself is removed from the repository, and any data associated with (and only with) that transaction is removed as well. Occasionally, though, a failure occurs in such a way that the cleanup of the transaction never happens. This could happen for several reasons: perhaps the client operation was inelegantly terminated by the user, or a network failure might have occurred in the middle of an operation, etc. Regardless of the reason, dead transactions can happen. They don't do any real harm, other than consuming a small bit of disk space. A fastidious administrator may nonetheless want to remove them.
You can use svnadmin's
lstxns
command to list the names of the
currently outstanding transactions.
$ svnadmin lstxns myrepos 19 3a1 a45 $
Each item in the resultant output can then be used with
svnlook (and its
--transaction
option) to determine who
created the transaction, when it was created, what types of
changes were made in the transaction—in other words,
whether or not the transaction is a safe candidate for
removal! If so, the transaction's name can be passed to
svnadmin rmtxns, which will perform the
cleanup of the transaction. In fact, the
rmtxns
subcommand can take its input
directly from the output of lstxns
!
$ svnadmin rmtxns myrepos `svnadmin lstxns myrepos` $
If you use these two subcommands like this, you should consider making your repository temporarily inaccessible to clients. That way, no one can begin a legitimate transaction before you start your cleanup. The following is a little bit of shell-scripting that can quickly generate information about each outstanding transaction in your repository:
Example 5.1. txn-info.sh (Reporting Outstanding Transactions)
#!/bin/sh ### Generate informational output for all outstanding transactions in ### a Subversion repository. REPOS="${1}" if [ "x$REPOS" = x ] ; then echo "usage: $0 REPOS_PATH" exit fi for TXN in `svnadmin lstxns ${REPOS}`; do echo "---[ Transaction ${TXN} ]-------------------------------------------" svnlook info "${REPOS}" --transaction "${TXN}" done
You can run the previous script using /path/to/txn-info.sh /path/to/repos. The output is basically a concatenation of several chunks of svnlook info output (see the section called “svnlook”), and will look something like:
$ txn-info.sh myrepos ---[ Transaction 19 ]------------------------------------------- sally 2001-09-04 11:57:19 -0500 (Tue, 04 Sep 2001) 0 ---[ Transaction 3a1 ]------------------------------------------- harry 2001-09-10 16:50:30 -0500 (Mon, 10 Sep 2001) 39 Trying to commit over a faulty network. ---[ Transaction a45 ]------------------------------------------- sally 2001-09-12 11:09:28 -0500 (Wed, 12 Sep 2001) 0 $
A long-abandoned transaction usually represents some sort of failed or interrupted commit. A transaction's datestamp can provide interesting information—for example, how likely is it that an operation begun nine months ago is still active?
In short, transaction cleanup decisions need not be made unwisely. Various sources of information—including Apache's error and access logs, the logs of successful Subversion commits, and so on—can be employed in the decision-making process. Finally, an administrator can often simply communicate with a seemingly dead transaction's owner (via email, for example) to verify that the transaction is, in fact, in a zombie state.
While the cost of storage has dropped incredibly in the past few years, disk usage is still a valid concern for administrators seeking to version large amounts of data. Every additional byte consumed by the live repository is a byte that needs to be backed up offsite, perhaps multiple times as part of rotating backup schedules. If using a Berkeley DB repository, the primary storage mechanism is a complex database system, it is useful to know what pieces of data need to remain on the live site, which need to be backed up, and which can be safely removed. This section is specific to Berkeley DB; FSFS repositories have no extra data to be cleaned up or reclaimed.
Until recently, the largest offender of disk space usage with respect to Subversion repositories was the log files to which Berkeley DB performs its pre-writes before modifying the actual database files. These files capture all the actions taken along the route of changing the database from one state to another—while the database files reflect at any given time some state, the log files contain all the many changes along the way between states. As such, they can start to accumulate quite rapidly.
Fortunately, beginning with the 4.2 release of Berkeley
DB, the database environment has the ability to remove its
own unused log files without any external procedures. Any
repositories created using an svnadmin
which is compiled against Berkeley DB version 4.2 or greater
will be configured for this automatic log file removal. If
you don't want this feature enabled, simply pass the
--bdb-log-keep
option to the
svnadmin create command. If you forget
to do this, or change your mind at a later time, simple edit
the DB_CONFIG
file found in your
repository's db
directory, comment out
the line which contains the set_flags
DB_LOG_AUTOREMOVE
directive, and then run
svnadmin recover on your repository to
force the configuration changes to take effect. See the section called “Berkeley DB Configuration” for more information about
database configuration.
Without some sort of automatic log file removal in place, log files will accumulate as you use your repository. This is actually somewhat of a feature of the database system—you should be able to recreate your entire database using nothing but the log files, so these files can be useful for catastrophic database recovery. But typically, you'll want to archive the log files that are no longer in use by Berkeley DB, and then remove them from disk to conserve space. Use the svnadmin list-unused-dblogs command to list the unused log files:
$ svnadmin list-unused-dblogs /path/to/repos /path/to/repos/log.0000000031 /path/to/repos/log.0000000032 /path/to/repos/log.0000000033 $ svnadmin list-unused-dblogs /path/to/repos | xargs rm ## disk space reclaimed!
To keep the size of the repository as small as possible, Subversion uses deltification (or, “deltified storage”) within the repository itself. Deltification involves encoding the representation of a chunk of data as a collection of differences against some other chunk of data. If the two pieces of data are very similar, this deltification results in storage savings for the deltified chunk—rather than taking up space equal to the size of the original data, it only takes up enough space to say, “I look just like this other piece of data over here, except for the following couple of changes”. Specifically, each time a new version of a file is committed to the repository, Subversion encodes the previous version (actually, several previous versions) as a delta against the new version. The result is that most of the repository data that tends to be sizable—namely, the contents of versioned files—is stored at a much smaller size than the original “fulltext” representation of that data.
Because all of the Subversion repository data that is subject to deltification is stored in a single Berkeley DB database file, reducing the size of the stored values will not necessarily reduce the size of the database file itself. Berkeley DB will, however, keep internal records of unused areas of the database file, and use those areas first before growing the size of the database file. So while deltification doesn't produce immediate space savings, it can drastically slow future growth of the database.
As mentioned in the section called “Berkeley DB”, a Berkeley DB repository can sometimes be left in frozen state if not closed properly. When this happens, an administrator needs to rewind the database back into a consistent state.
In order to protect the data in your repository, Berkeley DB uses a locking mechanism. This mechanism ensures that portions of the database are not simultaneously modified by multiple database accessors, and that each process sees the data in the correct state when that data is being read from the database. When a process needs to change something in the database, it first checks for the existence of a lock on the target data. If the data is not locked, the process locks the data, makes the change it wants to make, and then unlocks the data. Other processes are forced to wait until that lock is removed before they are permitted to continue accessing that section of the database. (This has nothing to do with the locks that you, as a user, can apply to versioned files within the repository; see Three meanings of “lock” for more information.)
In the course of using your Subversion repository, fatal errors (such as running out of disk space or available memory) or interruptions can prevent a process from having the chance to remove the locks it has placed in the database. The result is that the back-end database system gets “wedged”. When this happens, any attempts to access the repository hang indefinitely (since each new accessor is waiting for a lock to go away—which isn't going to happen).
First, if this happens to your repository, don't panic. The Berkeley DB filesystem takes advantage of database transactions and checkpoints and pre-write journaling to ensure that only the most catastrophic of events [17] can permanently destroy a database environment. A sufficiently paranoid repository administrator will be making off-site backups of the repository data in some fashion, but don't call your system administrator to restore a backup tape just yet.
Secondly, use the following recipe to attempt to “unwedge” your repository:
Make sure that there are no processes accessing (or attempting to access) the repository. For networked repositories, this means shutting down the Apache HTTP Server, too.
Become the user who owns and manages the repository. This is important, as recovering a repository while running as the wrong user can tweak the permissions of the repository's files in such a way that your repository will still be inaccessible even after it is “unwedged”.
Run the command svnadmin recover /path/to/repos. You should see output like this:
Repository lock acquired. Please wait; recovering the repository may take some time... Recovery completed. The latest repos revision is 19.
This command may take many minutes to complete.
Restart the Subversion server.
This procedure fixes almost every case of repository
lock-up. Make sure that you run this command as the user that
owns and manages the database, not just as
root
. Part of the recovery process might
involve recreating from scratch various database files (shared
memory regions, for example). Recovering as
root
will create those files such that they
are owned by root
, which means that even
after you restore connectivity to your repository, regular
users will be unable to access it.
If the previous procedure, for some reason, does not
successfully unwedge your repository, you should do two
things. First, move your broken repository out of the way and
restore your latest backup of it. Then, send an email to the
Subversion user list (at
<users@subversion.tigris.org>
) describing your
problem in detail. Data integrity is an extremely high
priority to the Subversion developers.
A Subversion filesystem has its data spread throughout
various database tables in a fashion generally understood by
(and of interest to) only the Subversion developers
themselves. However, circumstances may arise that call for
all, or some subset, of that data to be collected into a
single, portable, flat file format. Subversion provides such
a mechanism, implemented in a pair of
svnadmin subcommands:
dump
and load
.
The most common reason to dump and load a Subversion repository is due to changes in Subversion itself. As Subversion matures, there are times when certain changes made to the back-end database schema cause Subversion to be incompatible with previous versions of the repository. Other reasons for dumping and loading might be to migrate a Berkeley DB repository to a new OS or CPU architecture, or to switch between Berkeley DB and FSFS back-ends. The recommended course of action is relatively simple:
Using your current version of svnadmin, dump your repositories to dump files.
Upgrade to the new version of Subversion.
Move your old repositories out of the way, and create new empty ones in their place using your new svnadmin.
Again using your new svnadmin, load your dump files into their respective, just-created repositories.
Be sure to copy any customizations from your old
repositories to the new ones, including
DB_CONFIG
files and hook scripts.
You'll want to pay attention to the release notes for the
new release of Subversion to see if any changes since your
last upgrade affect those hooks or configuration
options.
If the migration process made your repository accessible at a different URL (e.g. moved to a different computer, or is being accessed via a different schema), then you'll probably want to tell your users to run svn switch --relocate on their existing working copies. See svn switch.
svnadmin dump will output a range of repository revisions that are formatted using Subversion's custom filesystem dump format. The dump format is printed to the standard output stream, while informative messages are printed to the standard error stream. This allows you to redirect the output stream to a file while watching the status output in your terminal window. For example:
$ svnlook youngest myrepos 26 $ svnadmin dump myrepos > dumpfile * Dumped revision 0. * Dumped revision 1. * Dumped revision 2. … * Dumped revision 25. * Dumped revision 26.
At the end of the process, you will have a single file
(dumpfile
in the previous example) that
contains all the data stored in your repository in the
requested range of revisions. Note that svnadmin
dump is reading revision trees from the repository
just like any other “reader” process would
(svn checkout, for example). So it's safe
to run this command at any time.
The other subcommand in the pair, svnadmin load, parses the standard input stream as a Subversion repository dump file, and effectively replays those dumped revisions into the target repository for that operation. It also gives informative feedback, this time using the standard output stream:
$ svnadmin load newrepos < dumpfile <<< Started new txn, based on original revision 1 * adding path : A ... done. * adding path : A/B ... done. … ------- Committed new rev 1 (loaded from original rev 1) >>> <<< Started new txn, based on original revision 2 * editing path : A/mu ... done. * editing path : A/D/G/rho ... done. ------- Committed new rev 2 (loaded from original rev 2) >>> … <<< Started new txn, based on original revision 25 * editing path : A/D/gamma ... done. ------- Committed new rev 25 (loaded from original rev 25) >>> <<< Started new txn, based on original revision 26 * adding path : A/Z/zeta ... done. * editing path : A/mu ... done. ------- Committed new rev 26 (loaded from original rev 26) >>>
The result of a load is new revisions added to a
repository—the same thing you get by making commits
against that repository from a regular Subversion client. And
just as in a commit, you can use hook scripts to perform
actions before and after each of the commits made during a load
process. By passing the --use-pre-commit-hook
and --use-post-commit-hook
options to
svnadmin load, you can instruct Subversion
to execute the pre-commit and post-commit hook scripts,
respectively, for each loaded revision. You might use these,
for example, to ensure that loaded revisions pass through the
same validation steps that regular commits pass through. Of
course, you should use these options with care—if your
post-commit hook sends emails to a mailing list for each new
commit, you might not want to spew hundreds or thousands of
commit emails in rapid succession at that list for each of the
loaded revisions! You can read more about the use of hook
scripts in the section called “Hook Scripts”.
Note that because svnadmin uses standard input and output streams for the repository dump and load process, people who are feeling especially saucy can try things like this (perhaps even using different versions of svnadmin on each side of the pipe):
$ svnadmin create newrepos $ svnadmin dump myrepos | svnadmin load newrepos
By default, the dump file will be quite large—much
larger than the repository itself. That's because every
version of every file is expressed as a full text in the
dump file. This is the fastest and simplest behavior, and nice
if you're piping the dump data directly into some other
process (such as a compression program, filtering program, or
into a loading process). But if you're creating a dump file for
longer-term storage, you'll likely want to save disk space by
using the --deltas
switch. With this option,
successive revisions of files will be output as compressed,
binary differences—just as file revisions are stored in
a repository. This option is slower, but results in a
dump file much closer in size to the original
repository.
We mentioned previously that svnadmin
dump outputs a range of revisions. Use the
--revision
option to specify a single
revision to dump, or a range of revisions. If you omit this
option, all the existing repository revisions will be
dumped.
$ svnadmin dump myrepos --revision 23 > rev-23.dumpfile $ svnadmin dump myrepos --revision 100:200 > revs-100-200.dumpfile
As Subversion dumps each new revision, it outputs only enough information to allow a future loader to re-create that revision based on the previous one. In other words, for any given revision in the dump file, only the items that were changed in that revision will appear in the dump. The only exception to this rule is the first revision that is dumped with the current svnadmin dump command.
By default, Subversion will not express the first dumped revision as merely differences to be applied to the previous revision. For one thing, there is no previous revision in the dump file! And secondly, Subversion cannot know the state of the repository into which the dump data will be loaded (if it ever, in fact, occurs). To ensure that the output of each execution of svnadmin dump is self-sufficient, the first dumped revision is by default a full representation of every directory, file, and property in that revision of the repository.
However, you can change this default behavior. If you add
the --incremental
option when you dump your
repository, svnadmin will compare the first
dumped revision against the previous revision in the
repository, the same way it treats every other revision that
gets dumped. It will then output the first revision exactly
as it does the rest of the revisions in the dump
range—mentioning only the changes that occurred in that
revision. The benefit of this is that you can create several
small dump files that can be loaded in succession, instead of
one large one, like so:
$ svnadmin dump myrepos --revision 0:1000 > dumpfile1 $ svnadmin dump myrepos --revision 1001:2000 --incremental > dumpfile2 $ svnadmin dump myrepos --revision 2001:3000 --incremental > dumpfile3
These dump files could be loaded into a new repository with the following command sequence:
$ svnadmin load newrepos < dumpfile1 $ svnadmin load newrepos < dumpfile2 $ svnadmin load newrepos < dumpfile3
Another neat trick you can perform with this
--incremental
option involves appending to an
existing dump file a new range of dumped revisions. For
example, you might have a post-commit
hook
that simply appends the repository dump of the single revision
that triggered the hook. Or you might have a script that runs
nightly to append dump file data for all the revisions that
were added to the repository since the last time the script
ran. Used like this, svnadmin's
dump
and load
commands
can be a valuable means by which to backup changes to your
repository over time in case of a system crash or some other
catastrophic event.
The dump format can also be used to merge the contents of
several different repositories into a single repository. By
using the --parent-dir
option of svnadmin
load, you can specify a new virtual root directory
for the load process. That means if you have dump files for
three repositories, say calc-dumpfile
,
cal-dumpfile
, and
ss-dumpfile
, you can first create a new
repository to hold them all:
$ svnadmin create /path/to/projects $
Then, make new directories in the repository which will encapsulate the contents of each of the three previous repositories:
$ svn mkdir -m "Initial project roots" \ file:///path/to/projects/calc \ file:///path/to/projects/calendar \ file:///path/to/projects/spreadsheet Committed revision 1. $
Lastly, load the individual dump files into their respective locations in the new repository:
$ svnadmin load /path/to/projects --parent-dir calc < calc-dumpfile … $ svnadmin load /path/to/projects --parent-dir calendar < cal-dumpfile … $ svnadmin load /path/to/projects --parent-dir spreadsheet < ss-dumpfile … $
We'll mention one final way to use the Subversion repository dump format—conversion from a different storage mechanism or version control system altogether. Because the dump file format is, for the most part, human-readable, [18] it should be relatively easy to describe generic sets of changes—each of which should be treated as a new revision—using this file format. In fact, the cvs2svn utility (see the section called “Converting a Repository from CVS to Subversion”) uses the dump format to represent the contents of a CVS repository so that those contents can be copied into a Subversion repository.
Despite numerous advances in technology since the birth of the modern computer, one thing unfortunately rings true with crystalline clarity—sometimes, things go very, very awry. Power outages, network connectivity dropouts, corrupt RAM and crashed hard drives are but a taste of the evil that Fate is poised to unleash on even the most conscientious administrator. And so we arrive at a very important topic—how to make backup copies of your repository data.
There are generally two types of backup methods available for Subversion repository administrators—incremental and full. We discussed in an earlier section of this chapter how to use svnadmin dump --incremental to perform an incremental backup (see the section called “Migrating a Repository”). Essentially, the idea is to only backup at a given time the changes to the repository since the last time you made a backup.
A full backup of the repository is quite literally a duplication of the entire repository directory (which includes either Berkeley database or FSFS environment). Now, unless you temporarily disable all other access to your repository, simply doing a recursive directory copy runs the risk of generating a faulty backup, since someone might be currently writing to the database.
In the case of Berkeley DB, Sleepycat documents describe a
certain order in which database files can be copied that will
guarantee a valid backup copy. And a similar ordering exists
for FSFS data. Better still, you don't have to implement
these algorithms yourself, because the Subversion development
team has already done so. The
hot-backup.py script is found in the
tools/backup/
directory of the Subversion
source distribution. Given a repository path and a backup
location, hot-backup.py—which is
really just a more intelligent wrapper around the
svnadmin hotcopy command—will perform
the necessary steps for backing up your live
repository—without requiring that you bar public
repository access at all—and then will clean out the
dead Berkeley log files from your live repository.
Even if you also have an incremental backup, you might
want to run this program on a regular basis. For example, you
might consider adding hot-backup.py to a
program scheduler (such as cron on Unix
systems). Or, if you prefer fine-grained backup solutions,
you could have your post-commit hook script call
hot-backup.py (see the section called “Hook Scripts”), which will then cause a new
backup of your repository to occur with every new revision
created. Simply add the following to the
hooks/post-commit
script in your live
repository directory:
(cd /path/to/hook/scripts; ./hot-backup.py ${REPOS} /path/to/backups &)
The resulting backup is a fully functional Subversion repository, able to be dropped in as a replacement for your live repository should something go horribly wrong.
There are benefits to both types of backup methods. The easiest is by far the full backup, which will always result in a perfect working replica of your repository. This again means that should something bad happen to your live repository, you can restore from the backup with a simple recursive directory copy. Unfortunately, if you are maintaining multiple backups of your repository, these full copies will each eat up just as much disk space as your live repository.
Incremental backups using the repository dump format are excellent to have on hand if the database schema changes between successive versions of Subversion itself. Since a complete repository dump and load are generally required to upgrade your repository to the new schema, it's very convenient to already have half of that process (the dump part) finished. Unfortunately, the creation of—and restoration from—incremental backups takes longer, as each commit is effectively replayed into either the dump file or the repository.
In either backup scenario, repository administrators need to be aware of how modifications to unversioned revision properties affect their backups. Since these changes do not themselves generate new revisions, they will not trigger post-commit hooks, and may not even trigger the pre-revprop-change and post-revprop-change hooks. [19] And since you can change revision properties without respect to chronological order—you can change any revision's properties at any time—an incremental backup of the latest few revisions might not catch a property modification to a revision that was included as part of a previous backup.
Generally speaking, only the truly paranoid would need to backup their entire repository, say, every time a commit occurred. However, assuming that a given repository has some other redundancy mechanism in place with relatively fine granularity (like per-commit emails), a hot backup of the database might be something that a repository administrator would want to include as part of a system-wide nightly backup. For most repositories, archived commit emails alone provide sufficient redundancy as restoration sources, at least for the most recent few commits. But it's your data—protect it as much as you'd like.
Often, the best approach to repository backups is a diversified one. You can leverage combinations of full and incremental backups, plus archives of commit emails. The Subversion developers, for example, back up the Subversion source code repository after every new revision is created, and keep an archive of all the commit and property change notification emails. Your solution might be similar, but should be catered to your needs and that delicate balance of convenience with paranoia. And while all of this might not save your hardware from the iron fist of Fate, [20] it should certainly help you recover from those trying times.
[15] That, by the way, is a feature, not a bug.
[16] While svnadmin dump has a consistent leading slash policy—to not include them—other programs which generate dump data might not be so consistent.
[17] E.g.: hard drive + huge electromagnet = disaster.
[18] The Subversion repository dump format resembles an RFC-822 format, the same type of format used for most email.
[19] svnadmin setlog can be called in a way that bypasses the hook interface altogether.
[20] You know—the collective term for all of her “fickle fingers”.