Almost every developer who has used the C programming language has at some point sighed at the daunting task of managing memory usage. Allocating enough memory to use, keeping track of those allocations, freeing the memory when you no longer need it—these tasks can be quite complex. And of course, failure to do those things properly can result in a program that crashes itself, or worse, crashes the computer. Fortunately, the APR library that Subversion depends on for portability provides the apr_pool_t type, which represents a pool from which the application may allocate memory.
A memory pool is an abstract representation of a chunk of
memory allocated for use by a program. Rather than requesting
memory directly from the OS using the standard malloc() and
friends, programs that link against APR can simply request that
a pool of memory be created (using the apr_pool_create()
function).
APR will allocate a moderately sized chunk of memory from the
OS, and that memory will be instantly available for use by the
program. Any time the program needs some of the pool memory, it
uses one of the APR pool API functions, like apr_palloc(),
which returns a generic memory location from the pool. The
program can keep requesting
bits and pieces of memory from the pool, and APR will keep
granting the requests. Pools will automatically grow in size to
accommodate programs that request more memory than the original
pool contained, until of course there is no more memory
available on the system.
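To make this allocation model concrete, here is a toy pool in plain C. It is a minimal sketch under stated assumptions (a fixed 4096-byte chunk size and 16-byte rounding of every request); it is not how APR is implemented, and toy_pool_create(), toy_palloc(), toy_pool_chunks(), and toy_pool_destroy() are hypothetical names, not APR functions. The pool grabs one chunk up front, carves each request out of the newest chunk, and adds another chunk when a request no longer fits.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy pool, for illustration only: this is not how APR is
   implemented, and none of these names are APR functions.  */
#define TOY_CHUNK_SIZE 4096                        /* assumed chunk size */
#define TOY_ALIGN(n) (((n) + 15) & ~(size_t) 15)   /* round up to 16 bytes */

typedef struct toy_chunk {
  struct toy_chunk *next;   /* older chunks, already filled */
  char *mem;                /* storage obtained from malloc() */
  size_t size;              /* bytes in MEM */
  size_t used;              /* bytes already handed out */
} toy_chunk;

typedef struct toy_pool {
  toy_chunk *chunks;        /* newest chunk first */
} toy_pool;

static toy_chunk *toy_chunk_new (size_t size)
{
  toy_chunk *c = malloc (sizeof *c);
  if (!c)
    return NULL;
  c->mem = malloc (size);
  if (!c->mem)
    {
      free (c);
      return NULL;
    }
  c->next = NULL;
  c->size = size;
  c->used = 0;
  return c;
}

/* Create a pool that starts out with one moderately sized chunk.  */
toy_pool *toy_pool_create (void)
{
  toy_pool *p = malloc (sizeof *p);
  if (!p)
    return NULL;
  p->chunks = toy_chunk_new (TOY_CHUNK_SIZE);
  if (!p->chunks)
    {
      free (p);
      return NULL;
    }
  return p;
}

/* Carve N bytes out of the newest chunk, adding a new chunk when the
   request does not fit.  Individual allocations are never freed.  */
void *toy_palloc (toy_pool *p, size_t n)
{
  toy_chunk *c = p->chunks;
  n = TOY_ALIGN (n);
  if (c->size - c->used < n)
    {
      toy_chunk *fresh = toy_chunk_new (n > TOY_CHUNK_SIZE ? n : TOY_CHUNK_SIZE);
      if (!fresh)
        return NULL;          /* no more memory available on the system */
      fresh->next = c;
      p->chunks = c = fresh;
    }
  assert (c->size - c->used >= n);
  void *mem = c->mem + c->used;
  c->used += n;
  return mem;
}

/* Count the chunks the pool has obtained from malloc() so far.  */
size_t toy_pool_chunks (const toy_pool *p)
{
  size_t count = 0;
  for (const toy_chunk *c = p->chunks; c; c = c->next)
    count++;
  return count;
}

/* Release every chunk, and with it every allocation, in one call.  */
void toy_pool_destroy (toy_pool *p)
{
  toy_chunk *c = p->chunks;
  while (c)
    {
      toy_chunk *next = c->next;
      free (c->mem);
      free (c);
      c = next;
    }
  free (p);
}
```

Because individual allocations are never returned one by one, handing out memory is little more than a pointer bump; the trade-off is that memory goes back to the system only when the whole pool does. In this sketch, 1000 requests of 32 bytes fill 8 chunks of 4096 bytes, and a single oversized request simply gets a chunk of its own.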
Now, if this were the end of the pool story, it would hardly
have merited special attention. Fortunately, that's not the
case. Pools can not only be created; they can also be cleared
and destroyed, using apr_pool_clear() and apr_pool_destroy(),
respectively. This gives developers the flexibility to allocate
several (or several thousand) things from the pool, and then
clean up all of that memory with a single function call! Further, pools
have hierarchy. You can make “subpools” of any
previously created pool. When you clear a pool, all of its
subpools are destroyed; if you destroy a pool, it and its
subpools are destroyed.
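The clear/destroy semantics and the pool hierarchy can be sketched the same way. This is again a hypothetical toy rather than APR code: to keep it short, each allocation is its own malloc() block linked into its pool, and toy_pool_create(), toy_palloc(), toy_pstrdup(), toy_pool_clear(), toy_pool_destroy(), and the live_pools counter are illustrative names only. Clearing a pool frees its allocations and destroys its subpools but leaves the pool itself usable; destroying a pool clears it and then releases the pool as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy pool with subpools, for illustration only; this is
   not APR's implementation.  Each allocation is a separate malloc()
   block that is linked into the pool it came from.  */
typedef union toy_alloc {
  union toy_alloc *next;   /* next allocation in the same pool */
  max_align_t align;       /* keeps the payload after the header aligned */
} toy_alloc;

typedef struct toy_pool {
  struct toy_pool *parent;    /* NULL for a top-level pool */
  struct toy_pool *children;  /* first subpool */
  struct toy_pool *sibling;   /* next subpool of the same parent */
  toy_alloc *allocs;          /* every allocation made from this pool */
} toy_pool;

static int live_pools = 0;    /* number of pools currently in existence */

/* Create a pool; if PARENT is non-NULL, the new pool is its subpool.  */
toy_pool *toy_pool_create (toy_pool *parent)
{
  toy_pool *p = calloc (1, sizeof *p);
  if (!p)
    return NULL;
  p->parent = parent;
  if (parent)
    {
      p->sibling = parent->children;
      parent->children = p;
    }
  live_pools++;
  return p;
}

void *toy_palloc (toy_pool *p, size_t n)
{
  toy_alloc *a = malloc (sizeof *a + n);
  if (!a)
    return NULL;
  a->next = p->allocs;
  p->allocs = a;
  return a + 1;            /* the payload starts right after the header */
}

/* Copy the string S into pool P, in the spirit of apr_pstrdup().  */
char *toy_pstrdup (toy_pool *p, const char *s)
{
  size_t n = strlen (s) + 1;
  char *copy = toy_palloc (p, n);
  if (copy)
    memcpy (copy, s, n);
  return copy;
}

void toy_pool_destroy (toy_pool *p);

/* Free everything allocated from P and destroy all of its subpools,
   but leave P itself ready for reuse.  */
void toy_pool_clear (toy_pool *p)
{
  while (p->children)
    toy_pool_destroy (p->children);   /* unlinks itself from P */
  while (p->allocs)
    {
      toy_alloc *next = p->allocs->next;
      free (p->allocs);
      p->allocs = next;
    }
}

/* Clear P, detach it from its parent, and release P itself.  */
void toy_pool_destroy (toy_pool *p)
{
  toy_pool_clear (p);
  if (p->parent)
    {
      toy_pool **link = &p->parent->children;
      while (*link != p)
        {
          assert (*link != NULL);     /* P must be among its parent's subpools */
          link = &(*link)->sibling;
        }
      *link = p->sibling;
    }
  live_pools--;
  free (p);
}
```

With a root pool, a subpool, and a sub-subpool, clearing the root leaves only the root alive (still usable for new subpools), while destroying the root takes every remaining pool with it.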
Before we go further, developers should be aware that they
probably will not find many calls to the APR pool functions we
just mentioned in the Subversion source code. APR pools offer
some extensibility mechanisms, like the ability to have custom
“user data” attached to the pool, and mechanisms
for registering cleanup functions that get called when the pool
is destroyed. Subversion makes use of these extensions in a
somewhat non-trivial way. So, Subversion supplies (and most of
its code uses) the wrapper functions svn_pool_create(),
svn_pool_clear(), and svn_pool_destroy().
While pools are helpful for basic memory management, the pool construct really shines in looping and recursive scenarios. Since loops are often unbounded in their iterations, and recursions in their depth, memory consumption in these areas of the code can become unpredictable. Fortunately, using nested memory pools can be a great way to easily manage these potentially hairy situations. The following example demonstrates the basic use of nested pools in a situation that is fairly common—recursively crawling a directory tree, doing some task to each thing in the tree.
Example 8.5. Effective Pool Usage
/* Recursively crawl over DIRECTORY, adding the paths of all its file
   children to the FILES array, and doing some task to each path
   encountered.  Use POOL for all temporary allocations, and store the
   paths added to FILES in the same pool that the FILES array itself is
   allocated in.  */
static apr_status_t
crawl_dir (apr_array_header_t *files,
           const char *directory,
           apr_pool_t *pool)
{
  apr_pool_t *array_pool = files->pool;         /* FILES array's pool */
  apr_pool_t *subpool = svn_pool_create (pool); /* iteration pool */
  apr_dir_t *dir;
  apr_finfo_t finfo;
  apr_status_t apr_err;
  apr_int32_t flags = APR_FINFO_TYPE | APR_FINFO_NAME;

  apr_err = apr_dir_open (&dir, directory, pool);
  if (apr_err)
    return apr_err;

  /* Loop over the directory entries, clearing the subpool at the top of
     each iteration.  */
  for (apr_err = apr_dir_read (&finfo, flags, dir);
       apr_err == APR_SUCCESS;
       apr_err = apr_dir_read (&finfo, flags, dir))
    {
      const char *child_path;

      /* Clear the per-iteration SUBPOOL.  */
      svn_pool_clear (subpool);

      /* Skip entries for "this dir" ('.') and its parent ('..').  */
      if (finfo.filetype == APR_DIR)
        {
          if (finfo.name[0] == '.'
              && (finfo.name[1] == '\0'
                  || (finfo.name[1] == '.' && finfo.name[2] == '\0')))
            continue;
        }

      /* Build CHILD_PATH from DIRECTORY and FINFO.name.  */
      child_path = svn_path_join (directory, finfo.name, subpool);

      /* Do some task to this encountered path.  (do_some_task() is a
         placeholder for whatever per-path work is needed.)  */
      do_some_task (child_path, subpool);

      /* Handle subdirectories by recursing into them, passing SUBPOOL
         as the pool for temporary allocations.  */
      if (finfo.filetype == APR_DIR)
        {
          apr_err = crawl_dir (files, child_path, subpool);
          if (apr_err)
            return apr_err;
        }

      /* Handle files by adding their paths to the FILES array.  */
      else if (finfo.filetype == APR_REG)
        {
          /* Copy the file's path into the FILES array's pool.  */
          child_path = apr_pstrdup (array_pool, child_path);

          /* Add the path to the array.  */
          (*((const char **) apr_array_push (files))) = child_path;
        }
    }

  /* Destroy SUBPOOL.  */
  svn_pool_destroy (subpool);

  /* apr_dir_read() signals the end of the directory with a status for
     which APR_STATUS_IS_ENOENT() is true; any other status means the
     loop did not exit cleanly.  */
  if (! APR_STATUS_IS_ENOENT (apr_err))
    return apr_err;

  /* Yes, it exited cleanly, so close the dir.  */
  apr_err = apr_dir_close (dir);
  if (apr_err)
    return apr_err;

  return APR_SUCCESS;
}
The previous example demonstrates effective pool usage in
both looping and recursive situations. Each recursive call
begins by making a subpool of the pool passed to the function.
This subpool is used for the looping region and cleared at the
top of each iteration. The result is that memory usage is
roughly proportional to the depth of the recursion, not to the
total number of files and directories present as children of
the top-level directory. When the first call to this recursive
function finally finishes, there is actually very little data
stored in the pool that was passed to it (the collected file
paths live in the pool of the FILES array). Now imagine the
extra complexity that would be present if this function had to
malloc() and free() every single piece of data used!
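To see why memory tracks the depth of the recursion rather than the number of entries visited, here is a hypothetical simulation built on a self-contained toy pool (again not APR code; crawl(), crawl_flat(), and the live/peak counters are invented for illustration). crawl() mirrors the shape of crawl_dir(): it creates an iteration subpool, clears it at the top of each iteration, and passes it down as the scratch pool for the recursive call. crawl_flat() makes the same allocations directly in the pool it was given.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy pool with subpools (not APR's implementation),
   instrumented to count how many allocations are live at once.  */
typedef union toy_alloc {
  union toy_alloc *next;        /* next allocation in the same pool */
  max_align_t align;            /* keeps the payload aligned */
} toy_alloc;

typedef struct toy_pool {
  struct toy_pool *parent, *children, *sibling;
  toy_alloc *allocs;
} toy_pool;

static long live = 0;           /* allocations not yet freed */
static long peak = 0;           /* highest value LIVE has reached */

toy_pool *toy_pool_create (toy_pool *parent)
{
  toy_pool *p = calloc (1, sizeof *p);
  if (p && parent)
    {
      p->parent = parent;
      p->sibling = parent->children;
      parent->children = p;
    }
  return p;
}

void *toy_palloc (toy_pool *p, size_t n)
{
  assert (p != NULL);
  toy_alloc *a = malloc (sizeof *a + n);
  if (!a)
    return NULL;
  a->next = p->allocs;
  p->allocs = a;
  if (++live > peak)
    peak = live;
  return a + 1;                 /* payload follows the header */
}

void toy_pool_destroy (toy_pool *p);

/* Free P's allocations and destroy its subpools; P stays usable.  */
void toy_pool_clear (toy_pool *p)
{
  while (p->children)
    toy_pool_destroy (p->children);
  while (p->allocs)
    {
      toy_alloc *next = p->allocs->next;
      free (p->allocs);
      p->allocs = next;
      live--;
    }
}

/* Clear P, unlink it from its parent, and free it.  */
void toy_pool_destroy (toy_pool *p)
{
  toy_pool_clear (p);
  if (p->parent)
    {
      toy_pool **link = &p->parent->children;
      while (*link != p)
        link = &(*link)->sibling;
      *link = p->sibling;
    }
  free (p);
}

/* Walk a hypothetical tree DEPTH levels deep with FANOUT children per
   node, making one 64-byte scratch allocation per child (a stand-in for
   CHILD_PATH).  As in crawl_dir(), the iteration subpool is cleared at
   the top of each iteration and handed to the recursive call.  */
void crawl (int depth, int fanout, toy_pool *pool)
{
  toy_pool *subpool = toy_pool_create (pool);
  if (!subpool)
    return;
  for (int i = 0; i < fanout; i++)
    {
      toy_pool_clear (subpool);
      toy_palloc (subpool, 64);
      if (depth > 1)
        crawl (depth - 1, fanout, subpool);
    }
  toy_pool_destroy (subpool);
}

/* The same walk without an iteration pool: everything lands in POOL.  */
void crawl_flat (int depth, int fanout, toy_pool *pool)
{
  for (int i = 0; i < fanout; i++)
    {
      toy_palloc (pool, 64);
      if (depth > 1)
        crawl_flat (depth - 1, fanout, pool);
    }
}
```

Walking a tree four levels deep with three children per node, crawl() never has more than four allocations live at once (one per level of recursion), while crawl_flat() accumulates all 120 (3 + 9 + 27 + 81) until its pool is cleared or destroyed.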
Pools might not be ideal for every application, but they are extremely useful in Subversion. As a Subversion developer, you'll need to grow comfortable with pools and how to wield them correctly. Memory usage bugs and bloating can be difficult to diagnose and fix regardless of the API, but the pool construct provided by APR has proven a tremendously convenient, time-saving bit of functionality.