1. Introduction
C is widely used to implement system and embedded software, which is often safety-critical. However, its manual memory management can easily produce dangling pointers and memory leaks (ML) in C programs. Dereferencing and freeing a dangling pointer cause Use-After-Free (UAF) and Double-Free (DF) memory errors, respectively.
Recently, Rust (footnote 1), an emerging programming language designed for highly safe systems, has received an increasing amount of attention. Dubbed a safer C, Rust combines memory safety with low-level control. It introduces an ownership system (OWS) to provide memory safety at compile time. The basic idea of OWS is exclusive ownership, i.e., at any time each resource (mainly memory) has a unique owner. Ownership of a resource can be transferred from an old owner to a new owner. To maintain a unique owner, the old owner becomes invalid and can no longer be used. These ownership constraints are useful for memory management. On the one hand, when the unique owner of a resource goes out of scope, the resource can be dropped automatically without a garbage collector. Because the owner is unique, this automatic drop scheme is safe, i.e., it does not incur side effects such as UAF and DF. On the other hand, the ownership constraints rule out aliasing entirely, and consequently avoid dangling pointers at compile time.
In light of Rust’s promise of safety, a natural question is whether ownership can be exploited to ensure the memory safety of C programs. Because OWS can prevent dangling pointers at compile time, in our previous work [1], as shown in Figure 1, we developed a formal ownership checker, named SafeOSL, to verify whether a C program satisfies ownership constraints. A C program accepted by SafeOSL satisfies the ownership constraints and is free of dangling pointers. For such a C program, in this paper we implement an ownership-based memory deallocation technique, named SafeMD, that automatically drops resources. Like the automatic drop scheme in OWS, SafeMD is safe and thus does not introduce dangling pointers. The output of SafeMD is therefore a memory-safe C program that is free of both dangling pointers and memory leaks. Moreover, the patched program still satisfies the ownership constraints. Usually, a C program that satisfies the ownership constraints is safer than its normal version.
Existing Techniques. Existing techniques for safe memory deallocation [2, 3, 4, 5, 6] suffer from several drawbacks: (1) Because most techniques do not consider whether their input C programs satisfy ownership constraints, they are complex: they often rely on alias and inter-procedural analysis to find where memory should be deallocated. (2) For input C programs that satisfy ownership constraints, some techniques achieve low repairability because of their limited fixing strategies. For example, for the C code satisfying ownership constraints in Listing 1, MemFix [5] would insert free(q) and free(p) at lines 19 and 20 to free memory objects, but this may introduce a DF at line 20 and still leave an object leaked. In this case, safely fixing the memory leaks requires the patch if(flag == 1) free(q); at line 19. MemFix fails to produce this fix because it cannot generate patches with conditional statements. Similarly, consider the simplified C code that satisfies ownership constraints in Listing 2, where the function f returns 0 both with and without memory leaks. Saver [6] will insert a conditional patch at line 17, but this fix eliminates the memory leaks only partially and some leaks remain. (3) For input C programs that satisfy ownership constraints, some techniques can fix the leaks, but the patches they generate do not preserve the ownership constraints. Usually, a C program that satisfies the ownership constraints is safer than its normal version.
Listing 1: MemFix-generated patch
Listing 2: Saver-generated patch
Our Approach. In view of the above limitations, we present SafeMD, an ownership-based safe memory deallocation technique designed for C programs that satisfy ownership constraints. In such programs, each object has a unique pointer (its owner) at any time, and ownership of an object can be transferred in function calls, which designates the function responsible for deallocating the object. SafeMD can therefore dispense with alias and inter-procedural analysis when searching for patches. It can also fix more memory-leak patterns, because it fixes memory leaks only by inserting deallocation statements, without combining them with other expressions such as conditional expressions. For example, for the code in Listing 1, SafeMD analyzes the functions foo1 and main separately and generates the deallocation statements free(m) and free(q) at lines 5 and 19, respectively. In the true branch of foo1, SafeMD considers the ownership held by the code and concludes that foo1 is responsible for freeing the object pointed to by m. In contrast, MemFix tries to free this object in the main function, but it fails since it cannot generate conditional patches. Saver can fix the leaks in this code with conditional patches, but the program repaired by Saver does not satisfy ownership constraints. Similarly, for the code in Listing 2, SafeMD does not perform inter-procedural analysis to free objects, so it does not depend on the return values of the function to generate patches. Instead, SafeMD generates the patch free(q) at lines 3 and 7 in function f. This deallocation is safe since p is no longer used after line 15 in function g. SafeMD and Saver generate different patches because SafeMD considers function f, not g, responsible for deallocating the object.
Listing 3: Buggy code with memory leaks
MemFix [5] is an automated technique for fixing memory deallocation errors in C programs. It aims to fix ML by finding a set of free statements that correctly deallocate all allocated objects without causing DF or UAF. In this paper, we present SafeMD, which modifies the algorithm of MemFix to serve only C programs that satisfy ownership constraints.
Review of the Ideas of MemFix. Before explaining our algorithm, we first review the ideas of MemFix. Given a buggy program from which all deallocation statements are removed before analysis, MemFix works in two steps: (1) a static analysis collects patch candidates for each allocated object, and (2) correct patches are found by solving an exact cover problem. Before these two steps, MemFix needs to run standard alias analyses to obtain may-alias and must-alias information.
Consider the simple example in Listing 3. MemFix first analyzes the code to collect patch candidates based on its control-flow graph, as presented in step 1 of Figure 2. For each allocated object, MemFix maintains points-to and patch information in a state with five components: the object o, represented by its allocation site; the set of pointers that must point to o; the set of pointers that definitely do not point to o; the set of safe patches, which are guaranteed to deallocate o safely; and the set of unsafe patches, which may cause UAF or DF. Each patch in the safe and unsafe sets is a pair (n, e) of a program location n and a pointer expression e, denoting that a deallocation statement free(e) can be inserted right after line n.
The analysis of MemFix is flow-sensitive, so it updates the states at each statement until it reaches the exit of the code. For example, the allocation statement at line 1 creates a new state in which the allocated object is pointed to by p and can be deallocated safely by inserting free(p) at line 2 (after line 1). Line 3 also creates a new state for the object allocated there, and it may change the state of the object allocated at line 1: because q definitely no longer points to that first object, q is added to the must-not-point set of its state. The safe set of that state now includes the patches (1, p) and (3, p), since it is safe to deallocate the first object after line 1 or line 3 via free(p). At line 7, the state of the first object is updated again. Due to the assignment q = p, both q and p point to it, so it can be safely deallocated with either free(p) or free(q) after line 7. The analysis is disjunctive and maintains states separately for each branch; therefore, at the join point at line 9, each of the three states is updated individually. Because q is used at line 9, the existing safe patches for the objects pointed to by q are no longer safe: freeing those objects earlier would cause a use-after-free error at line 9. MemFix thus removes these patches from the safe set of each affected state and adds them to its unsafe set, and the new safe patches of such a state are those placed after line 9. In contrast, the safe set of a state whose object is not pointed to by q remains safe, because that object is not used at line 9; a new safe patch at line 9 is simply added to it.
After the analysis in step 1, MemFix has collected all patch candidates for each allocated object. To select, from these candidates, a set of patches that correctly deallocates all objects without introducing DF or UAF, MemFix solves an exact cover problem, as shown in step 2 of Figure 2. From the three final states, the analysis first collects the candidate patches, that is, the patches that are included in the safe set of some state but not in the unsafe set of any state. For example, a patch that is in the safe set of one state but in the unsafe set of another is excluded from the candidates. The candidate patches cannot cause UAF, but they may cause DF because of aliases. Therefore, MemFix finds a subset of the candidate patches that does not introduce DF, which corresponds to solving an exact cover problem represented by the following incidence matrix:
| Candidate patch | State 1 | State 2 | State 3 |
| Row 1 | 0 | 1 | 0 |
| Row 2 | 0 | 1 | 1 |
| Row 3 | 1 | 0 | 1 |
Each row r of the matrix represents a candidate patch and each column represents a state. The entry in row r and the column of a state is 1 if patch r is included in the safe set of that state, and 0 otherwise. Solving the exact cover problem represented by this incidence matrix means selecting rows such that each column contains exactly one 1 among the selected rows. In this example, the correct patches are computed as {(3, p), (9, q)}, which covers all states (i.e., no ML), and each state is covered by at most one patch (i.e., no DF).
Through the above analysis, MemFix will insert free(p) after line 3 and free(q) after line 9 to safely deallocate all allocated objects in code.
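The selection step can be sketched as a brute-force search over the 3×3 incidence matrix of the running example. This is only a minimal illustration of the exact cover condition, not MemFix's actual solver; the function names `exact_covers` and `example_matrix` are hypothetical, and rows and columns are referred to by index rather than by patch or state name.

```rust
/// Returns every set of row indices such that each column holds exactly one 1
/// among the selected rows. Brute force over all row subsets, so it is only
/// suitable for tiny matrices (fewer than 32 rows).
fn exact_covers(matrix: &[Vec<u8>]) -> Vec<Vec<usize>> {
    let rows = matrix.len();
    let cols = matrix.first().map_or(0, |r| r.len());
    let mut solutions = Vec::new();
    for mask in 1u32..(1u32 << rows) {
        // Rows whose bit is set in `mask` form the current candidate selection.
        let selected: Vec<usize> = (0..rows)
            .filter(|&r| (mask & (1u32 << r)) != 0)
            .collect();
        // Exact cover: every column (state) is hit by exactly one selected row.
        let exact = (0..cols).all(|c| {
            selected.iter().map(|&r| u32::from(matrix[r][c])).sum::<u32>() == 1
        });
        if exact {
            solutions.push(selected);
        }
    }
    solutions
}

/// Incidence matrix of the running example: rows are candidate patches,
/// columns are states.
fn example_matrix() -> Vec<Vec<u8>> {
    vec![vec![0, 1, 0], vec![0, 1, 1], vec![1, 0, 1]]
}
```

Calling `exact_covers(&example_matrix())` yields a single solution that selects the first and third rows (indices 0 and 2): together they cover every column exactly once, whereas any selection containing the second row covers some column twice or leaves one uncovered.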
Our Proposal in This Paper. We modify the algorithm of MemFix to deallocate the objects in C programs that satisfy ownership constraints. SafeMD consists of the same two steps as MemFix. Its analysis results are presented in Figure 3, where the results that differ from MemFix are highlighted in bold. The code in Listing 3 can be considered to satisfy ownership constraints, because p is no longer used after line 7; this follows the constraint that old owners cannot be accessed after an ownership transfer and thus satisfies exclusive ownership. Therefore, SafeMD does not need alias analyses before its analysis. In step 1, collecting patch candidates, the key idea is that, because the code satisfies ownership constraints, an object is deallocated only via its unique owner (and never via old owners). This differs from MemFix, which may deallocate an object via any must-alias pointer (possibly including old owners). Based on this idea, we use a state that maintains owner and patch information for each allocated object o: it records the pointer that is the unique owner of o and the set of pointers that are the old owners of o, together with the safe and unsafe patch sets.
SafeMD first updates the owner information and then the patch information. The allocation at line 1 creates a new state in which the ownership of the allocated object is bound to p, so the owner of this state is p and its set of old owners is empty. The safe patch (1, p) indicates that the object can be safely freed via its owner p after line 1. The pointer assignment at line 3 creates a new ownership binding for the object allocated there, but it does not change the owner of the first object. The assignment at line 7 transfers the ownership of the first object from p to q, making q the new owner and p an old owner. The only safe patch for this object is then (7, q); all other patches via the old owner p are considered unsafe. This patch update differs from MemFix, which tries to free the object via all must-alias pointers (p and q are must-aliases). At line 9, the assignment uses the object pointed to by q but does not transfer the ownership of any object, so the owner information of each state remains the same. For each state, a new safe patch at line 9 via the state's owner is added to its safe set. However, because q is used at line 9, the earlier patches via q are removed from the safe sets of the states whose objects q points to and are declared unsafe. More importantly, for the state whose set of old owners is not empty, we generate a new unsafe patch (9, p) for the old owner p, because an object cannot be deallocated by old owners.
In step 2, finding correct patches that do not cause DF, the set of candidate patches collected by SafeMD is smaller than that collected by MemFix. This is because SafeMD adds all patches related to old owners (aliases) to the unsafe sets in step 1, which excludes many unsafe patches that may cause DF due to pointer aliasing. For example, (9, p) is added to an unsafe set at line 9 and is thus excluded from the candidates. This reduces the search space of the exact cover problem.
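The owner-based state update described above can be sketched as follows. This is a simplified model under stated assumptions, not the SafeMD implementation: the type `OwnerState`, its methods, and the function `running_example` are hypothetical names, only the object allocated at line 1 is tracked, and branches are ignored.

```rust
use std::collections::BTreeSet;

/// A patch (n, e): insert `free(e)` right after line n.
type Patch = (u32, &'static str);

/// Hypothetical per-object state: unique owner, old owners, safe and unsafe patches.
struct OwnerState {
    owner: &'static str,
    old_owners: BTreeSet<&'static str>,
    safe: BTreeSet<Patch>,
    unsafe_patches: BTreeSet<Patch>,
}

impl OwnerState {
    /// An allocation at `line` binds ownership of the new object to `owner`.
    fn allocate(line: u32, owner: &'static str) -> Self {
        let mut safe = BTreeSet::new();
        safe.insert((line, owner));
        OwnerState { owner, old_owners: BTreeSet::new(), safe, unsafe_patches: BTreeSet::new() }
    }

    /// Moves every currently safe patch that frees through `ptr` to the unsafe set.
    fn demote(&mut self, ptr: &str) {
        let (moved, kept): (BTreeSet<Patch>, BTreeSet<Patch>) =
            self.safe.iter().copied().partition(|p| p.1 == ptr);
        self.safe = kept;
        self.unsafe_patches.extend(moved);
    }

    /// Ownership moves to `new_owner` at `line` (e.g. `q = p`).
    fn transfer(&mut self, line: u32, new_owner: &'static str) {
        let old = self.owner;
        self.demote(old); // patches via the old owner become unsafe
        self.old_owners.insert(old);
        self.owner = new_owner;
        self.safe.insert((line, new_owner));
    }

    /// A statement at `line` that transfers no ownership; `used` is the
    /// pointer it dereferences, if any.
    fn step(&mut self, line: u32, used: Option<&str>) {
        if used == Some(self.owner) {
            // Freeing before this use would cause a use-after-free.
            let owner = self.owner;
            self.demote(owner);
        }
        self.safe.insert((line, self.owner));
        for &old in &self.old_owners {
            // An object must never be freed through an old owner.
            self.unsafe_patches.insert((line, old));
        }
    }
}

/// Candidate patches: safe in some state and unsafe in none.
fn candidates(states: &[OwnerState]) -> BTreeSet<Patch> {
    let unsafe_all: BTreeSet<Patch> =
        states.iter().flat_map(|s| s.unsafe_patches.iter().copied()).collect();
    states
        .iter()
        .flat_map(|s| s.safe.iter().copied())
        .filter(|p| !unsafe_all.contains(p))
        .collect()
}

/// Replays the running example for the object allocated at line 1.
fn running_example() -> OwnerState {
    let mut s = OwnerState::allocate(1, "p");
    s.step(3, None); // line 3 does not touch this object
    s.transfer(7, "q"); // q = p
    s.step(9, Some("q")); // line 9 uses the object through q
    s
}
```

In this sketch, `candidates(&[running_example()])` contains only the patch (9, q): every patch through the old owner p, including (9, p), ends up in the unsafe set, which is what keeps the candidate set, and hence the exact cover search space, small.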
We implement SafeMD based on the ideas described above. Its distinct features are as follows:
SafeMD works only for C programs that satisfy ownership constraints.
For such programs, SafeMD obviates the alias and inter-procedural analysis that most existing memory deallocation techniques require.
For such programs, SafeMD can safely deallocate objects by inserting only free statements, without the conditional patches that Saver relies on and that MemFix cannot generate.
For such programs, SafeMD deallocates each object only via its unique owner and thus generates a smaller search space for correct patches than existing techniques (e.g., MemFix).
For such programs, the patches generated by SafeMD preserve the ownership constraints. A C program that satisfies the ownership constraints is safer than its normal version.
This paper makes the following contributions:
We present SafeMD, an ownership-based safe memory deallocation technique for C programs that satisfy ownership constraints.
We implement SafeMD and conduct experiments to demonstrate its effectiveness.
We explore the benefit of Rust’s novel ownership-based memory management in C.
The rest of this paper is organized as follows. Section 2 introduces the ownership system in Rust. Section 3 presents the algorithm of SafeMD. Section 4 reports experimental results. Section 5 discusses related work, and Section 6 concludes the paper.
2. Rust Ownership System
Compared with C, the distinctive advantage of Rust is its ownership system, which guarantees memory safety at compile time. Ownership in Rust denotes a set of rules that govern how the Rust compiler manages memory. The idea of OWS is exclusive ownership, which means that each resource has a unique variable as its owner at any time. Ownership can be transferred among owners. When the owner of a resource goes out of scope, the resource is dropped automatically without a garbage collector. Below, we introduce ownership transfer.
Ownership and Assignments. In Rust, ownership can be transferred in assignments. Consider the code in Listing 4. Line 2 creates a String object o on the heap and binds a variable to it as its owner. At line 4, the assignment transfers the ownership of o from this original owner to a second variable, so the original owner is no longer valid until it is assigned a value again. Therefore, the Rust compiler issues an error at line 5. This differs from pointer assignments in C, where both variables remain valid. This kind of transfer is called a move. Because the new owner is the unique owner of o, when it goes out of scope, the Rust compiler automatically inserts the drop destructor to free o at line 6. Therefore, the code at line 7 is rejected, as o has already been destroyed. Now, we take a closer look at line 8. When the original owner goes out of scope there, the Rust compiler does not insert drop, since it finds that the ownership has been moved. This ensures that memory deallocation does not introduce DF.
Listing 4: Transferring ownership in assignments
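The move behavior described above can be observed with a small self-contained sketch. It is not the paper's Listing 4: the helper type `Tracked`, the variable names `first` and `second`, and the function `move_in_assignment` are hypothetical, and a value that logs its own destruction stands in for the String object so that the single drop becomes visible.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<String>>>;

/// A resource that records when it is dropped (hypothetical helper).
struct Tracked {
    name: &'static str,
    log: Log,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

fn move_in_assignment() -> Vec<String> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    {
        let first = Tracked { name: "o", log: Rc::clone(&log) };
        {
            let second = first; // move: `second` is now the unique owner
            // let n = first.name; // rejected: error[E0382], `first` was moved
            log.borrow_mut().push(format!("use {} via new owner", second.name));
        } // `second` goes out of scope: the object is dropped here, exactly once
        log.borrow_mut().push("inner scope ended".to_string());
    } // `first` goes out of scope: no drop, because its ownership was moved
    log.borrow_mut().push("outer scope ended".to_string());
    // Bind the clone first so the borrow of `log` ends before `log` is dropped.
    let events = log.borrow().clone();
    events
}
```

The returned log contains a single "drop o" entry, recorded when the new owner leaves the inner scope and before "inner scope ended"; nothing is recorded when the original owner leaves the outer scope, so no double free can occur.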
Ownership and Functions. Ownership can also be transferred in function calls. When ownership of an object is moved to a callee via parameters, the object is no longer available in the caller. For example, in Listing 5, the function call at line 3 moves the ownership of the object o created at line 2 to takes_ownership, so the Rust compiler issues an error at line 4, where s has become an old owner and cannot be accessed in the main function. The Rust compiler compiles each function individually and relies on ownership to determine whether to insert drop destructors to free objects. For example, the Rust compiler first compiles the main function. It finds that s is moved to takes_ownership and thus does not insert drop to free o at line 5. Next, the Rust compiler compiles the takes_ownership function. Because its parameter has type String, which moves ownership, the Rust compiler automatically inserts drop once the parameter goes out of scope at line 9.
Listing 5: Transferring ownership via parameters
Listing 6: Transferring ownership via return values
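Ownership transfer via parameters can be sketched in the same way. The function name takes_ownership follows the text, but the body, the helper type `Tracked`, and the wrapper `move_into_callee` are assumptions made for illustration and are not the paper's Listing 5.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<String>>>;

/// A resource that records when it is dropped (hypothetical helper).
struct Tracked {
    name: &'static str,
    log: Log,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

/// The callee becomes the owner of `value` and is responsible for freeing it.
fn takes_ownership(value: Tracked) {
    value.log.borrow_mut().push(format!("callee uses {}", value.name));
} // `value` goes out of scope: the drop is inserted in the callee

fn move_into_callee() -> Vec<String> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    let s = Tracked { name: "o", log: Rc::clone(&log) };
    takes_ownership(s); // ownership of the object moves to the callee
    // let n = s.name; // rejected: error[E0382], `s` was moved
    log.borrow_mut().push("caller continues".to_string());
    // Bind the clone first so the borrow of `log` ends before `log` is dropped.
    let events = log.borrow().clone();
    events
} // no drop for `s` is inserted in the caller
```

The log shows "drop o" between "callee uses o" and "caller continues": the object is freed inside the callee, and the caller neither frees it again nor can touch it afterwards. This is the per-function responsibility that lets each function be analyzed on its own.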
Besides parameter passing, return values can also transfer ownership. For example, in Listing 6, gives_back moves ownership out via its return value to its caller main. In this case, exclusive ownership still ensures that automatic memory deallocation is safe. When the Rust compiler compiles the main function, the object o is dropped automatically once the variable that received the returned value goes out of scope at line 6, but nothing happens for s because s has been moved. This avoids the DF that would occur if s and the receiving variable both tried to free the object o when going out of scope (line 6). When compiling the gives_back function, the compiler does not insert drop to free the returned object, since its ownership is moved out of gives_back. This avoids the UAF that would occur if the object were freed in gives_back while still used in the main function.
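The return-value case can be sketched analogously. Again, only the function name gives_back comes from the text; its signature (taking the value and handing it back), the helper type `Tracked`, and the wrapper `move_out_of_callee` are assumptions and not the paper's Listing 6.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<String>>>;

/// A resource that records when it is dropped (hypothetical helper).
struct Tracked {
    name: &'static str,
    log: Log,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

/// Takes ownership of `value` and moves it back to the caller.
fn gives_back(value: Tracked) -> Tracked {
    value.log.borrow_mut().push(format!("callee returns {}", value.name));
    value // moved out: no drop is inserted in the callee
}

fn move_out_of_callee() -> Vec<String> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    {
        let s = Tracked { name: "o", log: Rc::clone(&log) };
        let t = gives_back(s); // `s` is moved in, `t` receives ownership back
        log.borrow_mut().push(format!("caller uses {}", t.name));
    } // `t` is dropped once here; nothing happens for the moved `s`
    log.borrow_mut().push("scope ended".to_string());
    // Bind the clone first so the borrow of `log` ends before `log` is dropped.
    let events = log.borrow().clone();
    events
}
```

Here "drop o" appears exactly once and only after "caller uses o", so the object is neither freed in the callee while the caller still uses it (no UAF) nor freed twice when both variables leave scope (no DF).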
5. Related Work
Memory-leak fixing techniques. We focus on static analysis techniques for C programs. Because these works target general C programs, they often rely on alias and inter-procedural analysis to aid in fixing memory leaks. LeakFix [2] can identify and safely fix memory leaks. It uses an inter-procedural, flow-insensitive, context-sensitive pointer analysis to obtain an SSA-based points-to graph for each procedure. It also performs an inter-procedural analysis to identify which procedure should fix the leaked objects. AutoFix [3] combines static analysis with runtime checking to prevent memory leaks. In its static analysis, Andersen’s pointer analysis is used to build the value-flow graph (VFG) of the program. FootPatch [4] can fix memory leaks in large code bases but may introduce new errors such as DF as a side effect. MemFix [5] can safely repair ML, DF, and UAF in a unified fashion. It removes memory errors by collecting patch candidates and solving an exact cover problem. Before the analysis, MemFix performs standard pointer and alias analyses. Saver [6] proposes object flow graphs (OFG), which capture the program’s heap-related behavior, to safely fix memory errors such as ML, UAF, and DF. Pointer analysis is used to construct the OFG. The framework proposed in [7] presents practical solutions for developers to automate C program repair. Before generating patches, it uses program analysis, including static analysis and code instrumentation techniques, to extract patch conditions that guide the generation of a correct patch.
For C programs that satisfy exclusive ownership, the works mentioned above still perform alias and inter-procedural analysis to fix memory leaks, which makes the repair complex and inefficient. However, the ownership constraints satisfied by the input programs can ease memory-leak fixing. Compared with these works, SafeMD obviates alias and inter-procedural analysis for safe memory deallocation. In addition, the patches generated by SafeMD preserve the ownership constraints. A C program that satisfies the ownership constraints is safer than its normal version.
Ownership Discussion. Ownership has been used in OO programming to enable controlled aliasing [8, 9] and prevent data races [10, 11]. Similarly, the concept of ownership has been applied to analyze C/C++ programs. Heine et al. [12] present an ownership type system to detect ML and DF. Their ownerships range over the integer values 0 and 1. However, their model adds optional ownership transfer in assignments and allows arbitrary aliases, and thus cannot detect UAF errors. Kohei et al. [13] propose a fractional-ownership type system to detect ML, DF, and UAF in C. Their model augments a pointer type with a fractional ownership, which is a rational number in the range [0, 1]. Tatsuya et al. [14] extend the fractional-ownership type system in [13] to fix memory leaks. Their technique conducts type inference for the extended type system to determine where to insert deallocation statements. Compared with these works, the goal of SafeMD is to fix memory leaks rather than to detect them. Moreover, SafeMD takes advantage of Rust’s ownership, called borrow-based ownership, rather than fractional ownership. The borrow-based ownership of Rust has been shown to be effective in preventing memory errors [15, 16]. In contrast, fractional ownership is not implemented as a language feature like Rust’s ownership; it is more of a design concept. It also allows multiple owners by assigning fractions of ownership and thus requires more complex reasoning about ownership.
Rust. The majority of existing work on Rust focuses on formal verification of Rust programs [17, 18, 19], empirical research on the effectiveness of Rust in fighting memory bugs [16, 20], and bug detection in Rust programs [21, 22]. Recently, some work on (semi-)automatically translating C code to Rust has been proposed [23, 24, 25]. The idea is to first translate C programs to Rust programs using c2rust (footnote 4), which performs only a syntactic transformation, and then remove unsafe features from the translated Rust programs, e.g., by translating unsafe raw pointers to safe Rust references. Although our work, i.e., SafeOSL and SafeMD, does not translate C to Rust directly, both our work and theirs leverage the ownership mechanism of Rust to obtain safer programs. We believe our work can be used in conjunction with their approach to ensure memory safety in C programs.