Expandable
Expandable.h may be found in the Library. It provides a feature that I've implemented several times although this one is better that the past versions.
Templates are a fairly recent C++ invention (considering that programming languages started around the late 50s). The idea is simple: Write a description of an algorithm or group of algorithms that may be applied to any class of objects. An example might be a sorting algorithm. This is a single algorithm but it is complicated to get right. It would be nice to be able to do a qsort on an array of say, integers and strings and class objects. Sorting requires some sort of selection mechanism like "greater than" and an assignment operator like "=". But each of the objects suggested above have really different selection and assignment operators. So, the template allows the objects do define their own operations while the basic sorting algorithm does its work deciding how the items should be sorted.
In the C language arrays are statically defined. They are fixed in length. So the programmer is left with very little option but to make really large arrays to avoid overflow. Not a good choice. Expandable at a run time cost avoids that dilemma. The declaration for an expandable array is:
Expandable<String, 2> ex;
The idea of expandable is that the square bracket operator always finds a way to store or retrieve the content of the array. In the example above, the content is a simple string. The mechanisms are hidden from the user but there is a gotcha. The content of the array must be a class (or struct). The class must have a copy constructor and an assignment operator. In the above example, String is the class, so the following functions must appear in the String class:
class String {
Mumble mumble;
public:
String() : mumble(...) {...}
String(String& s) {copy(s);}
String& operator= (String& s) {copy(s); return *this;}
private:
void copy(String& s) {mumble = s.mumble;}
}
The expandable template takes care of all the details of managing the array. So when ex[0] is assigned string, it is stored. When ex[1] is assigned a string it is stored. However, when ex[2] is assigned a string there is no place to store it as only two elements were created with the declaration. So Expandable increases the size of the array. Of course it must copy all the existing elements to the new version of the array. That is one of the hidden costs mentioned earlier. After the copy of all the strings in the array the old array must be deleted (or a memory leak will occur), another cost.
There is one other problem with Expandable. In this example defines an array, "ex", which is composed of String objectsif is possible to reference an array element as follows:
String& s = ex[0];
ex[3] = _T("Nonsense");
s = _T("This assignment will likely cause an exception");
The reference s is to the first element of the array ex. So far the array ex doesn't look too interesting. The fourth element of the array (i.e. ex[3]) is set to "Nonsense" (the _T("") creates an Unicode string). Unfortunately it also requires an expansion of the array and a copy of the string in the first element of the array. So the reference in s is no longer valid and will likely cause an exception. That is because between the time the reference was created and the time of the assignment the vector was copied to a new location in the heap. This issue is dealt with better by the ExpandableP template.
Where it becomes interesting is when I try to put 100 strings into it. Behind the scene ex expands to accomodate the 100 strings. It does this by quietly increasing the size of the array as new elements are added to the array that exceed the current size of the array. It is still an array, one can qsort it, one can index into it, one can loop over all the elements forward, backward and randomly.
Well that solves the overflow problem of C style arrays. What is the cost of this? The cost includes the requirement that any entity (class or typedef) must have four specific operations. A class Data, e.g. class Data {...}, must have the following "methods" or functions:
- Constructor with no argument, e.g. Data(), which initializes all components of the class
- Copy Constructor, e.g. Data(const Data& d), which copies all components of object d to object *this
- Assignment operator, e.g. Data& operator= (Data& d), which copies all components of object d to object *this and returns *this
- Destructor, e.g. ~Data() that releases objects obtained from the heap and zeros all data components
Furthermore, the Expandable Template requires a "method" to compute the index each and every time so that overflow is detected and avoided. It avoids overflow by creating a new array block double in size of the current array block and copying all the data from the current block to the new block. Expensive, you say, yep. But bloody convenient to know that there can never be overflow! And it can be sorted too. Loops are easy, it is just an indexed array.
Is there any way to cut the overhead down? Yes, just make the expandable array contain pointers to the data. Then the only copying required is a pointer. For big complicated data structures that makes the cost quite reasonable. However, for simpler structures with better guesses about size, one can just put the structure in the array and be done with it.
ExpandableP
The idea of an array of pointers to cut the overhead of copying the data when expanding the array was enough incentive to create a template to do just that. But it turns out that having an array of pointers made the destructor a bit clumsy. So another template is required to manage the pointers:
class Datum {
private:
Key value;
String name;
public:
Datum() : value() { }
~Datum() { }
o o o
};
typedef DatumPtrT<Datum, Key> DatumP;
class MyClass {
ExpandableP<Datum, Key, DatumP, 2> data;
public:
MyClass() { }
~MyClass() { }
o o o
}
The Record Pointer (i.e. DatumP) defined by the DatumPtrT Template (i.e. DatumPtrT) does all the right things when the expandable array needs to be expanded. The actual data node (i.e. Datum) is just moved from one array to another during the expansion. Furthermore, creating a reference to an element in the array just creates a reference to the node in the heap which is never moved or changed.
The Datum pointer class created by DatumPtrT needs the type or class of the Key used in the find and binary search methods. The Key class may be a simple object like String or int or it may be a user define class with the appropriate boolean operations defined in any way that the user desires. If the find or binary search are not required then a sinple int as template argument is sufficient.
ExpandableP
ExpandableP includes a class Key used as a key object in the Datum object. In Expandable and ExpandableP Templates the key used for insertion sort is any single object in the Datum object. Usually it is a String which holds something like the path of a file or name of the object. ExpandableP allows one to have a more complex object as the key, for example a boolean and a string. Here is an example declaration:
class SiteFileDsc {
SiteFileKey key;
o o o
};
class SiteFileDscs;
typedef DatumPtrX<SiteFileDsc, SiteFileKey> SiteFileP;
typedef IterT<SiteFileDscs, SiteFileDsc> FileDscsIter; // Iterator Declaration
class SiteFileDscs {
o o o
ExpandablePX<SiteFileDsc, SiteFileKey, SiteFileP, 2> data; // List of all files
o o o
}
Available Operations on an Expandable, ExpandableP or ExpandableP Vector
When there is an Expandable vector, data, containing elements of class Datum
Expandable<Datum, 2> data;
Datum datum;
The following operations are available:
- datum = data[i]; -- where 0 <= i < endN
- data[i] = datum; -- array expands to encompass i
- data.clear(); -- content is ignored but number of elements is set to zero
- data = datum; -- datum is inserted (copied) into the sorted array at the correct position
- data = &datum; ">=" and "==" operators in datum must be defined
- data += datum; -- datum is appended (copied) to array (at a new last element)
-
Datum& d = data.nextData(); -- A reference to new last element of array is returned.
It may used as shown or immedialy with a dot operator or even as a target of - data.nextData() = _T("xxx"); -- an assignment (where a Datum operator= is defined)
- data(i, datum); -- datum is inserted at index i, the contents at i and above are moved up one element
-
data.del(i); -- The datum at index i is deleted and the elements above are moved
down to fill in
the hole. The number of elements in the array is reduced by one
Generally, the ExpandableP and ExpandableP templates have the same operations, except for some suble differences. The assignement operator allows the datum to decide where the datum is stored by the definition of ">=" and "==" operators.
- data = &datum; -- In ExpandableP it is assumed that datum has been allocated from the heap and a
pointer to it is placed in the array. In Expandable it is copied into the array. - data = datum; -- The datum is assumed to be local variable of some kind and so
it is copied into the
array. In ExpandableP a node is allocated prior to the copy operation.
The append operator "+=" places the node at the entry one element beyond the last entry in the array, i.e. appends.
- data += &datum; -- In ExpandableP the node is already allocated
- data += datum; -- In ExpandableP the node is allocated before the datum is copied into the array.
- Datum& d = data.getData(i); -- return a reference to a record at index i allocating a record if necessary
The function operator is used to "insert" a datum into an array at index i after moving other entries out of the way. No entries in the array before the insertion are lost.
- data(i, &datum); -- In ExpandableP the node is already allocated
- data(i, datum); -- The datum is copied into the array at index i
In ExpandableP nodes in the array may be allocated and deallocated explicitly with the following functions. They can be populated outside of the array and either added to the array later or sent back to the heap.
- Datum* d = data.allocate(); -- Allocate one record and return a pointer to it
- data.deallocate(&datum); -- Deallocate one record
When using ExpandableP, infrequently one must manipulate the actual pointer in the array that points to the datum.
- RcdPtr* rcdP = data.getRcdPtr(i); -- Returns a pointer to a RcdPtr class -- use in moderation
Insertion Sort
Suppose one needs the data to be sorted on the value of one of the components of the Datum class. One method of achieving that goal is to insert the datum using an insertion sort. Yep it is expensive because the elements need to be moved during the insertion. However, in small vectors this is not really noticeable and it is very convenient.
The Datum class (or struct) must have the following method:
- datum1 >= datum2
Here is an example:
struct Datum {
String name;
uint attr;
Datum() : attr(0) { }
Datum(Datum& d) {copy(d);}
~Datum() { }
Datum& operator= (Datum& d) {copy(d); return *this;}
bool operator>= (Datum& d) {return name >= d.name;}
private:
void copy(Datum& d) {name = d.name; attr = d.attr;}
};
Above when describing the assignment operator "data = datum" I was vague about where the insertion of the data would take place. When the operator ">=" is defined in the datum then the assignment operator on the data will result in an insertion sort.
Destructor
The Expandable and ExpandableP destructors free the allocated datafor which they are responisble for allocating. In addition they will call the destructor for each node in the active part of the array. So, ... if the nodes take care of freeing any data allocated during the execution of the application then there should be no memory lost. Big if, take care.
Iterators
One day I got tired of creating for loops for an expandable array. First one needs to find the current number of elements in the array. Then a loop is created with an index that ranges from zero to just less than the the number of elements. This two step process still left one with the job of indexing into the array to accomplish any task. By the way, I came on this idea by using one of the MFC facilities for accessing the Microsoft ACCESS database.
Anyway, this is certainly a simpler version of an iterator over a vector. I created a template to iterate over an expandable array provided there was one such array in the class. Let me digress for a moment to comment on the last sentence.
C++ classes are created for a variety of reasons. One reason may be to process and/or store some data. Rather than sticking a data structure in a global location it might be better to shield the data from the rest of the program (i.e. hide it). So stick the data in a class. Suppose one has two kinds of data that needs storing. Why not share the class? Well that violates the "object" prinicple. One data thingy plus all the operations one can think of on the data is all that should be in a single class, period. Doesn't mean only one variable, just one data thingy.
So for the most purposes when using an Expandable data structure there should only be one in the class. Then an iterator on that expandable data structure is fairly easy to implement with a friend class and some little private helper functions in the class with the expandable data structure. The implementation uses a template for creating the friendly iterator class and a cut and paste will add the private helper functions to the class with the expandable data structure.
The implementation details can be found in the Library, but for now I'll just show how to use it:
Datum* find(String name) {
MyDataIter iter(myObject); // where myObject is an object of MyClass
Datum* datum;
for (datum = iter(); datum; datum = iter++) {
if (datum->name == name) return datum;
}
return 0;
}
MyDataIter is either a class or a typedef and requires the name of the object to iterate over. Both iter() and iter++ return a pointer to a datum or return zero (i.e. null, NULL, 0). Zero is interpreted as the end of the array in the loop, i.e. false in the for conditional segment. So the body of the loop is assured of having a real "datum" pointed to by the datum variable.
The iterator is useful because the mechanism for controlling the point in the array to reference is contained in the iterator not in the object holding the data. So having two iterations over the same data are possible and they do not interfere with each other (unless there is a lot of adding and deleting happening).
The idea worked so well that for an array I added a reverse iteration (i.e. iter--)and deleting an object in the array and collapsing the array to fill in the hole.
Then I noticed that other data structures could be managed this same way. So I added an iterator with identical use (though not identical implementation) for the linked list in the library. When I come across other data structures that lend themselves to this approach I generally create a class that will contain the iteration mechanisms. Be aware though that adding and deleting may cause problems with iteration (even with just one iterator, think hard about those operations).
Expandable and ExpandableP have nearly identical data structures so an iterator template is possible with one hidden change in the class containing the expandable array. Here is how to define the iterator:
class Datum {
o o o
}
class MyObject;
typedef IterT<MyObject, Datum> MyDataIter;
class MyObject {
Expandable<Datum, 2> data;
o o o
public:
o o o
private:
// returns either a pointer to data (or datum) at index i in array or zero
// return &data[i] for Expandable or data[i].p for ExpandableP
Datum* datum(int i) {return 0 <= i && i < nData() ? &data[i] : 0;}
int nData() {return data.end();} // returns number of data items in array
friend typename MyDataIter;
}
Some Details Required to Use Expandable
This next section need not be read until one would like to use the implementation of one of the Expandable Templates.
Setting up an Expandable Array (Vector)
Expandable<Datum, 2> data;
or
typedef Expandable<Datum,2> Data;
where Datum is a class, e.g. class Datum {...}, and requires the following methods
- Constructor with no argument, e.g. Datum(), which initializes all components of the class
- Copy Constructor, e.g. Datum(const Datum& d), which copies all components of object d to object *this
- Assignment operator, e.g. Datum& operator= (Datum& d), which copies all components of object d to object *this and returns *this
- Destructor, e.g. ~Datum() that releases objects obtained from the heap and zeros (optionally) all data components
Here is an example of the minimum Datum class (or struct):
struct Datum {
String name;
uint attr;
Datum() : attr(0) { }
Datum(Datum& d) {copy(d);}
~Datum() { }
Datum& operator= (Datum& d) {copy(d); return *this;}
private:
void copy(Datum& d) {name = d.name; attr = d.attr;}
};
A Full Example of Datum consistent with Expandable
class Datum {
String name;
int attr;
public:
Datum() { } // Copy constructor: Datum a = b;
Datum(Datum& d) {copy(d);}
~Datum() { }
void add(String& stg, int a); // Parse the data into the record}
int display();
Datum& operator= (Datum& d) {copy(d); return *this;} // Copy operator: a = b;
bool operator>= (Datum& d) {return s >= d.s;} // For data = datum (Insertion Sort)
bool operator<= (Datum& d) {return s <= d.s;} // For qsort
bool operator> (Datum& d) {return s > d.s;} // For qsort
// The rest of the conditionals
bool operator== (Datum& d) {return s == d.s;}
bool operator!= (Datum& d) {return s != d.s;}
bool operator< (Datum& d) {return s < d.s;}
bool operator== (Key key) {return s == key;} // For Binary and Linear Search
bool operator< (Key key) {return s < key;} // For Binary Search
bool operator> (Key key) {return s > key;} // For Binary Search
private:
void copy(Datum& d) {name = d.name; attr = d.attr;}
};
The binary and linear search operators have an interesting property. The Key type may be reference to the actual key type. The Key Type may also be any type for which boolean operators are available for the Datum class. In this case, String has booleans where the second operator is an old fashion c string or in the case of UniCode a TCchar* type.