For a few years I thought I understood what a character was in the bit world. My first brush with characters was a 6 bit character. Then I was asked to program on an SDS 940, a time sharing machine with a 24 bit word size and a 7 bit character size. Then someone figured out that powers of 2 made better word sizes, and the character became 8 bits. Later, other spoken languages needed more bits to represent the characters in their alphabets, and multibyte characters were invented. One of those representations is Unicode, which on Windows uses 16 bits per character (strictly, per UTF-16 code unit).
Character size is of little importance except in files, scratchpads and other miscellaneous places where an eight bit character is expected. Fortunately it is fairly easy to translate from one form to another. One just has to discover when the translation is required.
SNOBOL ("StriNg Oriented and symBOlic Language") is a series of computer programming languages developed between 1962 and 1967 at AT&T Bell Laboratories by David J. Farber, Ralph E. Griswold and Ivan P. Polonsky, culminating in SNOBOL4. That's where I first encountered strings. The language taught one that a string is an entity that can be manipulated in a variety of wonderful ways.
So, since one could not write in SNOBOL for a variety of reasons (lacking a view of the outside world being one), one needed to add string features to each programming language. My first real effort was in QSPL, a language on the SDS 940. A bunch of functions operated on a four pointer data structure which pointed to the beginning and end of the block of storage used to store the characters (both dynamic and static beginning and end).
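To make the four pointer idea concrete, here is a minimal sketch in C++ rather than QSPL. It assumes that the "static" pair marks the fixed storage block and the "dynamic" pair marks the characters currently in use. The names StringDesc, make_desc, desc_length and desc_append are hypothetical and are not taken from the original code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical reconstruction of a four-pointer string descriptor.
// Field and function names are illustrative only, not taken from QSPL.
struct StringDesc {
    char* block_begin;  // static: first byte of the storage block
    char* block_end;    // static: one past the last byte of the block
    char* text_begin;   // dynamic: first character currently in use
    char* text_end;     // dynamic: one past the last character in use
};

// Point a descriptor at an empty string inside the given block.
inline StringDesc make_desc(char* block, std::size_t size) {
    assert(block != nullptr);
    StringDesc d = { block, block + size, block, block };
    return d;
}

// Number of characters currently stored.
inline std::size_t desc_length(const StringDesc& d) {
    return static_cast<std::size_t>(d.text_end - d.text_begin);
}

// Append one character; returns false instead of overflowing a full block.
inline bool desc_append(StringDesc& d, char c) {
    if (d.text_end == d.block_end) return false;
    *d.text_end++ = c;
    return true;
}
```

Every operation can check the dynamic end against the static end before it writes, which is the point of carrying both pairs around.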
Then there were a few years programming in Assembly Language, which with macros I transformed into A Macro Language (AML). I am happy to report that I've forgotten all the details of that language.
Next came C and C++, MFC and the myriad string modules in Visual Studio C++/MFC/OLE/.NET, etc. I actually don't know how many versions of string classes exist in the Visual Studio C++ libraries.
Strings are important in man/machine interfaces. The traditional C style string was a pointer to a block of characters terminated with a zero character. The difficulty with this data structure is that the block was of fixed size, whether on the stack, in global memory or in the heap. Overflow of the block is a persistent security problem.
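The overflow risk is easiest to see next to the check that avoids it. The following is a minimal sketch, not code from the library: copy_bounded is a hypothetical helper that never writes past the end of a fixed block and reports when the input had to be cut short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copy src into a fixed-size block, always leaving room
// for the zero terminator. Returns true if the whole string fit, false if it
// had to be truncated.
inline bool copy_bounded(char* dst, std::size_t dst_size, const char* src) {
    assert(dst != nullptr && src != nullptr && dst_size > 0);
    std::size_t n = std::strlen(src);
    bool fits = n < dst_size;       // need one extra byte for the terminator
    if (!fits) n = dst_size - 1;    // truncate rather than overflow
    std::memcpy(dst, src, n);
    dst[n] = '\0';
    return fits;
}
```

An unchecked strcpy into the same block would keep writing past its end for any input longer than the block, which is the security problem described above.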
The solution is a string that manages its storage invisibly. The "string" class, built on the standard library "basic_string" template, does exactly that. Furthermore, when compiling with ANSI characters the string class manages 8 bit characters, i.e. char. When compiling with Unicode characters the string class manages 16 bit characters, i.e. wchar_t. It seems like a natural fit.
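As a rough illustration of both points, the sketch below builds a string type on std::basic_string whose character type is chosen at compile time. It is an assumption about how such a switch could look, not the library's actual definition: SKETCH_UNICODE, SKETCH_T, tchar_t, tstring and repeat are hypothetical names that only imitate the spirit of tchar.h. Note that wchar_t is 16 bits on Windows but wider on many other platforms.

```cpp
#include <cassert>
#include <string>

// Assumption: a build-time switch selects the character type, in the spirit
// of tchar.h. SKETCH_UNICODE, SKETCH_T, tchar_t and tstring are hypothetical.
#ifdef SKETCH_UNICODE
typedef wchar_t tchar_t;
#define SKETCH_T(x) L##x
#else
typedef char tchar_t;
#define SKETCH_T(x) x
#endif

typedef std::basic_string<tchar_t> tstring;

// Build a string much longer than any small fixed buffer. The caller never
// allocates or sizes anything; basic_string grows its own storage.
inline tstring repeat(const tstring& piece, int count) {
    assert(count >= 0);
    tstring result;
    for (int i = 0; i < count; ++i) result += piece;
    return result;
}
```

A call such as repeat(SKETCH_T("abc"), 1000) yields a 3000 character string in either build, with no fixed block to overflow.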
But wait, CStrings are used in dialog boxes and in other places. Investigating CStrings yields the conclusion that they do not make a very good general purpose string, as there are not a lot of useful operations defined for them when compared to the string class. Other useful string classes include the _bstr_t class and the _variant_t class, which can contain a _bstr_t. The set of operations on a string needed enhancement, so the String class was constructed. It is a subclass of the string class. Many operations are defined on a String, including but not limited to the operations on the string class. Furthermore, the CString class needed some additional operations, so the Cstring class was created too.
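The subclassing approach can be sketched as follows. This is only an illustration of the technique: StringSketch and its trim operation are hypothetical and are not the String class from Strings.h, whose actual operations are not listed here.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of extending the standard string by subclassing.
// StringSketch and trim() are illustrative names only.
class StringSketch : public std::string {
public:
    StringSketch() {}
    StringSketch(const char* s) : std::string(s) {}
    StringSketch(const std::string& s) : std::string(s) {}

    // Added operation: remove leading and trailing blanks and tabs, in place.
    StringSketch& trim() {
        const char* blanks = " \t";
        size_type first = find_first_not_of(blanks);
        if (first == npos) { clear(); return *this; }  // nothing but blanks
        size_type last = find_last_not_of(blanks);
        assert(last != npos && last >= first);
        assign(substr(first, last - first + 1));
        return *this;
    }
};
```

Everything std::string already offers (size, find, substr, comparison and so on) remains available on the subclass. One caveat of this design is that std::string has no virtual destructor, so such objects should not be deleted through a pointer to the base class.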
When compiling with the Unicode option, strings use Unicode characters. A magic header file, tchar.h, allows both ANSI and Unicode compilation of the same file, with the appropriate character type being used. But there are times when compiling with Unicode that an ANSI character string is needed (and vice versa). Two little classes provide for this translation: ToAnsi and ToUnicode. All they do is convert into a private buffer during initialization of an object of the class; a function call on the object then returns a pointer to that buffer (ToAnsi toAnsi(_T("abc")); char* p = toAnsi();). In the example, p is a pointer to an ANSI string that remains valid until the function (or block) is exited.
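The shape of such a conversion class might look like the sketch below. It is not the ToAnsi from Strings.h: ToAnsiSketch is a hypothetical name, and the sketch uses the portable std::wcsrtombs, which converts according to the current C locale, where a Windows implementation would more likely call WideCharToMultiByte.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical sketch of a wide-to-narrow conversion object.
// The real ToAnsi may differ in both interface and conversion routine.
class ToAnsiSketch {
public:
    // Convert once, during initialization, into the private buffer.
    explicit ToAnsiSketch(const wchar_t* wide) {
        assert(wide != nullptr);
        std::mbstate_t state = std::mbstate_t();
        const wchar_t* src = wide;
        // With a null destination the call only measures the output length.
        std::size_t n = std::wcsrtombs(nullptr, &src, 0, &state);
        if (n == static_cast<std::size_t>(-1) || n == 0) return;  // unconvertible or empty
        buffer_.resize(n);
        state = std::mbstate_t();
        src = wide;
        std::wcsrtombs(&buffer_[0], &src, n, &state);
    }

    // Function-call operator: hand out a pointer into the private buffer.
    const char* operator()() const { return buffer_.c_str(); }

private:
    std::string buffer_;  // owns the converted characters
};
```

Because the buffer is a member of the object, the returned pointer stays valid exactly as long as the object does, which is why the pointer in the text above is good only until the enclosing function or block is exited.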
The String Package including String, Cstring, ToAnsi and ToUnicode may be found in the library in the Strings.h and Strings.cpp files.