BZWParser: the BZW text-parsing API
August 9, 2007 at 7:36 am
The power behind the DataEntry class and all derived classes is the ability to parse BZW-formatted text. Fortunately, BZWB employs an API for doing just that. Eventually, this class will become a separate library for any app to use.
////////////////////////////////////////////////////////////////////////////////////
class BZWParser {
public:
    // unused
    BZWParser() { }
    virtual ~BZWParser() { }
    // init method to get Model reference
    // this MUST be called BEFORE any other method
    static void init( Model* model ) { _modelRef = model; }
    // simplest method: get a value from a non-repeatable key in a line
    static string value(const char* key, const char* text);
    // simpler method: get the key from a line
    static string key(const char* text);
    // get the terminator token of a key
    static string terminatorOf(const char* key);
    // get the header of a chunk of BZW text
    static string headerOf(const char* text);
    // get the hierarchy of an object
    static string hierarchyOf(const char* text);
    // get the individual lines out of a section
    static vector<string> getLines(const char* header, const char* section);
    // get all lines in a section that begin with the given key
    static vector<string> getLinesByKey(const char* key, const char* header, const char* section);
    // chunk a long string of text into a vector of sections
    static const vector<string> getSections(const char* text);
    // find sections starting with a particular header from a vector of sections
    // (meant to be used in conjunction with getSections())
    static const vector<string> findSections(const char* header, vector<string>& sections);
    // get all sections from a chunk of text that start with header
    static const vector<string> getSectionsByHeader(const char* header, const char* text);
    static const vector<string> getSectionsByHeader(const char* header, const char* text, const char* footer);
    // use this for sections with subobjects
    static const vector<string> getSectionsByHeader(const char* header, const char* text, const char* footer, const char* internalSectionKeys, const char* sectionsToIgnore);
    // get all values from a key in a section
    static vector<string> getValuesByKey(const char* key, const char* header, const char* section);
    // get lines starting with a key from a set of keys in a section, preserving the order
    static vector<string> getLinesByKeys(vector<string> keys, const char* header, const char* section);
    // get all values from a line
    static vector<string> getLineElements(const char* line);
    static vector<string> getLineElements(const char* line, int count);
    // the big tamale: the top-level file loader
    static vector<string> loadFile(const char* filename);
private:
    static Model* _modelRef;
};
///////////////////////////////////////////////////////////////////////
As you can see, all methods in this class are static. DataEntry includes the header file containing this class, so any derived classes can call these methods statically at any time. Let’s have a looksie:
The constructor of this class is unused; its methods are designed to be accessed statically. However, the init() method must be called before any other method. The parameter of init() is a pointer to a Model, which among other things keeps a pseudo-registry of objects and how they can be identified. Model also provides the methods BZWParser uses to determine the footer line of an object from its header, which sub-objects (if any) might appear, the hierarchies of possible collections of objects, and so on. Basically, BZWParser is completely ignorant of the types of objects BZWB supports and won’t work without access to a Model. This is intentional: it allows BZWParser to work with ANY type of object, even ones added in future releases.
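To make the init() requirement concrete, here is a minimal sketch of the same pattern: a parser that holds a static pointer to a model and asks it for terminators. SketchModel and SketchParser are stand-ins I made up for illustration; the real Model does a lot more, and the fallback to “end” for unregistered headers is my assumption.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for BZWB's Model: it only knows object terminators.
class SketchModel {
public:
    SketchModel() {
        _terminators["face"] = "endface";
        _terminators["define"] = "enddef";
    }
    // Assumption of this sketch: anything not registered terminates with "end".
    std::string terminatorOf(const std::string& header) const {
        auto it = _terminators.find(header);
        return it == _terminators.end() ? std::string("end") : it->second;
    }
private:
    std::map<std::string, std::string> _terminators;
};

// Hypothetical parser that, like BZWParser, keeps a static model pointer.
class SketchParser {
public:
    static void init(SketchModel* model) { _modelRef = model; }
    static std::string terminatorOf(const char* header) {
        // Without a model the parser knows nothing about object types.
        assert(_modelRef != nullptr && "init() must be called first");
        return _modelRef->terminatorOf(header);
    }
private:
    static SketchModel* _modelRef;
};

SketchModel* SketchParser::_modelRef = nullptr;
```

The point of the design is that the parser itself contains no object names at all; swap in a model that knows about new objects and the parser keeps working.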
key() and value() are the bread and butter of BZW parsing. key(), as the comment suggests, takes a BZW key/value pair (i.e. one line of BZW text) and returns the key. value() does the opposite: given a key (e.g. “position”, “rotation”, “size”) and a line of BZW text, it returns the value paired with that key.
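Here’s a rough sketch of what key() and value() boil down to. This is not the BZWB source: sketchKey() and sketchValue() are hypothetical names, and returning an empty string when the line doesn’t start with the key is my assumption.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Returns the first whitespace-delimited token of a line (the key).
std::string sketchKey(const char* text) {
    assert(text != nullptr);
    std::istringstream in(text);
    std::string key;
    in >> key;  // stays empty if the line is blank
    return key;
}

// Returns everything after `key` on the line, trimmed.
// Assumption: a line that does not start with `key` yields "".
std::string sketchValue(const char* key, const char* text) {
    assert(key != nullptr && text != nullptr);
    std::istringstream in(text);
    std::string first;
    in >> first;
    if (first != key) return "";
    std::string rest;
    std::getline(in, rest);
    const char* ws = " \t\r\n";
    const std::string::size_type begin = rest.find_first_not_of(ws);
    if (begin == std::string::npos) return "";
    const std::string::size_type end = rest.find_last_not_of(ws);
    return rest.substr(begin, end - begin + 1);
}
```

So for the line “position 0 0 10”, the key is “position” and the value for that key is “0 0 10”.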
The methods terminatorOf(), headerOf(), and hierarchyOf() depend on the existence of a Model, because they determine the basic structure of a BZW object. terminatorOf() takes an object header such as “box”, “pyramid”, or “teleporter” and returns the token that terminates the object. This is usually “end”, but for “face” it is “endface” and for “define” it is “enddef”. headerOf() takes a header line and returns the name of the object. This is useful for objects like groups and teleporters, where additional data is stored after the object name on the same line. hierarchyOf() is a more complex beast: it takes an object header and returns a BZWB-formatted string that describes the hierarchy of that object. Normally this is just a zero-length string, because most objects have no sub-objects and thus no hierarchy. The two exceptions are “mesh” and “define”. The hierarchy string format is as follows:
<object_name:<subobject_1><subobject_2>…<subobject_n>><subobject_1:<subsubobject_1>…>
For example, the hierarchy of “mesh” is
<mesh:<face><drawinfo>><drawinfo:<lod>><lod:<matref>>
because the “mesh” object contains both “face” and “drawinfo” sub-objects, “drawinfo” contains the “lod” sub-object, and “lod” contains the “matref” sub-object.
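If you want to see how a hierarchy string like that can be picked apart, here is a small sketch that turns it into a parent-to-children map. parseHierarchy() and the Hierarchy typedef are hypothetical names of mine, not part of BZWParser, and the sketch simply stops at the first malformed entry.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

typedef std::map<std::string, std::vector<std::string> > Hierarchy;

// Parses "<parent:<child1><child2>><parent2:<child>>" into a map.
Hierarchy parseHierarchy(const char* raw) {
    assert(raw != nullptr);
    const std::string text(raw);
    Hierarchy result;
    std::string::size_type i = 0;
    while (i < text.size() && text[i] == '<') {
        const std::string::size_type colon = text.find(':', i);
        if (colon == std::string::npos) break;  // malformed entry: stop
        std::vector<std::string>& children = result[text.substr(i + 1, colon - i - 1)];
        i = colon + 1;
        // Each direct sub-object is wrapped in its own <...> pair.
        while (i < text.size() && text[i] == '<') {
            const std::string::size_type close = text.find('>', i);
            if (close == std::string::npos) return result;
            children.push_back(text.substr(i + 1, close - i - 1));
            i = close + 1;
        }
        if (i < text.size() && text[i] == '>') ++i;  // the entry's own closing bracket
        else break;
    }
    return result;
}
```

Feeding it the “mesh” string above gives three entries: “mesh” with children “face” and “drawinfo”, “drawinfo” with “lod”, and “lod” with “matref”. A zero-length string gives an empty map.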
getLines(), getLinesByKey(), getValuesByKey(), and getLinesByKeys() are probably the most widely used methods in BZWB in terms of number of calls: they are used to extract values from lines. The getLines() method takes as its first argument an object header, and as its second argument a chunk of BZW text containing at least one BZW object of that type (the string can contain additional objects, even additional objects of that type, but only the first occurrence will be used). It returns a vector of strings (in the order in which they appear), each of which contains a key/value pair from the object. If there are no lines to be parsed, it returns an empty vector.
getLinesByKey() is a more useful version of getLines(): it takes a key in addition to a header and a chunk of text, and returns all key/value pairs that contain the key, in the order in which they appear. As with getLines(), getLinesByKey() will only look at the first occurrence (if any) of an object with the given header. If no lines match the key, getLinesByKey() returns an empty vector.
getValuesByKey() is an even more powerful version of getLinesByKey(): instead of returning a vector of key/value pairs, it returns a vector of just the values that correspond to the given key, in the order in which they appear.
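The three methods described so far build on one another, which the following sketch tries to show. It is my own approximation, not BZWB’s code: the sketch* names are hypothetical, and because there is no Model here the footer is passed in as an extra argument instead of being looked up.

```cpp
#include <cassert>
#include <cstring>
#include <sstream>
#include <string>
#include <vector>

// Strip leading and trailing whitespace.
std::string sketchTrim(const std::string& s) {
    const char* ws = " \t\r\n";
    const std::string::size_type begin = s.find_first_not_of(ws);
    if (begin == std::string::npos) return "";
    return s.substr(begin, s.find_last_not_of(ws) - begin + 1);
}

// First whitespace-delimited token of a line.
std::string sketchFirstToken(const std::string& line) {
    std::istringstream in(line);
    std::string token;
    in >> token;
    return token;
}

// Lines between the FIRST `header` line and the next `footer` line.
std::vector<std::string> sketchGetLines(const char* header, const char* footer,
                                        const char* section) {
    assert(header != nullptr && footer != nullptr && section != nullptr);
    std::vector<std::string> lines;
    std::istringstream in(section);
    std::string raw;
    bool inside = false;
    while (std::getline(in, raw)) {
        const std::string line = sketchTrim(raw);
        if (line.empty()) continue;
        const std::string token = sketchFirstToken(line);
        if (!inside) {
            inside = (token == header);  // skip everything before the header
            continue;
        }
        if (token == footer) break;      // only the first object is used
        lines.push_back(line);
    }
    return lines;
}

// Only the lines whose first token is `key`, in order of appearance.
std::vector<std::string> sketchGetLinesByKey(const char* key, const char* header,
                                             const char* footer, const char* section) {
    assert(key != nullptr);
    std::vector<std::string> matches;
    for (const std::string& line : sketchGetLines(header, footer, section))
        if (sketchFirstToken(line) == key) matches.push_back(line);
    return matches;
}

// Same as above, but with the key stripped off each line.
std::vector<std::string> sketchGetValuesByKey(const char* key, const char* header,
                                              const char* footer, const char* section) {
    assert(key != nullptr);
    const std::size_t keyLength = std::strlen(key);
    std::vector<std::string> values;
    for (const std::string& line : sketchGetLinesByKey(key, header, footer, section))
        values.push_back(sketchTrim(line.substr(keyLength)));
    return values;
}
```

Given a chunk of text with two “box” objects, all three functions only ever look at the first one, and a key that never appears yields an empty vector.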
getLinesByKeys() is also a more useful form of getLinesByKey(), but instead of passing a single key, you pass a vector of keys. getLinesByKeys() returns a vector of the key/value pairs which contain one of the given keys, in the order in which they appear. This method is extremely useful when parsing objects where the order in which multiple keys appear is relevant (e.g. in meshes, the order in which “matref” and “face” keys appear determines which faces receive which materials; the mesh class uses this method to determine which materials refer to which faces).
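The order-preserving part is the whole trick, so here is a tiny sketch of it. sketchGetLinesByKeys() is a hypothetical helper that works on the body of an object (header and footer already stripped), and the input in the usage note is simplified and not meant to be a valid mesh.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Keeps every line whose first token is one of `keys`, in original order.
std::vector<std::string> sketchGetLinesByKeys(const std::vector<std::string>& keys,
                                              const char* body) {
    assert(body != nullptr);
    std::vector<std::string> matches;
    std::istringstream in(body);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream tokens(line);
        std::string first;
        tokens >> first;
        if (!first.empty() &&
            std::find(keys.begin(), keys.end(), first) != keys.end())
            matches.push_back(line);
    }
    return matches;
}
```

With keys “matref” and “face” and a body of “vertex 0 0 0”, “matref red”, “face”, “vertex 1 0 0”, “matref blue”, “face” (one per line), the result is “matref red”, “face”, “matref blue”, “face”. Because the interleaving survives, the caller can tell that the first face gets the red material and the second gets the blue one.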
BZWParser provides the methods findSections(), getSections(), and three types of getSectionsByHeader() for parsing entire objects (sections) from a BZW file. findSections() takes an object header and a vector of strings, where each string contains a BZW-formatted object definition. It will return a vector of strings (where each string is a BZW-formatted object definition) which have headers that match the given header. This method isn’t usually called within a DataEntry-derived class, since DataEntry-derived classes almost always will deal with just one object definition.
getSections() is a more general form of findSections(): it takes a multi-object chunk of BZW text and breaks it up into a vector of strings containing individual objects. This is also rarely called from DataEntry-derived objects.
getSectionsByHeader(), however, is somewhat more important. The first incarnation takes an object header and a chunk of BZW-formatted text, and returns a vector of strings containing the BZW-formatted object definitions with the given header. This is called by every DataEntry-derived class during the update(const string&) method to strip out relevant object(s) from the given string. If no objects are found, a vector containing the pre-defined string BZW_NOT_FOUND will be returned. This method should NOT be used with objects that have sub-objects.
The second incarnation of getSectionsByHeader() is a bit more specific: it takes an object terminator (the footer line; usually “end”) along with a header and a chunk of BZW text. It does the exact same thing as the first incarnation; in fact, the first incarnation calls this method and uses its internal Model reference to look up the terminator using the given header.
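For the curious, the header/footer version can be approximated in a few lines. This is a sketch under assumptions, not the real thing: sketchSectionsByHeader() is a hypothetical name, SKETCH_NOT_FOUND is a placeholder because the actual value of BZW_NOT_FOUND isn’t shown in this post, and like the first two incarnations it makes no attempt to handle sub-objects.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Placeholder sentinel; the real value of BZW_NOT_FOUND is not given here.
const std::string SKETCH_NOT_FOUND = "<sketch: not found>";

// Collects every block that starts with a `header` line and ends with the
// next `footer` line. Nested sub-objects are NOT handled.
std::vector<std::string> sketchSectionsByHeader(const char* header, const char* text,
                                                const char* footer) {
    assert(header != nullptr && text != nullptr && footer != nullptr);
    std::vector<std::string> sections;
    std::istringstream in(text);
    std::string line;
    std::string current;
    bool inside = false;
    while (std::getline(in, line)) {
        std::istringstream tokens(line);
        std::string first;
        tokens >> first;
        if (!inside && first == header) {
            inside = true;
            current.clear();
        }
        if (inside) {
            current += line + "\n";
            if (first == footer) {
                sections.push_back(current);
                inside = false;
            }
        }
    }
    // Mirror the documented behaviour: a one-element vector holding the sentinel.
    if (sections.empty()) sections.push_back(SKETCH_NOT_FOUND);
    return sections;
}
```

Note the consequence of the sentinel: callers should compare the first element against it rather than test the vector for emptiness.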
The third incarnation of getSectionsByHeader() is the general case of the method: unlike the first two, it handles objects with complex hierarchies. It is advisable to call the first two wherever appropriate, because they are much faster. Like the second incarnation, this method takes an object header, a chunk of BZW text, and a terminator. It also takes a BZWB-formatted keystring containing the headers of all DIRECT sub-objects you want the method to parse, and a BZWB-formatted keystring containing the headers of any DIRECT sub-objects you want it to ignore. The format of these last two arguments is “<key1><key2>…<keyN>”. Telling the method to ignore certain sub-objects matters when a key can have multiple meanings depending on its location in the hierarchy (e.g. “matref” in “mesh” can either reference a material for faces or designate the start of low-level geometric primitives, depending on where it appears), because it restricts the search to occurrences of that key within a certain scope. Passing a list of keys to ignore also lets you use this method to look specifically at different scopes. For example, here’s an excerpt from mesh.cpp:
////////////////////////////////////////////////////////
int mesh::update(string& data) {
    // keep a copy of the header so the c_str() pointer stays valid
    // (matters if getHeader() returns by value)
    const string headerText = this->getHeader();
    const char* header = headerText.c_str();
    // get lines
    vector<string> lines = BZWParser::getSectionsByHeader(header, data.c_str(), "end", "<drawinfo><face>", "");
    // get the lines without the drawinfo (so we don't mistake drawinfo's data with the global data)
    vector<string> linesNoDrawInfo = BZWParser::getSectionsByHeader(header, data.c_str(), "end", "<drawinfo><face>", "<drawinfo>");
    // get lines without faces and without drawinfo (so bz2object doesn't think there are multiple physics drivers)
    vector<string> linesNoSubobjects = BZWParser::getSectionsByHeader(header, data.c_str(), "end", "<drawinfo><face>", "<drawinfo><face>");
    // break if there are none
    if(lines[0] == BZW_NOT_FOUND) {
        printf("mesh not found\n");
        return 0;
    }
    // break if too many
    if(!hasOnlyOne(lines, header))
        return 0;
    // get the data
    const char* meshData = lines[0].c_str();
    // get drawinfo-less data
    const char* meshDataNoDrawInfo = meshData;
    if(linesNoDrawInfo.size() > 0)
        meshDataNoDrawInfo = linesNoDrawInfo[0].c_str();
    // get no-subobject data
    const char* meshDataNoSubobjects = meshData;
    if(linesNoSubobjects.size() > 0)
        meshDataNoSubobjects = linesNoSubobjects[0].c_str();
    // get the vertices
    vector<string> vertexVals = BZWParser::getValuesByKey("vertex", header, meshDataNoDrawInfo);
    // get the texcoords
    vector<string> texCoordVals = BZWParser::getValuesByKey("texcoord", header, meshDataNoDrawInfo);
    // get faces
    vector<string> faceVals = BZWParser::getSectionsByHeader("face", meshDataNoDrawInfo, "endface");
    /* …some more code… */
    // get drawinfo
    vector<string> drawInfoVals = BZWParser::getSectionsByHeader("drawinfo", meshData);
    if(drawInfoVals.size() > 1) {
        // size() returns an unsigned size type, so cast it to match %d
        printf("mesh::update(): Error! Defined \"drawinfo\" %d times\n", (int)drawInfoVals.size());
        return 0;
    }
    /* …some more code… */
    // parse drawinfo (DrawInfo is considered to be a separate object in BZWB)
    DrawInfo drawInfoParam = DrawInfo();
    bool doDrawInfo = false;
    if(drawInfoVals[0] != BZW_NOT_FOUND) {
        drawInfoParam = DrawInfo(drawInfoVals[0]);
        doDrawInfo = true;
    }
    /* more code */
}
/////////////////////////////////////////////////////////////////////////
As you can see, it uses a combination of getValuesByKey() and the incarnations of getSectionsByHeader() to parse out and distinguish between the drawinfo data, the face data, and the mesh data.
The getLineElements() methods are pretty easy to understand: they take a string and break it up into a vector of contiguous substrings (i.e. strings separated by spaces, tabs, \n, and \r). For example, the string “this is a sentence”, when fed into getLineElements(), will become <“this”, “is”, “a”, “sentence”>. The first version of getLineElements() returns all substrings in the line; the second returns either all of the substrings or ‘count’ substrings, whichever is fewer.
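Both versions fit in one short sketch. sketchLineElements() is a hypothetical name, and using a negative count to mean “no limit” is a convention of this sketch only (the real class uses two overloads instead).

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Splits a line on whitespace (spaces, tabs, \n, \r).
// count < 0 means "no limit"; otherwise at most `count` elements are returned.
std::vector<std::string> sketchLineElements(const char* line, int count = -1) {
    assert(line != nullptr);
    std::vector<std::string> elements;
    std::istringstream in(line);
    std::string token;
    while ((count < 0 || static_cast<int>(elements.size()) < count) && (in >> token))
        elements.push_back(token);
    return elements;
}
```

Calling it on “this is a sentence” gives four elements; calling it with a count of 2 gives just “this” and “is”; and a count larger than the number of words simply returns all of them.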
The last method, loadFile(), takes a filename as its argument, and returns a vector of strings, where each string contains an object definition. This is usually not called by DataEntry objects.
Next time: the Renderable class. Stay tuned