Class: Tokenizer in Category General VBA/VB6 : String Handling from Total Visual SourceBook

Breaking and parsing a string into individual tokens with VB6 and VBA.

This tokenizer is implemented as a state machine. The characters used for whitespace, separators, quotes, and end-of-line are all implemented as properties. This allows for a great deal of flexibility in how text is tokenized.
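
The state-machine idea can be illustrated with a much smaller sketch. The code below is not the CTokenizer source; it is a hypothetical two-state routine (the names SketchState, SketchNextToken, and SketchDemo are invented for illustration) that handles only whitespace. The real class presumably adds further states and handlers for separators, quotes, and end-of-line characters. Like CTokenizer, it takes the set of whitespace characters as a setting rather than hard-coding it.

```vb
Option Explicit

' Hypothetical sketch, not the CTokenizer implementation.
Private Enum SketchState
  stOutside = 0  ' Skipping whitespace between tokens
  stInToken = 1  ' Accumulating the characters of a token
End Enum

' Reads the next token from strText, starting at lngPos (1-based).
' lngPos is advanced as characters are consumed.
' Returns False when no more tokens remain.
Public Function SketchNextToken( _
  ByVal strText As String, _
  ByRef lngPos As Long, _
  ByVal strWhiteSpace As String, _
  ByRef strToken As String) As Boolean

  Dim state As SketchState
  Dim strChar As String

  strToken = ""
  state = stOutside

  Do While lngPos <= Len(strText)
    strChar = Mid$(strText, lngPos, 1)

    If InStr(1, strWhiteSpace, strChar, vbBinaryCompare) > 0 Then
      ' Whitespace ends a token in progress; otherwise it is skipped
      If state = stInToken Then Exit Do
    Else
      ' Any other character starts or extends the current token
      state = stInToken
      strToken = strToken & strChar
    End If

    lngPos = lngPos + 1
  Loop

  SketchNextToken = (Len(strToken) > 0)
End Function

' Prints each token of a sample string on its own line
Public Sub SketchDemo()
  Dim lngPos As Long
  Dim strTok As String

  lngPos = 1
  Do While SketchNextToken("A whop bam boom", lngPos, " ,", strTok)
    Debug.Print strTok
  Loop
End Sub
```

Because the whitespace set is a parameter, the same loop can split on spaces, commas, or hyphens without code changes, which is the kind of flexibility the property-based design is aiming for.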

Procedure List

| Procedure Name | Type | Description |
|---|---|---|
| (Declarations) | Declarations | Declarations and private variables for the CTokenizer class. |
| ConvertCase | Property | Get the current type of case conversion. The ConvertCase property can have the following values: tokNone (no case conversion), tokUpper (convert to uppercase), tokLower (convert to lowercase). |
| EOLChars | Property | Get the characters currently treated as end-of-line characters. |
| QuoteChars | Property | Get the characters currently treated as quotes. |
| SeparatorChars | Property | Get the characters currently treated as separators. |
| Text | Property | Get the text to tokenize. |
| WhiteSpaceChars | Property | Get the characters currently treated as white space. |
| Class_Initialize | Initialize | Initialize the class. |
| GetNextToken | Method | Get the next token in the string identified by the Text property. |
| AppendToken | Private | Append the current character to the token, performing any necessary case conversion. |
| CharType | Private | Determine the type of the current character. |
| HandleEOL | Private | Handle an end-of-line character. |
| HandleQuote | Private | Handle a quote character. |
| HandleSeparator | Private | Handle a separator character. |
| HandleToken | Private | Handle a token character. |
| HandleWhiteSpace | Private | Handle a whitespace character. |

Example Code for Using Class: Tokenizer

' Example of CTokenizer
' To try this example, do the following:
' 1. Create a new form
' 2. Add a command button named 'cmdTest'
' 3. Paste all the code from this example to the new form's module.
' 4. Run the form

Private Const mcstrText As String = "A-whop boppa lu-mop," & vbCrLf & "A whop bam boom"

Private Sub cmdTest_Click()

  ' Example for the CTokenizer class
  ' This example breaks a string into its tokens and builds a new string with the tokens
  ' separated by "." characters, then prints it to the Immediate window

  Dim tokenizer As CTokenizer
  Dim strBops As String
  Dim strTok As String

  Set tokenizer = New CTokenizer

  With tokenizer
    .Text = mcstrText
    .WhiteSpaceChars = " ,.:-"
    .ConvertCase = tokLower
    .EOLChars = vbCrLf
    Do While .GetNextToken(strTok, " ", False)
      strBops = strBops & "." & strTok
    Loop

  End With

  Debug.Print strBops

End Sub

