6.2.2008
PARSE
 
CORE EXT
 
( char "ccc<char>" -- c-addr u )

Parse ccc delimited by the delimiter char.

c-addr is the address (within the input buffer) and u is the length of the parsed string. If the parse area was empty, the resulting string has a zero length.

Rationale:
Typical use: char PARSE ccc<char>
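
As a minimal sketch of the typical use, the definition below parses up to the next right parenthesis and reports what PARSE returned. SHOW-PARSED is a hypothetical name introduced only for this illustration.

```forth
\ SHOW-PARSED is a hypothetical word, not part of the standard.
: SHOW-PARSED ( "ccc<paren>" -- )
   [CHAR] ) PARSE      \ -- c-addr u   the string lies in the input buffer
   DUP . ." chars: "   \ display the length u
   TYPE ;              \ display the parsed characters

SHOW-PARSED HELLO)
SHOW-PARSED )
```

The first line should report a length of 5 followed by HELLO. The second should report a length of 0, because the delimiter is the first character in the parse area.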

The traditional Forth word for parsing is WORD. PARSE solves the following problems with WORD:

  1. WORD always skips leading delimiters. This behavior is appropriate for use by the text interpreter, which looks for sequences of non-blank characters, but is inappropriate for use by words like ( , .(, and .". Consider the following (flawed) definition of .(:

       : .( [CHAR] ) WORD COUNT TYPE ; IMMEDIATE

    This works fine when used in a line like:

       .( HELLO)    5 .

    but consider what happens if the user enters an empty string:

       .( )    5 .

    The definition of .( shown above would treat the ) as a leading delimiter, skip it, and continue consuming characters until it located another ) that followed a non-) character, or until the parse area was empty. In the example shown, the 5 . would be treated as part of the string to be printed.

    With PARSE, we could write a correct definition of .(:

       : .( [CHAR] ) PARSE TYPE ; IMMEDIATE

    This definition avoids the "empty string" anomaly.
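
    The two definitions can be compared side by side. The sketch below uses the hypothetical names W.( and P.( so that the system's own .( is not redefined.

```forth
\ W.( and P.( are hypothetical names used only for this comparison.
: W.( ( "ccc<paren>" -- )  [CHAR] ) WORD COUNT TYPE ; IMMEDIATE
: P.( ( "ccc<paren>" -- )  [CHAR] ) PARSE TYPE ; IMMEDIATE

P.( )    5 .
W.( )    5 .
```

    With P.( the empty string is typed and 5 . is then interpreted normally. With W.( the ) is skipped as a leading delimiter, so the rest of the line, including 5 . , is expected to be typed as text instead of being interpreted.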

  2. WORD returns its result as a counted string. This has four bad effects:

    1. The characters accepted by WORD must be copied from the input buffer into a transient buffer, in order to make room for the count character that must be at the beginning of the counted string. The copy step is inefficient, compared to PARSE, which leaves the string in the input buffer and doesn't need to copy it anywhere.

    2. WORD must be careful not to store too many characters into the transient buffer, thus overwriting something beyond the end of the buffer. This adds to the overhead of the copy step. (WORD may have to scan a lot of characters before finding the trailing delimiter.)

    3. The count character limits the length of the string returned by WORD to 255 characters (longer strings can easily be stored in blocks!). This limitation does not exist for PARSE.

    4. The transient buffer is typically overwritten by the next use of WORD.

    The need for WORD has largely been eliminated by PARSE and PARSE-NAME. WORD is retained for backward compatibility.
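
    As an illustration of effects 1 and 4, the sketch below parses two strings in succession. Because PARSE leaves each string in the input buffer, the first result is still intact after the second call, with no copying. PAIR is a hypothetical name, and the sketch assumes both strings are used before the input buffer is refilled.

```forth
\ PAIR is a hypothetical word, not part of the standard.
: PAIR ( "ccc<comma>ddd<paren>" -- )
   [CHAR] , PARSE      \ -- c-addr1 u1
   [CHAR] ) PARSE      \ -- c-addr1 u1 c-addr2 u2
   2SWAP TYPE          \ first string, still in the input buffer
   ."  / "             \ separator
   TYPE ;              \ second string

PAIR left,right)
```

    A version written with two calls to WORD could not rely on the first result surviving the second call, since the transient buffer is typically reused.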