October 1997
- Carnage, scripting newbie guides Koster, Raph
- Carnage, scripting newbie guides Nathan Yospe
- Carnage, scripting newbie guides Chris Gray
- Carnage, scripting newbie guides ##Make Nylander
- Carnage, scripting newbie guides ##Make Nylander
- Usability and interface and who the hell is supposed to clawrenc@cup.hp.com
- 101 Spells Not Worth Memorizing clawrenc@cup.hp.com
- more classes (Usability and interface and who the Brian Price
- more classes (Usability and interface and who the Matt Chatterley
- more classes (Usability and interface and who the coder@ibm.net
- Stranger in a Strange Land (was Usability and interface and Maddy
- Tablets. Ola Fosheim Grøstad
- Stranger in a Strange Land (was Usability and interface clawrenc@cup.hp.com
- Usability and interface ... Marian Griffith
- Usability and interface ... Caliban Tiresias Darklock
- Usability and interface ... Broly
- Usability and interface ... Caliban Tiresias Darklock
- Usability and interface ... Derrick Jones
- Usability and interface ... coder@ibm.net
- Usability and interface ... Derrick Jones
- Usability and interface ... coder@ibm.net
- Usability and interface ... coder@ibm.net
- Usability and interface ... Marian Griffith
- Turn-based Combat Jon A. Lambert
- Turn-based Combat Travis Casey
- Turn-based Combat John G.
- OT: I'm moving! coder@ibm.net
- (fwd) New mud release coder@ibm.net
- Riddles for games clawrenc@cup.hp.com
- Riddles for games Chris Gray
- Riddles for games coder@ibm.net
- The Trap Collection clawrenc@cup.hp.com
- Learning through failure Jon A. Lambert
- Learning through failure Maddy
- The Trap Collection - Volume II clawrenc@cup.hp.com
- THE COMPLETE GUIDE TO UNLAWFUL CARNAL KNOWLEDGE FOR FANTASY ROLE-PLAYING GAMES clawrenc@cup.hp.com
- (fwd) New MUD software wanted? coder@ibm.net
- (fwd) New MUD software wanted? Felix A. Croes
- (fwd) New MUD software wanted? coder@ibm.net
- META: File attachments as list postings. coder@ibm.net
- More Riddles... Jon A. Lambert
- More Riddles... Jon A. Lambert
- multiple intelligences Brandon J. Rickman
- multiple intelligences Travis Casey
- multiple intelligences Brandon J. Rickman
- multiple intelligences S001GMU@nova.wright.edu
- multiple intelligences Travis S. Casey
- multiple intelligences coder@ibm.net
- OT: Usability and interface and who the hell is suppo coder@ibm.net
- Fear of magic (was:Usability and interface) Derrick Jones
- Fear of magic (was:Usability and interface) Michael Hohensee
- Fear of magic (was:Usability and interface) coder@ibm.net
- The Official T$R Book of Adventure Suggestions coder@ibm.net
- Mud governance Koster, Raph
- Mud governance Felix A. Croes
- Mud governance Mike Sellers
- Mud governance Travis Casey
- Mud governance coder@ibm.net
- Mud governance Mike Sellers
- Mud governance coder@ibm.net
- Mud governance S001GMU@nova.wright.edu
- Mud governance coder@ibm.net
- Mud governance Koster, Raph
- Mud governance coder@ibm.net
- OT: Usability and interface and who the hell is su Jon A. Lambert
- Fear of magic (was:Usability and interface) Marian Griffith
- Fear of magic (was:Usability and interface) Derrick Jones
- Fear of magic (was:Usability and interface) coder@ibm.net
- Fear of magic (was:Usability and interface) Nathan Yospe
- Fear of magic (was:Usability and interface) Marian Griffith
- Fear of magic (was:Usability and interface) Sauron
- Fear of magic (was:Usability and interface) coder@ibm.net
- Fear of magic (was:Usability and interface) Marian Griffith
- Fear of magic (was:Usability and interface) coder@ibm.net
- Fear of magic (was:Usability and interface) Brandon J. Rickman
- Fear of magic (was:Usability and interface) Derrick Jones
- Fear of magic (was:Usability and interface) Jon A. Lambert
- Fear of magic (was:Usability and interface) Adam Wiggins
- Fear of magic (was:Usability and interface) Derrick Jones
- Fear of magic (was:Usability and interface) Derrick Jones
- Fear of magic (was:Usability and interface) Derrick Jones
- Fear of magic (was:Usability and interface) coder@ibm.net
- Fear of magic (was:Usability and interface) coder@ibm.net
- Fear of magic (was:Usability and interface) Sauron
- Fear of magic (was:Usability and interface) Marian Griffith
- Fear of magic (was:Usability and interface) Marian Griffith
- Fear of magic (was:Usability and interface) Jon A. Lambert
- Fear of magic (was:Usability and interface) coder@ibm.net
- Fear of magic (was:Usability and interface) Vadim Tkachenko
- Fear of magic (was:Usability and interface) Sauron
- Fear of magic (was:Usability and interface) Stephen Zepp
- Fear of magic (was:Usability and interface) Matt Chatterley
- Fear of magic (was:Usability and interface) Vadim Tkachenko
- Fear of magic (was:Usability and interface) Stephen Zepp
- Fear of magic (was:Usability and interface) coder@ibm.net
- Fear of magic (was:Usability and interface) Matt Chatterley
- Fear of magic (was:Usability and interface) coder@ibm.net
- Fear of magic (was:Usability and interface) Alex Oren
- Fear of magic (was:Usability and interface) Alex Oren
- Fear of magic (was:Usability and interface) Koster, Raph
- Fear of magic (was:Usability and interface) Chris Gray
- Fear of magic (was:Usability and interface) Richard Woolcock
- Fear of magic (was:Usability and interface) Stephen Zepp
- META: List burp coder@ibm.net
- To catch a mage (was fear of magic) Derrick Jones
- To catch a mage (was fear of magic) Matt Chatterley
- To catch a mage (was fear of magic) coder@ibm.net
- To catch a mage (was fear of magic) coder@ibm.net
- To catch a mage (was fear of magic) Derrick Jones
- To catch a mage (was fear of magic) coder@ibm.net
- To catch a mage (was fear of magic) Derrick Jones
- To catch a mage (was fear of magic) coder@ibm.net
- CODE RELEASE: [mush mux] Portable Space Engine v0.8.3 RELEASED! coder@ibm.net
- ANNOUNCEMENT: [graphical commercial] Mystic Realms coder@ibm.net
- CODE RELEASE: [server] New Mud Software (SunderMUD 1.0) coder@ibm.net
- string parsing Felix A. Croes
What follows is a description of something I am currently implementing
for my server. I am providing it here for others to comment on. Also,
I'd like to hear of other approaches to the same problem.
The notation for datatypes and code is from LPC. Note that `({ ... })'
is used to describe an array in LPC. Thus,
({ "foo", "bar", 47 })
describes an array of 3 items, the last of which is the number 47.
---
The problem: string parsing.
Rather than concentrating on a specific instance, such as parsing natural
language for the purpose of interpreting player commands, I have chosen
to provide a tool that is useful for implementing a natural language
parser. Another design goal was that it should be equally suitable for
parsing a programming language, and powerful enough to parse LPC itself.
Function:
parse_string(string grammar, string str)
Description:
Parse the string `str' according to the grammar. Return the parse tree
(or an array of parse trees).
The grammar consists of rules. There are two types of rules: token rules
and production rules. A token rule describes a token with a regular
expression. A production rule describes a production in the context-free
grammar for the language to be parsed. Thus, the parsing function can be
considered to incorporate the unix tools lex and yacc.
A token rule looks like this:
token = /regexp/
`token' is a symbol associated with a particular token in the grammar,
and the regular expression, delimited by `/' characters, describes the
form it takes in the input. More than one token rule may be specified
with the same token symbol. The special token symbol `whitespace' can
be used to describe token-separating whitespace.
In a regular expression, the characters `[ ]' enclose a subrange. For
instance, `[abc]' matches either of the characters a, b or c. A dash
can be used to indicate a subrange: `[0-9]' matches any digit.
The following characters have a special meaning in regular expressions:
? the preceding item is optional
* the preceding item matches zero or more times
+ the preceding item matches at least once
| either the preceding or the following item is matched
A production rule looks like any of the following:
prod: word
prod: foo
prod: ( bar )
prod: 'gnu'
prod: token prod foo ( bar 'gnu' )
prod:
prod: foo bar ? func
That is, on the right-hand side of a production rule can be any
combination of token symbols, production symbols, strings enclosed in
single quotes, and matched parenthesis enclosing at least one other
item. A production rule may optionally be followed by a question mark
and the name of an LPC function.
The first production rule defines the start symbol. Alternative
productions for the same symbol must be specified in different rules.
Henceforth, a token symbol will be called a `terminal' and a production
symbol will be called a `nonterminal'.
Parenthesis in production rules is used to structure the resulting
parse tree. For example, given the following grammar:
whitespace = / /
word = /[a-z]+/
Thing: Noun
Thing: ( Adjs ) Noun
Noun: word
Adjs: word Adjs
Adjs: word
parse_string(grammar, "bottle") will return
({ "bottle" }),
whereas parse_string(grammar, "big blue bottle") will return
({ ({ "big", "blue" }), "bottle" })
That is, the resulting parse tree is not structured according to
the production rules of the grammar, but as specified by parenthesis
only. Without parenthesis in any rules, a single flat array of
tokens is returned.
If a production rule is followed by a question mark and an LPC function
name, the function will be called with the intermediate parse tree
for the current rule, and the return value will replace the intermediate
parse tree in the overall return value.
The grammar may be ambiguous. Parsing is done in stages: first, all
possible parse trees for the input are generated, using Tomita's
breadth-first LR(k) parsing algorithm. Next, any LPC functions specified
for rules are called (bottom-up), and the intermediate parse trees for
those rules are replaced by the return values of those functions. As
a special case, if the LPC function returns zero, the entire parse tree
that this intermediate parse tree is part of is discarded. Finally,
the resulting parse tree(s) are returned.
NOTE: I have not yet decided how to handle more than one parse tree in
the result. When parsing a programming language, only one single valid
parse tree is desired. On the other hand, when parsing natural language,
ambiguities cannot always be resolved during syntax analysis. For
example, the sentence "kill the troll in the forest" can mean, "go to
the forest and kill the troll you find there" or "take this captive
troll to the forest and kill him". Even with a captive troll on hand,
the player should probably be notified about the ambiguity.
Example: command parsing
Grammar:
whitespace = / /
word = /[a-z]+/
Sentence: 'give' Object 'to' Living
Object: ( Article ) ( Adjs ) Noun ? find_obj
Living: ( Article ) ( Adjs ) Noun ? find_liv
Article: 'the'
Article: 'a'
Article: 'an'
Article: 'any'
Article:
Noun: word
Adjectives: word Adjectives
Adjectives:
Note that due to the structuring of the rule for the nonterminal
`Object', the LPC function find_obj() is always called with an
array of 3 items, the first of which is an array holding at most one
article, the second of which is an array of adjectives, and the third
of which is a noun.
If find_obj() and find_liv() both return matching objects, then parsing
the following sentence
"give long blue sword to the small dwarf"
will cause find_obj() to be called with
({ ({ }), ({ "long", "blue" }), "sword" })
and find_liv() with
({ ({ "the" }), ({ "small" }), "dwarf" })
and parse_string() will return the result
({ "give", OBJ(sword), "to", OBJ(dwarf) })
Felix Croes - string parsing Chris Gray
- string parsing Felix A. Croes
- string parsing Jon A. Lambert
- string parsing Felix A. Croes
- string parsing Chris Gray
- string parsing Felix A. Croes
- string parsing Chris Gray
- string parsing Felix A. Croes
- string parsing Chris Gray
- string parsing coder@ibm.net
- string parsing Felix A. Croes
- string parsing coder@ibm.net
- string parsing Chris Gray
- string parsing coder@ibm.net
- string parsing Chris Gray
- string parsing coder@ibm.net
- string parsing Jon A. Lambert
- string parsing Adam Wiggins
- string parsing Ola Fosheim Grøstad
- string parsing Chris Gray
- string parsing Felix A. Croes
- string parsing Nathan Yospe
- string parsing Felix A. Croes
- string parsing Nathan Yospe
- string parsing coder@ibm.net
- string parsing Chris Gray
- string parsing Nathan Yospe
- string parsing Chris Gray
- string parsing coder@ibm.net
- Idea: Hive-mind monster coder@ibm.net
- Idea: Hive-mind monster Adam Wiggins
- Idea: Hive-mind monster coder@ibm.net
- Idea: Hive-mind monster Sauron
- Idea: Hive-mind monster Derrick Jones
- Idea: Hive-mind monster Michael Hohensee
- Idea: Hive-mind monster Brandon J. Rickman
- Idea: Hive-mind monster coder@ibm.net
- Idea: Hive-mind monster Derrick Jones
- Idea: Hive-mind monster coder@ibm.net
- Idea: Hive-mind monster coder@ibm.net
- Idea: Hive-mind monster coder@ibm.net
- Skill Listing - Part II Jon A. Lambert
- Skill Listing - Part II Derrick Jones
- Skill Listing - Part I Jon A. Lambert
- Poison List - Part II Jon A. Lambert
- Poison List - Part III Jon A. Lambert
- Poison List - Part IV Jon A. Lambert
- Poison List - Part I Jon A. Lambert