Malcolm W. Tester II wrote:
> is the type of person who codes it. And in general, it seems
> Americans are lazier about it. I am an American, so I have a
> right to call us lazy :) Both of the parsers in these muds were
> written by other nationalities. One was by a German guy, the other
> was by a Swedish guy. Both are based off the original lp 2.4.5.
> Could they
Speaking of other languages, does anyone have experience with GOOD
parsers that work with languages whose grammar/origin is very
different from English? (Has anyone ever tried?)
For example: Japanese (3 character sets, no spaces, verb at end,
enter text with an IME), Chinese (2 character sets, no spaces, enter
text with an IME), Arabic (non-Roman character set, enter text with
an IME, ???), or even Finnish (which I've been told likes to combine
verbs and nouns, or something of the sort). The Inform designer's
guide discusses porting to English's cousins like French and German,
but not the more distant language groups.
I posted on rec.arts.int-fiction and received a helpful response
about some of the oddities of Japanese. No one else seems to have
any info.
I am interested in localization, so one concern I have (which most
people won't share) is that if I spend a huge amount of time writing
an awesome English parser that can handle "aggressively attack the
second orc to the right of the tree" (to exaggerate), then:
a) No sane person will be able to localize the same functionality
into other languages. (I don't know enough other languages to do
localization myself.)
b) All the wonderful parsing will go unseen by most users because
they don't need it, even in an IF-oriented environment. Users on
rec.arts.int-fiction don't seem too keen on going beyond anything
in TADS or Inform, which implies to me that maybe it's not
necessary.
I have been thinking about parsing lately. If you intend to write
a parser, here are some other things to think about:
a) You might eventually want to connect speech recognition to your
parser so users can speak the commands. For this to work you need
to compile your entire grammar into a CFG (context-free grammar)
(BNF format or whatever). If you start with a CFG model, speech
recognition will be easy to add at some point in the future. If
you use another approach then it will be very difficult to bolt
speech recognition on top.
b) It seems to me that the parser for commands is different from
the parser you'll need for modelling conversations with NPCs. I
haven't thought enough about this yet, but conversations will
probably need some sort of probabilistic heuristic. "Tell me about
King Leopold?", "What do you know about the king?", "Who is lord
Leopold?", and "Does the local monarch have anything interesting
in his castle?" all mean the same thing to an NPC, but a CFG won't
cope well with that much variation.
c) If you talk to a professional linguist they'll probably tell
you the first step is to determine each word's part of speech,
which is a non-trivial problem. I have some links for possible
solutions to this problem. Once you have POS you can generate a
parse tree based on the POS associations of the language (Adj
before N, in English, etc.). This then lets you identify
verb-phrase, subject, object, etc. You can also use a word-net
and/or thesaurus for synonyms. If you talk to a professional
linguist you'll spend the rest of your life writing your parser.
d) The more "correct" your parsing solution is, the more parts
you'll be able to use when going from concept to sentence, such as
subject/verb agreement in "<name> is too big to fit in your bag." If
<name> is "the piano" your text is OK, but if <name> is "the gold
ingots" you need to change "is" to "are". Other languages have it
far worse. The more NLP information your app has around, the
better it can resolve these issues.
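
To make point a) above concrete, here is a minimal sketch of a command
parser that starts from a CFG. The grammar, the word lists, and the
names GRAMMAR, parse, parse_command and to_bnf are all made up for
illustration. The point is only that when the grammar is plain data,
the same table can drive the text parser and be dumped as BNF-style
rules for a speech recognizer later. The BNF layout shown is generic,
not the input format of any particular speech engine.

```python
# Hypothetical toy grammar: each nonterminal maps to a list of
# alternatives; each alternative is a sequence of nonterminals or
# literal words. No rule is left-recursive, so the naive parser halts.
GRAMMAR = {
    "command": [["verb", "noun_phrase"], ["verb"]],
    "verb": [["attack"], ["take"], ["look"]],
    "noun_phrase": [["article", "noun"], ["noun"]],
    "article": [["the"], ["a"]],
    "noun": [["orc"], ["sword"], ["tree"]],
}


def parse(symbol, tokens, pos=0):
    """Yield (tree, next_pos) for every way symbol matches tokens[pos:]."""
    if symbol not in GRAMMAR:
        # Anything that is not a nonterminal is treated as a literal word.
        if pos < len(tokens) and tokens[pos] == symbol:
            yield symbol, pos + 1
        return
    for alternative in GRAMMAR[symbol]:
        # Partial matches of this alternative: (children so far, position).
        partials = [([], pos)]
        for part in alternative:
            partials = [
                (children + [subtree], next_pos)
                for children, cur in partials
                for subtree, next_pos in parse(part, tokens, cur)
            ]
        for children, end in partials:
            yield (symbol, children), end


def parse_command(text):
    """Return the first parse tree that consumes every token, else None."""
    tokens = text.lower().split()
    for tree, end in parse("command", tokens):
        if end == len(tokens):
            return tree
    return None


def to_bnf(grammar):
    """Render the same grammar table as BNF-style text rules."""
    lines = []
    for name, alternatives in grammar.items():
        rhs = " | ".join(
            " ".join(f"<{p}>" if p in grammar else f'"{p}"' for p in alt)
            for alt in alternatives
        )
        lines.append(f"<{name}> ::= {rhs}")
    return "\n".join(lines)
```

Calling parse_command("attack the orc") returns a nested tuple tree,
parse_command("orc attack") returns None, and to_bnf(GRAMMAR) produces
one rule per nonterminal. A real grammar would need adjectives,
prepositions and ambiguity handling, which this sketch leaves out.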
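
For point b) above, a rough sketch of the kind of heuristic that could
sit where a CFG falls down. It is not a real probabilistic model, just
a weighted keyword score standing in for one. The topic table, the
weights, the threshold, and the names TOPICS, score_topics and
best_topic are all assumptions invented for this example.

```python
import re

# Hypothetical topic table: each topic an NPC knows about lists cue
# words with hand-picked weights. A real system might learn these.
TOPICS = {
    "king_leopold": {
        "king": 2.0,
        "leopold": 3.0,
        "monarch": 2.0,
        "lord": 1.0,
        "castle": 1.0,
    },
    "weather": {"weather": 3.0, "rain": 2.0, "storm": 2.0},
}


def score_topics(utterance):
    """Sum the weights of the cue words that appear in the utterance."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    return {
        topic: sum(weight for cue, weight in cues.items() if cue in words)
        for topic, cues in TOPICS.items()
    }


def best_topic(utterance, threshold=1.5):
    """Return the highest-scoring topic, or None if nothing scores enough."""
    scores = score_topics(utterance)
    topic = max(scores, key=scores.get)
    return topic if scores[topic] >= threshold else None
```

With this table, all four example questions about the king resolve to
the same "king_leopold" topic even though they share almost no
sentence structure, while an unrelated remark falls below the
threshold and returns None.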
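
Point d) above can be sketched the same way. This assumes each object
carries a little grammatical information alongside its display name,
here just a plural flag. The NounPhrase class and too_big_message
function are hypothetical names for this example, and English number
agreement is the only case handled.

```python
from dataclasses import dataclass


@dataclass
class NounPhrase:
    """Display text plus the grammatical facts the generator needs."""
    text: str
    plural: bool = False


def too_big_message(item):
    """Build the message with the verb agreeing with the subject."""
    verb = "are" if item.plural else "is"
    # Capitalize only the first letter so the rest of the name is kept.
    subject = item.text[0].upper() + item.text[1:]
    return f"{subject} {verb} too big to fit in your bag."
```

For example, too_big_message(NounPhrase("the piano")) gives "The piano
is too big to fit in your bag." and passing NounPhrase("the gold
ingots", plural=True) switches the verb to "are". Languages with
gender or case agreement would need more fields than one flag.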
Mike Rozak
http://www.mxac.com.au