Saturday, July 22, 2006

20Q




Semantic memory

Hierarchical model of semantic memory (Collins and Quillian, 1969), followed by most ontologies.

Connectionist spreading activation model (Collins and Loftus, 1975), with mostly lateral connections.

Our implementation is based on the connectionist model; it uses a relational database and an object access layer API.
The database stores three types of data:
- concepts, the objects being described;
- keywords, features of concepts extracted from data sources;
- relations between them.
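The three-table layout above can be sketched in SQL; this is a minimal in-memory illustration, and all table and column names here are assumptions, not the actual schema:

```python
import sqlite3

# Minimal sketch of the concepts/keywords/relations layout
# (illustrative names, not the real schema).
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE concepts (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE keywords (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE relations (
    concept_id INTEGER REFERENCES concepts(id),
    keyword_id INTEGER REFERENCES keywords(id),
    rel_type   TEXT            -- e.g. 'IS', 'CAN DO', 'PART OF'
);
""")
db.execute("INSERT INTO concepts(name) VALUES ('sparrow')")
db.execute("INSERT INTO keywords(name) VALUES ('fly')")
db.execute("INSERT INTO relations VALUES (1, 1, 'CAN DO')")

row = db.execute("""
    SELECT c.name, r.rel_type, k.name
    FROM relations r
    JOIN concepts c ON c.id = r.concept_id
    JOIN keywords k ON k.id = r.keyword_id
""").fetchone()
print(row)   # ('sparrow', 'CAN DO', 'fly')
```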

The IS-A relation is used to build the ontology tree, which serves for activation spreading, i.e. feature inheritance down the ontology tree.
Types of relations (like “x IS y”, “x CAN DO y”, etc.) may be defined when input data is read from dictionaries and ontologies.

WordNet hypernymic (“a kind of …”) IS-A relations, plus hyponym
and meronym relations between synsets (converted into concept/concept relations), are combined with ConceptNet relations
such as CapableOf, PropertyOf, PartOf, MadeOf ...
Relations are added only if they appear in both WordNet and ConceptNet.
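The agreement rule reduces to a set intersection once relations from both sources are expressed as normalized triples; the triples below are illustrative, and the real extraction (synset-to-concept conversion, relation-name mapping) is far more involved:

```python
# Hypothetical, already-normalized triples from each source.
wordnet_rels = {
    ("sparrow", "IS-A", "bird"),
    ("sparrow", "PART-OF", "flock"),
    ("canary", "IS-A", "bird"),
}
conceptnet_rels = {
    ("sparrow", "IS-A", "bird"),
    ("sparrow", "CapableOf", "fly"),
    ("canary", "IS-A", "bird"),
}

# Keep a relation only if both sources agree on it.
accepted = wordnet_rels & conceptnet_rels
print(sorted(accepted))
```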

Drastic simplification: for some applications SM is used more efficiently with a vector-based knowledge representation.
Merging all types of relations yields the most general one:

“x IS RELATED TO y”, defining vector (semantic) space.

{Concept, relations} => Concept Description Vector (CDV).
A binary vector showing which properties are related to, or make sense for, a given concept (not the same as a context vector).
Semantic memory => CDV matrix: very sparse, allowing easy storage of large amounts of semantic data.

Search engines: {keywords} => concept descriptions (Web pages).
CDVs enable efficient implementation of reversed queries:
find a unique subset of properties for a given concept,
or for a class of concepts (a concept higher in the ontology).
What are the unique features of a sparrow? A proteoglycan? A neutrino?
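A reversed query over the sparse CDV matrix can be sketched as follows; the concept/keyword sets are toy data, and representing each sparse binary row as a Python set is an implementation choice, not the actual storage format:

```python
# Toy CDV matrix: concept -> set of keywords whose bit is 1 (sparse rows).
cdv = {
    "sparrow": {"feathers", "fly", "small", "chirps"},
    "ostrich": {"feathers", "large", "runs"},
    "bat":     {"fly", "small", "fur"},
}

def unique_features(concept, cdv):
    """Reversed query: properties set for `concept` but for no other concept."""
    others = set().union(*(v for c, v in cdv.items() if c != concept))
    return cdv[concept] - others

print(sorted(unique_features("sparrow", cdv)))  # ['chirps']
```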



The goal of the 20 question game is to guess a concept that
the opponent has in mind by asking appropriate questions.

www.20q.net has a version that is now implemented in some toys!
It is based on a concept × question table T(C,Q) = usefulness of Q for C.
The system learns T(C,Q) values, increasing them after successful games and decreasing them after lost games. Guessing is distance-based.
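The T(C,Q) learning scheme can be sketched as a simple reinforcement update; the step size and update rule below are assumptions for illustration, not the actual 20q.net algorithm:

```python
from collections import defaultdict

# (concept, question) -> usefulness; starts at 0 for unseen pairs.
T = defaultdict(float)

def record_game(concept, asked_questions, won, step=0.1):
    """Strengthen question usefulness after a won game, weaken after a loss.
    The +/- step rule is an illustrative assumption."""
    for q in asked_questions:
        T[(concept, q)] += step if won else -step

record_game("sparrow", ["Is it alive?", "Can it fly?"], won=True)
record_game("sparrow", ["Is it made of metal?"], won=False)
print(T[("sparrow", "Can it fly?")], T[("sparrow", "Is it made of metal?")])
```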

SM does not assume fixed questions.
Using CDVs admits only the simplest forms, “Is it related to X?” or “Can it be associated with X?”, where X is a concept stored in the SM.
The system only needs to select a concept, not to build the whole question.

Once the keyword has been selected, the full power of semantic memory can be used to analyze the type of relations and ask more sophisticated questions.

Euclidean distance is used for binary Yes/No answers;
otherwise the distance ||K–A|| is:

||K–A|| = ( Σi |Ki–Ai|² )^(1/2)
where |Ki–Ai| depends on the type of relation Ki and answer Ai:
- if either Ki or Ai is Unknown then |Ki–Ai| = 0.5
- if either Ki or Ai is Not Applicable then |Ki–Ai| = 1
- otherwise Ki and Ai are assigned numerical values:
  Yes = 1, Sometimes = 2/3, Seldom = 1/3, No = 0
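The component rules above can be put into a small distance function; this is a sketch assuming the Euclidean form of ||K–A||, with the answer labels encoded as plain strings:

```python
import math

VALUES = {"Yes": 1.0, "Sometimes": 2 / 3, "Seldom": 1 / 3, "No": 0.0}

def component(k, a):
    """|Ki - Ai| with the special cases for Unknown / Not Applicable."""
    if k == "Unknown" or a == "Unknown":
        return 0.5
    if k == "NotApplicable" or a == "NotApplicable":
        return 1.0
    return abs(VALUES[k] - VALUES[a])

def distance(K, A):
    """Euclidean-style distance over the per-component values above."""
    return math.sqrt(sum(component(k, a) ** 2 for k, a in zip(K, A)))

# Components: 0 (Yes/Yes), 0.5 (Unknown), 2/3 (No vs Sometimes).
print(distance(["Yes", "Unknown", "No"], ["Yes", "Yes", "Sometimes"]))
```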

A CDV matrix for a single ontology restricted to the animal kingdom was initially used to avoid storage size problems.
The first few steps find keywords with IG ≈ 1.
CDV vectors are very sparse, with only 5–20 (average 8) of the ~5000 keywords set. In later stages IG is small and very few concepts are eliminated.
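The IG of a candidate keyword can be sketched as follows, under the simplifying assumptions of a uniform prior over the remaining concepts and a binary Yes/No split on the CDV bit (in that case the gain reduces exactly to the binary entropy of the split, so an even split gives IG = 1):

```python
import math

def entropy(n_yes, n_no):
    """Binary entropy of a yes/no split, in bits."""
    total = n_yes + n_no
    if n_yes == 0 or n_no == 0:
        return 0.0
    p = n_yes / total
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def information_gain(candidates, cdv, keyword):
    """IG of asking about `keyword` over the remaining candidates.
    With a uniform prior, IG = log2(N) - H(C|answer) = H(yes/no split)."""
    yes = sum(1 for c in candidates if keyword in cdv[c])
    no = len(candidates) - yes
    return entropy(yes, no)

# Toy CDV rows: 'fly' splits the four candidates evenly -> IG = 1 bit.
cdv = {"sparrow": {"fly"}, "ostrich": set(), "bat": {"fly"}, "cat": set()}
print(information_gain(cdv.keys(), cdv, "fly"))  # 1.0
```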