
Using the WordNet Concept Catalog and a Relation Hierarchy for Knowledge Acquisition


Philippe MARTIN

INRIA - ACACIA project

2004, route des Lucioles - BP 93 - 06902 Sophia Antipolis Cedex France

Phone: (33) 93.65.76.45 Fax: (33) 93.65.77.83 - E-mail:phmartin@sophia.inria.fr

Abstract. In order to guide a knowledge engineer in the design of ontologies, CGKAT (Martin, 1995), our knowledge acquisition tool, exploits the terminological knowledge base WordNet. Since the top-level concept types of this large general ontology are poorly structured, we subordinated/merged them into an extension of the top-level ontology proposed by Sowa (1992). The result is offered as the initial concept type lattice of CGKAT. Since WordNet is an on-line system, only the WordNet top-level concept types need to be included in the lattice: CGKAT enables the user to search the WordNet ontology by browsing or lexical search, and dynamically places the retrieved concept types in the lattice. Any part of the lattice may be reorganized without losing access to the WordNet ontology. Thus, only the part of WordNet that is useful for an application has to be definitively included in the lattice. This part, which is very detailed, eases interpretation, validation, reuse and automatic inferences on the knowledge of the application. In this article, we detail the mechanism of dynamic inclusion of the WordNet ontology in the lattice and its benefits for knowledge representation and reuse, and we give the rationale behind our top-level ontology. Lastly, we summarize the benefits of using basic relations for KA (in (Martin, 1995) we have already presented a top-level ontology for 200 basic relation types gathered from the hypertext and knowledge representation literature).

1 Introduction

Building an ontology, that is, a taxonomic catalogue of concept types and relation types, is a difficult, long and crucial part of the Knowledge Acquisition (KA) process. Most natural language concepts and relations may appear in documents which are sources of expertise (technical documents, interview transcripts, etc.), and a great number of these concepts might have to be clustered or classified for the KA process. In this article, we present our approach for using the WordNet ontology (Miller & al., 1990) (which now organizes more than 91,000 concepts) in order to ease the building of the concept type lattice of an application, and its later extension or merging with other ontologies. We also summarize some benefits of basic relations for KA (in (Martin, 1995) we have proposed guidelines for building the relation type hierarchy, and presented the top-level relation types of our richly organized set of about 200 basic relation types gathered from the hypertext and knowledge representation literature). Our work is implemented on top of CoGITo (Haemmerlé, 1995), a workbench for Conceptual Graphs (Sowa, 1984).

WordNet is a public domain on-line lexical reference system developed at Princeton University. Its design is inspired by current psycholinguistic theories of human lexical memory. In version 1.5, some 120,000 word forms of English nouns, verbs, adjectives and adverbs are organized into approximately 91,600 synonym sets (synsets), each representing an underlying lexical concept, that is, a word meaning (each synset is unique, all synsets are disjoint). Thus, given a lexical entry (a word or an expression), WordNet can extract its root (its word form) and give back its various meanings, that is, a list of synsets. The WordNet synsets are connected by semantic relations (e.g. IsA, Part-of, Cause-of, Attribute) and by lexical relations (e.g. Antonym).

We have found a way to automatically build a distinct concept type name from the names in a synset. Since the WordNet database is now accessible through a C functional interface, CGKAT (Martin, 1995), our KA tool, can exploit it to help its users build a concept type lattice: it can search for a concept type in WordNet using the exact name of this concept or any word or expression which refers to it, and it can follow the semantic and lexical relations from a retrieved WordNet concept type.

A problem for KA, knowledge representation (KR) and knowledge inferencing is that the WordNet concepts are not organized under a "genuine" ontology model (such as the Situation Data Model of Tepfenhart (1992)): the concepts which nouns refer to are structured under ten exclusive concept categories, the concepts which verbs refer to are structured under a very long (and unspecified) list of activities, and the concepts which adjectives and adverbs refer to are not structured by an IsA relation. Therefore, we have manually subordinated and merged the WordNet top-level concepts, and other ontological distinctions from the Situation Data Model, the PENMAN Upper Model (Bateman 1990), CYC (Lenat & Guha, 1990), Esch (1992) and Pfeiffer & Hartley (1992), into an extension of the situational ontology model proposed by Sowa (1992). (Knight & Luk (1994) have done similar work on WordNet, but with the PENMAN and ONTOS (Carlson & Nirenburg, 1990) top-level ontologies.)

Since the WordNet top-level concept types are in the lattice, the whole WordNet ontology does not have to be included: when the user browses links between types or searches for types with a lexical entry, CGKAT can dynamically add the retrieved types to the lattice (with all their WordNet supertypes which are not yet included in the lattice). The user may decide to keep in the lattice the types he really needs for his application (the other types are removed from the lattice when they are no longer needed for display). He may specialize or restructure any part of the lattice without losing access to the WordNet ontology, since the above search-and-add mechanism is independent of the structure of the lattice. To sum up, in order to guide the user in the building of his ontology, the WordNet ontology need not be wholly included in the lattice, but only dynamically included.

Using WordNet and our top-level ontology, the knowledge engineer does not have to worry about a coherent ontology model, nor about how to place and organize his natural language concept types coherently under this model. WordNet provides an organization for most natural language concepts, and we have done the hard work of merging and extending the Sowa situational ontology model with all the WordNet top-level concept types. The knowledge engineer then only has to find WordNet concept types for the meanings he wants to express (e.g. with a lexical entry) and specialize these types to introduce application-specific concept types or to express restrictions. Thus, he will build a safer, less brittle, more standard and more easily extensible ontology than without any guide (and above all, he will do it easily). His ontology will also be more comprehensible, given that WordNet concept types are highly structured and have precise and detailed names and comments.

We will develop the benefits of using WordNet for knowledge reuse later. First, we present the CGKAT interfaces which make it possible to view and browse large ontologies, including the WordNet one, using what we call a "dynamic" inclusion. Then, we give the rationale behind the top-level structure proposed by CGKAT for concept types. Lastly, we briefly present some benefits of basic relations for KA and KR.

2 Handling ontologies in CGKAT

Before explaining the dynamic inclusion of WordNet in the concept type lattice, and the top-level ontology we propose for this lattice, we have to explain our conventions in the display of ontologies.

Figure 1: The concept type handling menu showing some of the top-level types proposed by CGKAT.

Figure 1 shows the top of the default concept type lattice proposed by CGKAT. Since the CGKAT browsers must display many types (with their associated comments) in order to give the user a synthetic view of a part of an ontology, and therefore to ease his searches, we have preferred a hierarchically indented list to a graph layout. When a concept type is selected, its supertypes and its subtypes are displayed (more precisely, five supertypes are displayed, whereas the depth for the subtypes is controlled by the content of the "Max depth" number entry widget and the presence of the keyword "e.g." in the comments of the subtypes). In Figure 1, "Concept", our name for the supertype of all types, is selected, hence no supertype is displayed. In order to highlight the types which have many supertypes, their type names are displayed preceded by a ’%’. For more clarity, when these supertypes must be displayed (e.g. when any of their subtypes is manually selected or retrieved with a lexical entry), their type names are displayed preceded by a ’^’ and the supertypes of these supertypes are not displayed (see Figure 2). Lastly, to save the user browsing time, the leaves (i.e. the types whose only subtype is "Absurd") are preceded by ’.’.

Figure 2: The concept types (with their supertypes) proposed by WordNet for the lexical entry "line".

A type may be retrieved by browsing (i.e. by successive selections) but also with a lexical entry. If only the current lattice is to be searched, the exact type name must be given. If the WordNet ontology is searched, any lexical entry (word or expression) known by WordNet may be used. WordNet will extract the word form of the lexical entry and give back all the word meanings it knows for this word form. We detail in the next section how CGKAT constructs concept type names from these word meanings and includes them in the lattice. When a type is selected, the user may apply some commands to it: addition of a subtype or supertype, type removal, removal of subtypes (with warnings if some loaded CGs use them), etc.

The relation type hierarchy can be browsed and handled in the same way, except that WordNet cannot be dynamically included in it. For the dynamic inclusion of WordNet in the concept type lattice, CGKAT only exploits the IsA relation between word meanings (synsets), assuming it is a Kind-Of relation. WordNet connects these synsets with other relations, e.g. Part-of, Cause-of and Attribute. Although we have not implemented it, we think that, from the user’s point of view, these relations could be browsed and handled with the same interface as for the IsA relation. However, in the Conceptual Graph formalism (Sowa, 1984) (Sowa, 1993), apart from the Kind-Of relation, relations between concept types are represented via type definitions, schemas, or metalevel graphs using concepts with second-order types and second-order relations like "Kind" and "Subtype". Therefore, even if other relations between types are implemented like the Kind-Of relation, they must be given a clear semantic interpretation in the conceptual graph theory, and a mechanism should probably be implemented for translating these relations into definitions, schemas or metalevel graphs when necessary.

3 Dynamic inclusion of WordNet in the concept type lattice of CGKAT

A synset represents a word meaning, is unique, and is connected to other synsets by an IsA relation (footnote 1). Since these synsets very rarely represent individuals, the IsA relation may be assumed to be a Kind-Of relation and therefore be used to build a concept type lattice.

We found a way to automatically build a unique concept type name from the names in a synset. If there are at least two names in the synset, a simple concatenation of these names is sufficient. If there is only one name, a unique type name may be built by adding to this name the initial of the grammatical category of the synonyms (e.g. ’n’ for a noun, ’v’ for a verb) and the sense number of this synset in this grammatical category (if this sense number is 0, it is omitted because there is no corresponding synset for the same category (footnote 2)). (This information can be accessed via the WordNet functional interface.) Here are two examples which can be found in Figure 1. With the synset {object, inanimate object, physical object}, CGKAT builds the concept type name W_object__inanimate_object__physical_object (the word forms "object", "inanimate object" and "physical object" are in the same synset because in some contexts they are synonyms and refer to the same concept). With the synset {state}, CGKAT builds the concept type name W_state/n. Thus, all concept type names which come from WordNet begin with "W_" (Figures 1 and 2 give many examples). This sometimes produces very long concept type names, especially when the synonyms are verbs, but it makes them rather unambiguous. For his application, the knowledge engineer may specialize these concept types in order to use shorter names and to express semantic restrictions.

1. The WordNet authors call the IsA relation between word meanings a "Hyponym" relation, but agree that IsA is a term that is commonly used for such a relation. The following definition is used to build the Hyponym hierarchy: "a concept represented by the synonyms set {x, x’, ...} is said to be a hyponym of the concept represented by the synonyms set {y, y’, ...} if native speakers of English accept sentences constructed from such frames as An x is a (kind of) y". Hence, we think that parts of the WordNet ontology may be included in various lattices without ontological loss.

2. In this last case, for the encoding of synsets including only one word form, we tried not to add the initial of the grammatical category systematically to the name, but then the time taken to check whether the concept type name was unique (a check in the WordNet databases of the other grammatical categories) was too long to be tolerable.
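The naming rule described above can be sketched in a few lines of Python. This is only an illustration, not the CGKAT implementation: the function name and its parameters are hypothetical, the separators ("__" between synonyms, "_" for spaces, "/" before the category initial) are inferred from the examples in the text, and the sense number 0 for {state} is an assumption.

```python
def concept_type_name(names, category_initial, sense_number):
    """Build a W_-prefixed concept type name from the word forms of a synset.

    names: word forms of the synset, in order (hypothetical input format).
    category_initial: 'n', 'v', ... (used only for single-name synsets).
    sense_number: sense number in that category (0 is omitted).
    """
    # Spaces inside a word form become single underscores.
    parts = [name.replace(" ", "_") for name in names]
    if len(parts) >= 2:
        # Two or more synonyms: their concatenation is taken to be distinct.
        return "W_" + "__".join(parts)
    # One synonym only: disambiguate with category initial and sense number.
    suffix = "/" + category_initial
    if sense_number != 0:
        suffix += str(sense_number)
    return "W_" + parts[0] + suffix


object_name = concept_type_name(
    ["object", "inanimate object", "physical object"], "n", 1)
state_name = concept_type_name(["state"], "n", 0)
```

With these assumptions, the two synsets quoted in the text yield the names shown in Figure 1, and a single-name synset with a non-zero sense number yields a name of the form W_conclusion/n1.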

The WordNet concept types which are visible in Figure 1 correspond to top-level WordNet synsets. Figure 2 shows some of the various concept types (with their supertypes) which can be retrieved in WordNet with the lexical entry "line" (the comments are also retrieved with the types). In order to highlight the WordNet types which have just been dynamically inserted in the lattice, their concept type names are displayed preceded by a ’~’, which means that they are temporarily inserted: the user may decide to keep in the lattice the types he really needs for his application; the other ones are removed when they are no longer needed for display.

The WordNet supertypes of a retrieved WordNet concept type are dynamically inserted in the lattice until one of the supertypes is already known in the lattice. If none of the supertypes is known, the topmost one is placed under "Concept". This mechanism enables the user to specialize, restructure or delete any part of the lattice (our top level or any part of the organization proposed by WordNet for concept types) without losing access to the WordNet ontology through lexical searches: the WordNet supertype of a WordNet type is dynamically inserted in the lattice only if this type did not belong to the lattice before. Thus, only the parts of WordNet which are not overwritten by the user are retrieved. And since WordNet has many inadequacies, and since it cannot be wholly adequate for every application, it is necessary to overwrite or to complement some parts of its organization.
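The insertion mechanism of this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the CGKAT code: the lattice is modelled as a dictionary from type names to sets of direct supertypes, `wordnet_supertypes` is a hypothetical callback standing in for the WordNet functional interface, and the IsA links in the toy data are invented for the example.

```python
def insert_with_supertypes(lattice, type_name, wordnet_supertypes, top="Concept"):
    """Insert type_name and its not-yet-known WordNet supertypes.

    The climb stops at the first supertype already present in the lattice,
    so any organization chosen by the user for known types is left untouched.
    Returns the list of types that were added.
    """
    lattice.setdefault(top, set())
    added, pending = [], [type_name]
    while pending:
        current = pending.pop()
        if current in lattice:
            continue  # already known: do not fetch its WordNet supertypes
        # A type without WordNet supertypes is placed directly under the top.
        parents = list(wordnet_supertypes(current)) or [top]
        lattice[current] = set(parents)
        added.append(current)
        pending.extend(parents)
    return added


# Illustrative data only: these IsA links are invented, not taken from WordNet.
toy_wordnet = {
    "W_line/n5": ["W_connection/n"],
    "W_connection/n": ["W_relation/n"],
    "W_relation/n": ["W_abstraction/n"],
}
# The user has already placed W_relation/n under State.
lattice = {"Concept": set(), "State": {"Concept"}, "W_relation/n": {"State"}}
added = insert_with_supertypes(
    lattice, "W_line/n5", lambda t: toy_wordnet.get(t, []))
```

In this toy run only W_line/n5 and W_connection/n are added: W_relation/n is already in the lattice, so its WordNet supertype is not retrieved and the user's placement under State is preserved.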

Similarly, when the user clicks on a WordNet concept type (temporary or not), CGKAT retrieves its first-level subtypes according to WordNet and inserts them in the lattice (temporarily if they are not yet known), unless the user has specified that it should not do so for this type (this information is saved in a hidden part of the comment of the type). If the user has defined other subtypes for this type, they are shown before the WordNet subtypes. Hence, as for the display of supertypes, the display of subtypes takes into account the user’s restructuring or completion of the WordNet ontology. To sum up, the knowledge engineer can always be guided by the highly structured WordNet ontology, even when he corrects or completes it.

We have said that some of the WordNet synsets represent individuals. Therefore CGKAT can build concept type names for these individuals and insert them temporarily in the lattice, e.g. W_Johann_Sebastian_Bach will be proposed as a subtype for an organist and a composer. The user should not keep such proposals in the lattice.

Another problem is: may the inclusion of concept types proposed by WordNet change a lattice into a structure which is not a lattice? Although the WordNet ontology is mainly a tree, the answer may be yes. Hence, a verification procedure should be run after each inclusion or, more realistically, only when the user requests it, since the duration of this check is proportional to the cube of the number of types in the lattice. If the WordNet ontology were not dynamically included but wholly included, each check after some modifications in the lattice would take a very long time. Such checks and other aids for building the lattice will be introduced in CGKAT when it is connected to the "cooperative program for the construction of a concept type lattice" of Chein & Leclère (1993).
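To make the kind of verification discussed here concrete, the following naive sketch checks one half of the lattice property: that every pair of types has exactly one minimal common supertype. It is not the procedure of Chein & Leclère, nor necessarily a cubic one; the data model (a dictionary from type names to lists of direct supertypes) and all function names are assumptions made for the example. The subtype side can be checked symmetrically.

```python
from itertools import combinations


def ancestors(supertypes, t):
    """Return t together with all of its direct and indirect supertypes."""
    seen, stack = {t}, [t]
    while stack:
        for parent in supertypes.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def has_unique_minimal_common_supertypes(supertypes):
    """True if every pair of types has exactly one minimal common supertype."""
    types = set(supertypes) | {p for ps in supertypes.values() for p in ps}
    up = {t: ancestors(supertypes, t) for t in types}
    for a, b in combinations(sorted(types), 2):
        common = up[a] & up[b]
        # c is minimal if no other common supertype d lies below it,
        # i.e. if c is not a supertype of any other d in the common set.
        minimal = [c for c in common
                   if not any(d != c and c in up[d] for d in common)]
        if len(minimal) != 1:
            return False
    return True


# Toy hierarchies (invented): C under A and B is fine; adding D under the
# same two types gives C and D two minimal common supertypes, A and B.
ok = {"Concept": [], "A": ["Concept"], "B": ["Concept"], "C": ["A", "B"]}
broken = dict(ok, D=["A", "B"])
ok_result = has_unique_minimal_common_supertypes(ok)
broken_result = has_unique_minimal_common_supertypes(broken)
```

The second toy hierarchy shows how adding a type with several supertypes, as WordNet types may have, can break the property even though the rest of the hierarchy is a tree.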

4 Advantages of using the WordNet ontology for knowledge reuse

A KB that is built using concept types coming from WordNet, or specializations of these types, could be compared rather easily with another KB built in the same way, since many of the concept type names used in the two KBs, and their meanings, would be common (footnote 1). If those concept types are organized a bit differently in the two KBs, automatic procedures could detect the differences and help to resolve them (footnote 2). As WordNet is very detailed, the knowledge engineer should rarely have to add intermediate types, but should rather specialize precise WordNet types in order to express the shades of meaning needed for his application.

1. The concept types coming from WordNet are precise and have detailed names and comments; therefore we think that they do not induce divergent interpretations. This is also an important point for the interpretation and validation of the KB. Apart from these advantages, if any other large general ontology existed, it could also be used in the same way by a knowledge engineer for building a detailed and reusable ontology for his application.

2. Of course, many problems will have to be solved during the merging of two lattices, for example: what is to be done if different definitions or schemas are associated with the same concept types? But this merging, or more generally the reuse of other works, is eased if the works rely on the same basis, e.g. WordNet.

Therefore, in order to ease the use and reuse of the KB knowledge, we suggest that the knowledge engineer specialize WordNet concept types. The knowledge engineer may also define his application types as subtypes of a type like Concept_used_in_an_application (see Figure 1). Hence, he may build and work on the minimal hierarchy necessary for his application, without being bothered by the high degree of structuring offered by WordNet and our high-level concept types, but without losing them. This structuring may not be useful for the final KBS, but it is necessary for good modelling, for powerful searches and inferences, and for easing validation, extension and reuse. Filters could always be applied when only a part of the ontology is needed.

5 The top-level structuration proposed by CGKAT for concept types

We have seen that the WordNet ontology and our top-level ontology are just proposals: any part can be modified by the user. Let us present this top-level ontology and explain its rationale.

5.1 Structuring with the notions of situation, process, state, event, proposition and dimension

The first idea was to merge the top-level concept types of WordNet into the situational ontological model proposed by Sowa (1992), because it makes ontological distinctions which are not in WordNet but which are important for KR, especially in the Conceptual Graph context. All the WordNet concept types are then automatically classified according to these ontological distinctions. These distinctions are the notions of Situation, Time, Space, State, Process and Proposition. Here is an extract of Sowa (1992) which introduces them.

"Situation semantics (Barwise & Perry, 1983) has been widely adopted as one of the most flexible ways of defining the semantics of language. Each situation is a finite configuration of some aspect of the world in a limited region of space and time. It may be a static configuration that remains unchanged for a period of time, or it may include processes and events that are causing changes. ...

In conceptual graphs, a situation is represented by a context, which is a concept that contains one or more propositions that describe the situation. The propositions in a context could be expressed by a paragraph of English sentences or by a collection of conceptual graphs."

The notion of Situation makes it possible to group the concepts to which temporal relations like "Point_in_Time" or "Duration" may be attached. Therefore, these relations may be signed on concepts of type Situation and concepts of type Time (the benefits of signed relations for KR and KA will be developed in section 6). The notion of Situation induces its complement, that is, the notion of Entity, which groups the concepts to which attaching relations like Point_in_Time or Duration makes no sense. Figure 1 shows the definitions for Entity, Situation, State, Process (footnote 1) and Proposition, and the WordNet top-level subtypes we found for them (footnote 2). We had to group several WordNet top-level concept types for the notions of time and space under the two concept types Time and Space (these two concept types are needed to give signatures to temporal and spatial relation types). The following subtypes of Time and Space are also needed to sign some temporal and spatial relations more accurately: Point_in_Time, Time_Period, Point_in_Space and Space_Region. But since the notions related to these types are scattered in the WordNet ontology, we could not group them, and we have to let the knowledge engineer specialize these types with the temporal or spatial WordNet types he uses.

Similarly, in WordNet the notions of dimension, dimension unit, measure, property (in the sense of a measurable characteristic of an entity or a situation) and attribute (in the sense of a measure of a property) are completely mixed. We therefore grouped all the WordNet top-level concept types related to these notions under the concept type Dimension_or_Measure, and we also offer specific concept types for each of them. The user has to specialize these specific concept types with the WordNet types he uses if he wants to use relations which are signed on these specific concept types (for example, the signature of the relation CHRC is Concept->Property, whereas the signature of the relation ATTR is Concept->Attribute). The concept types Time and Space are also subtypes of Dimension_or_Measure. We have subtyped Space by Physical_Entity in order to let spatial relations use a physical entity as a spatial region (this greatly simplifies the building and display of CGs). But then, if we had defined Dimension_or_Measure as a subtype of Entity, Physical_Entity would not have been a direct subtype of Entity, and it would not have been visible in Figure 1. For ergonomic reasons we wanted to avoid that; we have therefore defined Dimension_or_Measure as a direct subtype of Concept.

Like Sowa (1992), we have subtyped Entity by Proposition and Representation_Entity (under this type, we have synthesized or structured the main abstract data types used in computer science). We have also added the concept type Collection and specialized it with 1) the top-level WordNet concept types about groups or sets, 2) the set type hierarchy proposed by Pfeiffer and Hartley (1992), 3) the abstract data types which are collections (these types are also subtypes of Representation_Entity). Besides, Linear_ADT (a type which groups linear abstract data types like Character or Number) is a subtype of Linear_Dimension_or_ADT, which is also a supertype of Time and Space; therefore, mathematical relations signed on Linear_Dimension_or_ADT (e.g. Equal and LessThanOrEqual) may be used between numbers as well as between concepts of time and space.

1. Any process (that is, a potential cause of changes) may be viewed as a state (that is, according to Sowa, a situation that remains unchanged during a given period of time) if the period of time is sufficiently short. For example, "a rock is rolling" may be represented by: [State]->(Descr)->[Proposition: [Rock]<-(Object)<-[Roll] ].

2. In order to understand why we classified such WordNet types as subtypes of State and Process, consider that they occur in some region of Time and Space, see in Section 6 the list of the relations that can only be attached to processes, and see in Figure 2 some examples of subtypes of W_relation/n (the types related to the first meaning for "line" shown in Figure 2) and of W_psychological_feature (the types related to the second meaning for "line" shown in Figure 2).

W_communication/n, subtype of W_relation/n in WordNet, is now a subtype of Proposition. W_communication/n is a supertype of W_proposition (and hence of W_theorem/n, W_conclusion/n1, etc.), W_message/n, W_hypothesis/n, etc.

Also like Sowa (1992), we have subtyped Process by Event (more exactly W_event (footnote 1)). Sowa doesn’t define what an event is, but from his explanations it may be inferred that a process is considered to be an event for a given period of time when it makes a change during this period. If it doesn’t make a change during this period, it may be considered as a state. Hence an event cannot be a state. But in his type hierarchy, Sowa (1992) defines Action as a subtype of Event, thereby preventing an action from being viewed as a state. In order to avoid this, we have defined W_act__human_action__human_activity as a direct subtype of Process. Then, depending on the application expertise, the knowledge engineer may use a process as an event or as a state and, if he needs to, he may classify some types of process as subtypes of W_event or subtypes of State. Since CGKAT is intended to be a KA tool, we have also subtyped Process by some types of problem solving tasks we have collected in the KA literature concerning KADS, e.g. (Wielinga & al., 1992).

These ontological distinctions may appear obvious, but we have often noted that, even when these distinctions are clearly stated and used, knowledge engineers make semantic errors when they represent knowledge. For example, they try to connect "relations for processes" to concepts of type State (e.g. they use the relation Agent instead of the relation Consequence or Succ), they define common subtypes of exclusive concept types (e.g. Observation as a subtype of both Process and Proposition, thereby confusing the action with its result), and they use some types instead of others (e.g. Redness, which is a subtype of Color, instead of Red, which is a subtype of IsColored). In CGKAT, these problems are much reduced because: 1) all relation types have a signature which is checked when a relation is added; 2) we have implemented exclusion relations between types, and we have defined some in our top-level ontology (the most important or helpful one is the exclusion between entities (e.g. propositions) and situations (e.g. processes)).
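The two safeguards mentioned here, signature checking and exclusion relations, can be illustrated with a small sketch. It is not the CGKAT implementation: the hierarchy below is a simplified excerpt, the signature given to Agent (Process -> Causal_Entity) is an assumption made for the example, and all function names are hypothetical.

```python
# Simplified excerpt of a type hierarchy: type -> direct supertypes (assumed).
SUPERTYPES = {
    "Concept": set(),
    "Entity": {"Concept"},
    "Situation": {"Concept"},
    "Proposition": {"Entity"},
    "Causal_Entity": {"Entity"},
    "Process": {"Situation"},
    "State": {"Situation"},
}
# Exclusion stated in the text: entities versus situations.
EXCLUSIONS = [("Entity", "Situation")]
# Assumed signature for the example: Agent links a Process to a Causal_Entity.
SIGNATURES = {"Agent": ("Process", "Causal_Entity")}


def is_subtype(t, ancestor):
    """True if t is ancestor or one of its direct or indirect subtypes."""
    return t == ancestor or any(is_subtype(p, ancestor) for p in SUPERTYPES[t])


def relation_allowed(relation, source_type, target_type):
    """Check the types of both arguments against the relation signature."""
    expected_source, expected_target = SIGNATURES[relation]
    return (is_subtype(source_type, expected_source)
            and is_subtype(target_type, expected_target))


def violates_exclusion(proposed_supertypes):
    """True if a new type would fall under two mutually exclusive types."""
    for a, b in EXCLUSIONS:
        under_a = any(is_subtype(t, a) for t in proposed_supertypes)
        under_b = any(is_subtype(t, b) for t in proposed_supertypes)
        if under_a and under_b:
            return True
    return False


agent_on_process = relation_allowed("Agent", "Process", "Causal_Entity")
agent_on_state = relation_allowed("Agent", "State", "Causal_Entity")
observation_clash = violates_exclusion({"Process", "Proposition"})
```

Under these assumptions, attaching Agent to a concept of type State is rejected, and declaring Observation as a subtype of both Process and Proposition is flagged, which corresponds to the two kinds of error described in the paragraph.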

5.2 Structuring with the notion of role

Finally, we introduced in our top-level ontology a distinction which is orthogonal to the previous ones, that is, the notion of "role" (see in Figure 1 the type Concept_Playing_a_Role and its subtypes). A role type expresses the roles that an individual can play. For example, an entity may be the cause, the agent or the result of a process (examples of agent roles are "taxi driver" or "musician"). WordNet does not make distinctions between "natural types" and "role types"; we therefore have to let the user make the distinctions when he needs them (at least our top-level ontology offers the framework for that). Some direct subtypes of Concept_Playing_a_Role are Property (footnote 2), Attribute, Concept_Used_by_a_Process, Concept_Used_in_an_Application and Concept_Known_by_Someone. This last type is useful, for example, in KA from multiple experts or with many knowledge engineers. The type Concept_Used_in_an_Application is especially useful when the concept types from WordNet are used, since it enables the knowledge engineer to focus on the types of his application (even if each of these types is also a subtype of a WordNet concept type).

1. The subtypes of the WordNet top-level concept type W_event correspond to our notion of an event. We have just changed the comment given by WordNet to this type.

2. Being a property or an attribute of another concept is clearly a role type. In Sowa (1992), property types are second-order types and attribute types are instances of property types. We could not follow Sowa in that direction since: 1) properties and attributes (in the sense defined above) are not distinguished in WordNet; 2) properties are (first-order) role types; 3) second-order concepts cannot be connected to the same relations as first-order concepts, since a relation type has only one signature. All the other ontological models we have seen use the Kind-Of relation between property types and attribute types, e.g. the Situation Data Model of Tepfenhart (1992). Our choice implies that an attribute type (e.g. Red) cannot be in the referent part of a concept with a property type (e.g. Color), except if this may be interpreted as a shortcut for a relation "Value" between this concept and a generic concept with this attribute type (e.g. [Color: Red] would be expanded into [Color]->(Value)->[Red]).

We give below an extract of our typology for the roles of entities (some of its types, e.g. Causal_Entity, are necessary for giving a signature to some relation types, e.g. Agent).

Entity_Playing_a_Role

Causal_Entity -- any entity that can cause a process

Goal-directed_Entity -- Problem Solver or interactional agent Entity

Conscious_Goal-directed_Entity -- e.g. a person

Non_Conscious_Goal-directed_Entity -- e.g. an AI agent

Perhaps_Goal-directed_Entity -- e.g. supernatural forces

Without_Goal_Entity -- non conscious Entity and not an AI_Agent

Input_Entity -- input of a process

Output_Entity -- output of a process

Recipient_Entity -- recipient of a process

Patient_Entity -- the object of a process, e.g. W_subject__content__depicted_object

W_necessity__essential__requirement__requisite__need -- anything needed

W_inessential -- anything that is not essential

Possessed_Entity -- e.g. a Pet, W_possession/n

Part_Entity

W_part__portion -- something determined in relation to something that includes it

W_part__piece -- a portion of a natural object

W_unit__building_block -- an undivided entity occurring in the composition of something

Whole_Entity

W_whole -- all of something including all its component elements or parts

W_whole__whole_thing__unit -- a single undivided entity

W_unit/n -- an organization regarded as part of a larger social group

Representation_Container -- e.g. a text or audio file

Model_ADT -- a representation entity which is a model, e.g. KADS_Model

We said in the introduction that WordNet organizes the concepts which nouns refer to in ten exclusive concept categories. The concept types for these categories are: W_entity/n, W_abstraction/n, W_group__grouping, W_state/n, W_psychological_feature/n, W_event/n, W_phenomenon/n, W_act__human_action__human_activity, W_location/n, W_possession/n. For building our top-level ontology, we had to distribute the 13 subtypes of W_entity (under Entity and Entity_Playing_a_Role) and the 7 first-level subtypes of W_abstraction (under Dimension_or_Measure and State). Hence, we have merged/subordinated 28 top-level WordNet concept types into our top-level ontology. We have also subtyped Entity_Playing_a_Role with some deeper WordNet concept types in order to give pointers to some role types hidden in WordNet. At present, our top-level ontology includes about 200 concept types.

WordNet has a very long list of top-level synsets whose synonyms are verbs. Therefore, we have not merged/subordinated the concept types corresponding to these synsets into our top-level ontology. The CGKAT user has to place these types himself if he wants to use them instead of, or in addition to, the types which come from the noun synsets for actions. Similarly, we have not classified the concept types for notions expressed by adjectives or adverbs, since WordNet offers no typologies for them, but the user may include them in the concept type lattice (by default, they are placed under Concept).
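The default placement mentioned above can be sketched as follows. This is only a minimal illustration and not CGKAT's implementation: the `TypeLattice` class, its methods and the adverb type name `W_quickly` are hypothetical, and placing `Part_Entity` under `Entity_Playing_a_Role` is an assumption read from the listing above.

```python
class TypeLattice:
    """Toy concept type lattice: each type maps to its direct supertypes."""

    def __init__(self, root="Concept"):
        self.root = root
        self.supertypes = {root: set()}

    def add(self, type_name, parents=()):
        """Insert a type; without an explicit parent it goes under the root."""
        parents = set(parents) or {self.root}
        unknown = [p for p in parents if p not in self.supertypes]
        if unknown:
            raise KeyError(f"unknown supertype(s): {unknown}")
        self.supertypes.setdefault(type_name, set()).update(parents)

    def is_subtype(self, sub, sup):
        """True if `sub` is `sup` or reaches it through supertype links."""
        stack, seen = [sub], set()
        while stack:
            current = stack.pop()
            if current == sup:
                return True
            if current not in seen:
                seen.add(current)
                stack.extend(self.supertypes.get(current, ()))
        return False


lattice = TypeLattice()
lattice.add("Entity_Playing_a_Role")                    # placed under Concept
lattice.add("Part_Entity", ["Entity_Playing_a_Role"])   # assumed position
lattice.add("W_part__portion", ["Part_Entity"])
lattice.add("W_quickly")  # hypothetical adverb type, no parent given
```

A type inserted this way can later be given a more specific supertype by calling `add` again with explicit parents, which loosely mirrors the idea that the user may reorganize the lattice.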

6 The relation type hierarchy

We call a relation a basic relation when it cannot be defined using one or more concepts and other basic relations which are different from the primitive relation LINK. For example, case relations (also called thematic relations) like Agent, Object and Recipient are basic relations. We call a relation which is not basic a complex relation. We have shown in (Martin, 1995) that relations should remain basic for many reasons, among which: 1) the concept type hierarchy is duplicated in the relation type hierarchy when complex relation types are introduced in it; 2) if a complex relation has no definition (with concepts), a conceptual graph (CG) using this relation cannot be expanded, and many graph-matching possibilities are then lost; 3) a complex relation which has a definition makes it possible to hide some more basic relations, but it cannot be used when other basic relations must be added; 4) hiding more basic relations leads to ambiguities and induces problems for graph-matching. In (Martin, 1995), we have also presented a top-level ontology for 200 basic relation types gathered from the hypertext and knowledge representation literature.
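Reason 2 above can be illustrated with a small sketch of relation expansion. It assumes a CG is stored as a list of (source, relation, target) triples; the complex relation `Written_By`, its definition through the concept `Write`, and the function `expand` are hypothetical and serve only as an illustration. Only Agent, Object and Part are relation names taken from the text.

```python
from itertools import count

# Hypothetical definition of a complex relation by one concept and two basic
# relations:  x -Written_By-> y   ==   Write -Object-> x  and  Write -Agent-> y
DEFINITIONS = {
    "Written_By": ("Write", "Object", "Agent"),
}


def expand(graph, definitions=DEFINITIONS):
    """Replace every defined complex relation by its basic-relation expansion.

    Relations without a definition are copied unchanged, so a graph using
    them cannot be matched against graphs written with basic relations.
    """
    fresh = count(1)
    expanded = []
    for source, relation, target in graph:
        if relation in definitions:
            concept_type, to_source, to_target = definitions[relation]
            node = f"{concept_type}#{next(fresh)}"  # new intermediate concept
            expanded.append((node, to_source, source))
            expanded.append((node, to_target, target))
        else:
            expanded.append((source, relation, target))
    return expanded


graph = [("Report", "Written_By", "Person"), ("Report", "Part", "Chapter")]
print(expand(graph))
```

After expansion, the first triple is replaced by two triples on basic relations that a matcher can compare with graphs using Agent and Object directly. A complex relation missing from `DEFINITIONS` would stay opaque.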

WordNet does not provide basic relation types such as Part, Purpose and Method; instead, it includes concept types which express roles, e.g. W_part__portion, W_intention__purpose, W_wise__method. Moreover, these roles are not grouped (we have grouped them partly); hence WordNet is of no use for building a relation type hierarchy.

A hierarchy of basic relation types is a great help in KA and KR: 1) it provides cues for representing knowledge or searching for it in a CG base; 2) the signatures of the relation types enforce a minimal coherence in knowledge representation (during the building of CGs but also during the building of the concept type lattice: our relation type hierarchy was a great guide for enhancing the coherence and completeness of our top-level concept type ontology!); 3) these signatures may also be used for knowledge elicitation or collection: from a concept in a CG, all the types of the relations which may be connected to this concept may be listed (CGKAT presents this list in hierarchical order, as for any part of the relation type hierarchy; however, since our relation types are signed on top-level concept types, the choice is often large).
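Points 2 and 3 above can be sketched as follows. The type hierarchy fragment, the signatures and the function names are hypothetical and chosen only for illustration; they are not the signatures actually used in CGKAT.

```python
# Hypothetical fragment of a concept type hierarchy: type -> direct supertype.
SUPERTYPE = {
    "Entity": "Concept",
    "Process": "Concept",
    "Person": "Entity",
    "Write": "Process",
}

# Hypothetical signatures: relation -> (type of 1st argument, type of 2nd).
SIGNATURES = {
    "Agent": ("Process", "Entity"),
    "Object": ("Process", "Concept"),
    "Part": ("Entity", "Entity"),
}


def is_subtype(sub, sup):
    """Walk up the single-parent hierarchy from `sub` looking for `sup`."""
    while sub is not None:
        if sub == sup:
            return True
        sub = SUPERTYPE.get(sub)
    return False


def relations_from(concept_type):
    """Relation types whose first-argument signature accepts `concept_type`."""
    return sorted(
        relation
        for relation, (first, _second) in SIGNATURES.items()
        if is_subtype(concept_type, first)
    )


def respects_signature(source_type, relation, target_type):
    """Minimal coherence check of one CG arc against the relation signature."""
    first, second = SIGNATURES[relation]
    return is_subtype(source_type, first) and is_subtype(target_type, second)
```

In this sketch, `relations_from("Write")` lists the relations that may start from a Write concept, and `respects_signature` rejects an arc such as Person -Agent-> Write. Because signatures are declared on top-level types, the list returned for a very general type is long, which matches the remark that the choice is often large.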

7 Conclusion

In order to help the knowledge engineer build his concept type lattice and represent knowledge with CGs, CGKAT offers him a top-level concept type ontology which includes many conceptual distinctions necessary for KR and KA, and under which the WordNet ontology may be dynamically included. This top-level ontology is not required for accessing the WordNet concept types, but it gives them a structure which is useful for KR. We think especially of the concept types on which basic relation types are signed, and of the structuring of role types. However, in many cases we could not gather and group all the WordNet concept types relative to a notion and include them in our top-level ontology (it would then have included not about 200 concept types but well over 10,000; we tried!). Therefore, in those cases the user himself has to subtype the ontological distinctions he finds interesting with the types he uses in his application. CGKAT also offers a top-level ontology for relations, which guides the knowledge engineer and saves him work in KR and KA.

In these top-level ontologies, we included the ontological distinctions made by WordNet, Sowa (1992), Sowa (1984), Pfeiffer & Hartley (1992) and also most of the distinctions made by Tepfenhart (1992), Esch (1992), the PENMAN Upper Model (Bateman 1990) and CYC (Lenat & Guha, 1990). An interesting continuation would be to extend this work with other top-level ontologies, e.g. ONTOS (Carlson & Nirenburg, 1990). Another would be to exploit the other relations provided by WordNet, e.g. "Part-of" and "Cause-of".

8 Acknowledgements

The author thanks his referees and Dr Jurgen Mueller for their comments on previous versions of this article.

9 References

Bateman J. (1990). Upper modeling: Organizing knowledge for natural language processing. In Proc. of Fifth International Workshop on Natural Language Generation, Pittsburgh, PA.

Barwise J. & Perry J. (1983). Situations and Attitudes. MIT Press, Cambridge, MA.

Carlson L. & Nirenburg S. (1990). World modeling for NLP. Technical Report CMU-CMT-90-121, Center for Machine Translation, Carnegie Mellon University.

Chein M. & Leclère M. (1993). A cooperative program for the construction of a concept type lattice. Research report No 93075 of LIRMM (Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier - France - fax: (33) 67 41 85 00), 1993.

Esch J.W. (1992). Temporal Intervals. In Conceptual Structures: current research and practice (Eds: Nagle T.E., Nagle J.A., Gerholz L.L. & Eklund P.W.), England, Ellis Horwood Workshops, 1992.

Knight K. & Luk S. (1994). Building a Large-Scale Knowledge Base for Machine Translation. In Proc. of AAAI’94, twelfth national conference on artificial intelligence, July 1994.

Haemmerlé O. (1995). CoGITo: une plate-forme de développement de logiciels sur les graphes conceptuels. Ph.D thesis, Université Montpellier II, France, January 1995.

Lenat D.B. & Guha R.V. (1990). Building large knowledge-based systems: representation and inference in the Cyc project. Reading, MA; Sydney; Tokyo: Addison-Wesley, 1990.

Martin Ph. (1995). Knowledge Acquisition Using Documents, Conceptual Graphs and a Semantically Structured Dictionary. In Proc. of KAW’95, ninth Knowledge Acquisition for Knowledge-Based Systems Workshop (Ed: Gaines B.R.), University of Calgary, Banff, Canada, 1995.

Miller G.A., Beckwith R., Fellbaum C., Gross D. & Miller K. (1990). Five Papers on WordNet. CSL Report 43, Cognitive Science Laboratory, Princeton University, July 1990. (These papers and the system are available by anonymous ftp at https://www.wendangku.net/doc/af16012532.html,, subdirectory ’pub’).

Pfeiffer H.D. & Hartley R.T. (1992). The Conceptual Programming Environment, CP. In Conceptual Structures: current research and practice (Eds: Nagle T.E., Nagle J.A., Gerholz L.L. & Eklund P.W.), England, Ellis Horwood Workshops, 1992.

Sowa J.F. (1984). Conceptual Structures: Information Processing in Mind and Machine. Addison-Wesley, Reading, MA.

Sowa J.F. (1992). Conceptual Graphs Summary. In Conceptual Structures: current research and practice (Eds: Nagle T.E., Nagle J.A., Gerholz L.L. & Eklund P.W.), England, Ellis Horwood Workshops, 1992.

Sowa J.F. (1993). Relating Diagrams to Logic. In Proc. of ICCS’93, first international conference on conceptual structures (Eds: Mineau G.W., Moulin B. & Sowa J.F.), Quebec City, Canada, August 1993.

Tepfenhart W.M. (1992). Using the Situation Data Model to Construct a Conceptual Basis Set. In Conceptual Structures: current research and practice (Eds: Nagle T.E., Nagle J.A., Gerholz L.L. & Eklund P.W.), England, Ellis Horwood Workshops, 1992.

Wielinga B., Schreiber G. & Breuker J. (1992). KADS: a modelling approach to knowledge engineering. In Knowledge Acquisition (1992) 4, pp 136-145.
