A Robust Shallow Parser for Swedish

Ola Knutsson, Johnny Bigert, Viggo Kann

Numerical Analysis and Computer Science

Royal Institute of Technology, Sweden

{knutsson,johnny,viggo}@nada.kth.se

1 Introduction

In many NLP applications, the robustness of the internal modules is a prerequisite for the success and usability of the system. The term robustness is somewhat vague, but in NLP it is often used in the sense of robustness against noisy, ill-formed, and partial natural language data. The full spectrum of robustness is defined by Menzel (1995), and further explored with respect to parsing in (Basili and Zanzotto, 2002). In the following, we will focus on a parser developed for robustness against ill-formed and partial data, called Granska Text Analyzer (GTA).

2 Parsers for Swedish

Most parsers for Swedish are surface oriented and designed for unrestricted text. Early initiatives on parsing Swedish focused on the use of heuristics (Brodda, 1983) and surface information, as in the Morp Parser (Källgren, 1991). Morp was also designed for parsing with very limited lexical knowledge.

A fuller syntactic analysis is accomplished by the Uppsala Chart Parser (UCP) (Sågvall Hein, 1982). UCP has been used in several applications, for instance in machine translation (Sågvall Hein et al., 2002).

Two other, recently developed parsers are one that uses machine learning (Megyesi, 2002) and one that is based on finite-state cascades, called Cass-Swe (Kokkinakis and Johansson-Kokkinakis, 1999). Notably, Cass-Swe also assigns functional information to constituents.

There is also a deep parser developed in the CLE framework (Gambäck, 1997). The deep nature of this parser limits its coverage.

Furthermore, two other parsers identify dependency structure using Constraint Grammar (Birn, 1998) and Functional Dependency Grammar (Voutilainen, 2001). These two parsers are also commercialised.

3 A Robust Shallow Parser for Swedish

GTA is rule-based and relies on handcrafted rules written in a formalism with a context-free backbone (Domeij et al., 2000). The parser selects grammar rules top-down and uses a passive chart. The rules in the grammar are applied to part-of-speech tagged text, either from an integrated tagger or from an external source. GTA identifies constituents and assigns phrase labels. However, no full trees with a top node are built. In total GTA contains 260 rules. About 200 of them identify phrases.

The analysis is surface-oriented and identifies many types of phrases in Swedish. The basic phrase types are adverbial phrases (ADVP), adjective phrases (AP), infinitive verb phrases (INFP), noun phrases (NP), prepositional phrases (PP), and (limited) verb phrases and verb chains (VP and VC). The internal structure of the phrases is parsed when appropriate, and the heads of the phrases are also identified. The parser does not include a mechanism for resolving PP-attachments. The disambiguation of phrase boundaries is done primarily within the rules, and secondarily by heuristic selection and disambiguation rules that use the longest-match criterion.
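The longest-match criterion mentioned above can be illustrated with a small sketch. This is not GTA's implementation: the function name `select_longest`, the span representation (start, end, label) with an exclusive end index, and the greedy strategy are all assumptions made for illustration. The sketch only covers the choice between competing analyses at the same level, not the nested internal structure that the parser also keeps.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label); hypothetical representation


def select_longest(candidates: List[Span]) -> List[Span]:
    """Greedily keep the longest candidate spans that do not overlap each other."""
    # Longest spans first; ties are broken by the leftmost start position.
    ordered = sorted(candidates, key=lambda c: (-(c[1] - c[0]), c[0]))
    chosen: List[Span] = []
    for start, end, label in ordered:
        # Keep the span only if it is disjoint from every span already chosen.
        if all(end <= s or start >= e for s, e, _ in chosen):
            chosen.append((start, end, label))
    return sorted(chosen)


# Word positions are illustrative: the three-word PP wins over the shorter
# NP and AP readings that compete for the same words.
candidates = [(0, 2, "NP"), (2, 5, "PP"), (3, 5, "NP"), (3, 4, "AP")]
selected = select_longest(candidates)
```

A greedy choice like this is simple and deterministic, but it is only one way to read "longest match"; the parser's actual selection rules may interact with the grammar rules in ways the paper does not describe.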

In addition to the parsing of phrase structure, clause boundaries (CLB) are detected with an approach resembling Ejerhed's algorithm for clause boundary detection (Ejerhed, 1999). Ejerhed's rules for clause boundary detection are implemented in a straightforward manner, following the patterns pointed out in Ejerhed's paper. A few new rules have been developed. In total, 20 rules for clause boundary detection are used in the parser.

The output from the parser is given in the so-called IOB format (Ramshaw and Marcus, 1995). Each word is assigned a tag containing two parts: part one gives the phrase category (e.g. NP) and part two is one of three tags: I (for Inside a phrase), O (for Outside a phrase) and B (for Beginning a phrase). The first word of an NP will for instance have the tag NPB. See Figure 1 for a sentence with phrase labels and clause boundaries in the IOB format.

    Word                    Phrase tag      Clause tag
    Vi                      NPB             CLB
    har                     VCB             CLI
    inga                    NPB             CLI
    pengar                  NPI             CLI
    och                     O               CLB
    vi                      NPB             CLI
    kan                     VCB             CLI
    inte                    ADVPB|VCI       CLI
    finansiera              VCI             CLI
    vår                     NPB             CLI
    verksamhet              NPI             CLI
    utan                    PPB             CLI
    kraftfulla              APB|NPB|PPI     CLI
    besparingar             NPI|PPI         CLI
    ,                       O               CLB
    hävdar                  VCB             CLI
    han                     NPB             CLI
    och                     O               CLB
    får                     VCB             CLI
    stöd                    NPB             CLI
    av                      O               CLI
    Landstingsförbundets    NPB             CLI
    ekonomidirektör         NPB             CLI
    Ulf                     NPI             CLI
    Wetterberg              NPI             CLI
    :                       O               CLI

Figure 1: Example sentence showing the IOB format.
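As a minimal sketch of how such output could be consumed, the following groups words into chunks from their phrase tags. It is not part of GTA; the function names are hypothetical, and it assumes, judging from Figure 1, that nested levels are separated by `|` and that the last level listed is the outermost phrase.

```python
from typing import List, Tuple


def split_tag(tag: str) -> List[Tuple[str, str]]:
    """Split a tag such as 'APB|NPB|PPI' into (category, position) pairs."""
    if tag == "O":
        return []
    # The final character is the position (B or I); the rest is the category.
    return [(part[:-1], part[-1]) for part in tag.split("|")]


def outermost_chunks(tagged: List[Tuple[str, str]]) -> List[Tuple[str, List[str]]]:
    """Group words into chunks using the outermost (assumed: last) tag level."""
    chunks: List[Tuple[str, List[str]]] = []
    current = None  # the chunk being extended, or None when outside a phrase
    for word, tag in tagged:
        levels = split_tag(tag)
        if not levels:  # an O tag closes any open chunk
            current = None
            continue
        category, position = levels[-1]
        if position == "B" or current is None or current[0] != category:
            current = (category, [word])
            chunks.append(current)
        else:
            current[1].append(word)
    return chunks


# A fragment of the example sentence in Figure 1.
fragment = [
    ("vår", "NPB"), ("verksamhet", "NPI"), ("utan", "PPB"),
    ("kraftfulla", "APB|NPB|PPI"), ("besparingar", "NPI|PPI"), (",", "O"),
]
chunks = outermost_chunks(fragment)
```

Under these assumptions the fragment yields an NP covering "vår verksamhet" and a PP covering "utan kraftfulla besparingar"; the inner AP and NP levels are ignored by this sketch.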

3.1 Robustness against ill-formed and fragmentary natural language data

The parser was designed for robustness against ill-formed and fragmentary sentences.One task for the parser is to analyze text from second language learners and other text types which include different kinds of errors.

The parser is not equipped with relaxation techniques, which are convenient in many systems (see e.g. (Jensen, 1993)). Instead, the design of the parser follows the lines of Constraint Grammar parsing (Karlsson et al., 1995) and also Functional Dependency parsing (Järvinen and Tapanainen, 1997): the question of grammaticality is not dealt with within the parser. Grammaticality is instead used as a reason for selecting one interpretation over another. In addition to the noise in textual data, there is also a rich source of errors in the internal modules of the parsing system, e.g. tokenization and tagging errors. Robust parsers must handle these internal errors, or at least degrade gracefully.

As an example, agreement is not considered in noun phrases and predicative constructions (Swedish has a constraint on agreement in these constructions). By not enforcing the agreement constraint, the parser will not fail due to textual errors or tagging errors. In other words, the parser does not decide on grammaticality in such constructions. Tagging errors that do not concern agreement are to some extent handled by a set of tag correction rules based on heuristics about common tagging errors.

4 Evaluation

The parser has been evaluated on 15,000 words from the SUC corpus. Five text genres were used. In the absence of a Swedish treebank annotated with constituency trees, the texts were manually annotated with constituency structure, without top nodes, based on the output from the parser. However, the manual annotation is more homogeneous across the phrase types than the output of GTA. This means that there are systematic errors in the output from the parser. The evaluation results are therefore calculated on the untuned output from the parser. The accuracy on the phrase structure task is 88.7 per cent and the F-score for the clause boundary detection is 88.2 per cent. The parser seems to work best on PPs, APs, VCs and NPs. Adverbial and infinitive verb phrases are identified with lower accuracy.
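The two measures reported above can be sketched as follows. The paper does not spell out how they were computed, so this is an assumption: accuracy is taken as the share of words whose tag matches the manual annotation, and the F-score as the balanced harmonic mean of precision and recall over positions tagged CLB. The function names and the toy tag sequences are hypothetical and are not data from the evaluation.

```python
from typing import List


def tag_accuracy(gold: List[str], predicted: List[str]) -> float:
    """Share of positions where the predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def boundary_f_score(gold: List[str], predicted: List[str], boundary: str = "CLB") -> float:
    """Balanced F-score over the positions tagged as a clause boundary."""
    gold_pos = {i for i, tag in enumerate(gold) if tag == boundary}
    pred_pos = {i for i, tag in enumerate(predicted) if tag == boundary}
    true_pos = len(gold_pos & pred_pos)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_pos)
    recall = true_pos / len(gold_pos)
    return 2 * precision * recall / (precision + recall)


# Toy sequences for illustration only.
gold_tags = ["CLB", "CLI", "CLI", "CLB", "CLI"]
parser_tags = ["CLB", "CLI", "CLB", "CLB", "CLI"]
accuracy = tag_accuracy(gold_tags, parser_tags)
f_score = boundary_f_score(gold_tags, parser_tags)
```

In the toy example one spurious boundary lowers precision to 2/3 while recall stays at 1, which shows why an F-score is a more informative measure than accuracy for a sparse label such as CLB.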

The robustness and the efficiency of the parser are good. GTA has already been used successfully in a statistical grammar checker (Bigert and Knutsson, 2002). In the near future it will be used in a rule-based grammar checker for users with Swedish as a second language.

References

R. Basili and F. M. Zanzotto. 2002. Parsing engineering and empirical robustness. Natural Language Engineering, 8(2–3):97–120.

J. Bigert and O. Knutsson. 2002. Robust error detection: A hybrid approach combining unsupervised error detection and linguistic knowledge. In Proc. 2nd Workshop Robust Methods in Analysis of Natural language Data (ROMAND'02), Frascati, Italy, pages 10–19.

J. Birn. 1998. Swedish constraint grammar. Technical report, Lingsoft Inc, Helsinki, Finland.

B. Brodda. 1983. An experiment with heuristic parsing of Swedish. In Proc. of First Conference of the European Chapter of the Association for Computational Linguistics, pages 66–73, Pisa, Italy.

R. Domeij, O. Knutsson, J. Carlberger, and V. Kann. 2000. Granska – an efficient hybrid system for Swedish grammar checking. In T. Nordgård, editor, Proc. of 12th Nordic Conference on Computational Linguistics (Nodalida-01), pages 28–40. Department of Linguistics, University of Trondheim.

E. Ejerhed. 1999. Finite state segmentation of discourse into clauses. In A. Kornai, editor, Extended Finite State Models of Language, chapter 13. Cambridge University Press.

B. Gambäck. 1997. Processing Swedish Sentences: A Unification-Based Grammar and some Applications. Ph.D. thesis, The Royal Institute of Technology and Stockholm University.

T. Järvinen and P. Tapanainen. 1997. A dependency parser for English. Technical report, Department of Linguistics, University of Helsinki.

K. Jensen. 1993. PEG: The PLNLP English grammar. In K. Jensen, G. E. Heidorn, and S. D. Richardson, editors, Natural Language Processing: The PLNLP Approach, pages 29–43. Kluwer, Boston, USA.

G. Källgren. 1991. Parsing without lexicon: the Morp system. In Proc. Fifth Conference of the European Chapter of the Association for Computational Linguistics, pages 143–148, Berlin, Germany.

F. Karlsson, A. Voutilainen, J. Heikkilä, and A. Anttila. 1995. Constraint Grammar. A Language Independent System for Parsing Unrestricted Text. Mouton de Gruyter, Berlin, Germany.

D. Kokkinakis and S. Johansson-Kokkinakis. 1999. A cascaded finite-state parser for syntactic analysis of Swedish. In Proc. 9th European Chapter of the Association of Computational Linguistics (EACL), pages 245–248, Bergen, Norway. Association for Computational Linguistics.

B. Megyesi. 2002. Shallow parsing with PoS taggers and linguistic features. J. Machine Learning Research, Special Issue on Shallow Parsing (2):639–668.

W. Menzel. 1995. Robust processing of natural language. In Proc. 19th Annual German Conference on Artificial Intelligence, pages 19–34, Berlin. Springer.

L. Ramshaw and M. Marcus. 1995. Text chunking using transformation-based learning. In David Yarovsky and Kenneth Church, editors, Proc. Third Workshop on Very Large Corpora, pages 82–94, Somerset, New Jersey. Association for Computational Linguistics.

A. Sågvall Hein, A. Almqvist, E. Forsbom, J. Tiedemann, P. Weijnitz, L. Olsson, and S. Thaning. 2002. Scaling up an MT prototype for industrial use. Databases and dataflow. In Proc. Third International Conference on Language Resources and Evaluation (LREC 2002), pages 1759–1766, Las Palmas, Spain.

A. Sågvall Hein. 1982. An experimental parser. In Proceedings of the Ninth International Conference on Computational Linguistics (Coling 82), pages 121–126, Prague.

A. Voutilainen. 2001. Parsing Swedish. In Proc. 13th Nordic Conference on Computational Linguistics (Nodalida-01), Uppsala, Sweden.
