Automated Text Summarization Tutorial
COLING/ACL '98

Eduard Hovy and Daniel Marcu
Information Sciences Institute, University of Southern California
4676 Admiralty Way, Suite 1001, Marina del Rey, CA 90292
{hovy,marcu}@isi.edu
http://www.isi.edu/natural-language/people/{hovy.html,marcu.html}

An exciting challenge...
...put a book on the scanner, turn the dial to 2 pages, and read the result.
...download 1000 documents from the web, send them to the summarizer, and select the best ones by reading the summaries of the clusters.
...forward the Japanese email to the summarizer, select 1 paragraph, and skim the translated summary.

Headline news: informing
TV guides: decision making
Abstracts of papers: time saving
Graphical maps: orienting
Textual directions: planning
Cliff notes: laziness support
Real systems: money making

Questions
What kinds of summaries do people want? What are summarizing, abstracting, gisting, ...?
How sophisticated must summarization systems be? Are statistical techniques sufficient? Or do we need symbolic techniques and deep understanding as well?
What milestones would mark quantum leaps in summarization theory and practice?
How do we measure summarization quality?

Table of contents
1. Motivation.
2. Genres and types of summaries.
3. Approaches and paradigms.
4. Summarization methods (exercise).
5. Evaluating summaries.
6. The future.

Genres of Summary?
Indicative vs. informative: used for quick categorization vs. content processing.
Extract vs. abstract: lists fragments of text vs. re-phrases content coherently.
Generic vs. query-oriented: provides author's view vs. reflects user's interest.
Background vs. just-the-news: assumes reader's prior knowledge is poor vs. up-to-date.
Single-document vs. multi-document source: based on one text vs. fuses together many texts.

Examples of Genres
Exercise: summarize the following texts for the following readers:

90 Soldiers Arrested After Coup Attempt In Tribal Homeland
MMABATHO, South Africa (AP) About 90 soldiers have been arrested and face possible death sentences stemming from a coup attempt in Bophuthatswana, leaders of the tribal homeland said Friday.
Rebel soldiers staged the takeover bid Wednesday, detaining homeland President Lucas Mangope and several top Cabinet officials for 15 hours before South African soldiers and police rushed to the homeland, rescuing the leaders and restoring them to power.
At least three soldiers and two civilians died in the uprising.
Bophuthatswana's Minister of Justice G. Godfrey Mothibe told a news conference that those arrested have been charged with high treason and if convicted could be sentenced to death. He said the accused were to appear in court Monday.
All those arrested in the coup attempt have been described as young troops, the most senior being a warrant officer.
During the coup rebel soldiers installed as head of state Rocky Malebane-Metsing, leader of the opposition Progressive People's Party.
Malebane-Metsing escaped capture and his whereabouts remained unknown, officials said. Several unsubstantiated reports said he fled to nearby Botswana.
Warrant Officer M.T.F. Phiri, described by Mangope as one of the coup leaders, was arrested Friday in Mmabatho, capital of the nominally independent homeland, officials said.
Bophuthatswana, which has a population of 1.7 million spread over seven separate land blocks, is one of 10 tribal homelands in South Africa. About half of South Africa's 26 million blacks live in the homelands, none of which are recognized internationally.
Hennie Riekert, the homeland's defense minister, said South African troops were to remain in Bophuthatswana but will not become a permanent presence.
Bophuthatswana's Foreign Minister Solomon Rathebe defended South Africa's intervention.
"The fact that ... the South African government (was invited) to assist in this drama is not anything new nor peculiar to Bophuthatswana," Rathebe said. "But why South Africa, one might ask? Because she is the only country with whom Bophuthatswana enjoys diplomatic relations and has formal agreements."
Mangope described the mutual defense treaty between the homeland and South Africa as "similar to the NATO agreement," referring to the Atlantic military alliance. He did not elaborate.
Asked about the causes of the coup, Mangope said, "We granted people freedom perhaps ... to the extent of planning a thing like this."
The uprising began around 2 a.m. Wednesday when rebel soldiers took Mangope and his top ministers from their homes to the national sports stadium.
On Wednesday evening, South African soldiers and police stormed the stadium, rescuing Mangope and his Cabinet.
South African President P.W. Botha and three of his Cabinet ministers flew to Mmabatho late Wednesday and met with Mangope, the homeland's only president since it was declared independent in 1977.
The South African government has said, without producing evidence, that the outlawed African National Congress may be linked to the coup.
The ANC, based in Lusaka, Zambia, dismissed the claims and said South Africa's actions showed that it maintains tight control over the homeland governments. The group seeks to topple the Pretoria government.
The African National Congress and other anti-government organizations consider the homelands part of an apartheid system designed to fragment the black majority and deny them political rights in South Africa.

If You Give a Mouse a Cookie
Laura Joffe Numeroff, 1985
If you give a mouse a cookie, he's going to ask for a glass of milk. When you give him the milk, he'll probably ask you for a straw. When he's finished, he'll ask for a napkin. Then he'll want to look in the mirror to make sure he doesn't have a milk mustache. When he looks into the mirror, he might notice his hair needs a trim. So he'll probably ask for a pair of nail scissors. When he's finished giving himself a trim, he'll want a broom to sweep up. He'll start sweeping. He might get carried away and sweep every room in the house. He may even end up washing the floors as well. When he's done, he'll probably want to take a nap. You'll have to fix up a little box for him with a blanket and a pillow. He'll crawl in, make himself comfortable, and fluff the pillow a few times. He'll probably ask you to read him a story. When you read to him from one of your picture books, he'll ask to see the pictures. When he looks at the pictures, he'll get so excited that he'll want to draw one of his own. He'll ask for paper and crayons. He'll draw a picture. When the picture is finished, he'll want to sign his name, with a pen. Then he'll want to hang his picture on your refrigerator. Which means he'll need Scotch tape. He'll hang up his drawing and stand back to look at it. Looking at the refrigerator will remind him that he's thirsty. So he'll ask for a glass of milk. And chances are that if he asks for a glass of milk, he's going to want a cookie to go with it.

Aspects that Describe Summaries
Input (Sparck Jones 97)
  subject type: domain
  genre: newspaper articles, editorials, letters, reports...
  form: regular text structure; free-form
  source size: single doc; multiple docs (few; many)
Purpose
  situation: embedded in larger system (MT, IR) or not?
  audience: focused or general
  usage: IR, sorting, skimming...
Output
  completeness: include all aspects, or focus on some?
  format: paragraph, table, etc.
  style: informative, indicative, aggregative, critical...

Table of contents
1. Motivation.
2. Genres and types of summaries.
3. Approaches and paradigms.
4. Summarization methods (exercise).
5. Evaluating summaries.
6. The future.

Making Sense of it All...
To understand summarization, it helps to consider several perspectives simultaneously:
1. Approaches: basic starting point, angle of attack, core focus question(s): psycholinguistics, text linguistics, computation...
2. Paradigms: theoretical stance; methodological preferences: rules, statistics, NLP, Info Retrieval, AI...
3. Methods: the nuts and bolts: modules, algorithms, processing: word frequency, sentence position, concept generalization...

Psycholinguistic Approach: 2 Studies
Coarse-grained summarization protocols from professional summarizers (Kintsch and van Dijk, 78):
  Delete material that is trivial or redundant.
  Use superordinate concepts and actions.
  Select or invent topic sentence.
552 finely-grained summarization strategies from professional summarizers (Endres-Niggemeyer, 98):
  Self control: make yourself feel comfortable.
  Processing: produce a unit as soon as you have enough data.
  Info organization: use "Discussion" section to check results.
  Content selection: the table of contents is relevant.

Computational Approach: Basics
Top-Down: "I know what I want! Don't confuse me with drivel!"
  User needs: only certain types of info
  System needs: particular criteria of interest, used to focus search
Bottom-Up: "I'm dead curious: what's in the text?"
  User needs: anything that's important
  System needs: generic importance metrics, used to rate content

Query-Driven vs. Text-Driven Focus
Top-down: Query-driven focus
  Criteria of interest encoded as search specs.
  System uses specs to filter or analyze text portions.
  Examples: templates with slots with semantic characteristics; termlists of important terms.
Bottom-up: Text-driven focus
  Generic importance metrics encoded as strategies.
  System applies strategies over a representation of the whole text.
  Examples: degree of connectedness in semantic graphs; frequency of occurrence of tokens.

Bottom-Up, using Info. Retrieval
IR task: Given a query, find the relevant document(s) from a large set of documents.
Summ-IR task: Given a query, find the relevant passage(s) from a set of passages (i.e., from one or more documents).
Questions:
1. IR techniques work on large volumes of data; can they scale down accurately enough?
2. IR works on words; do abstracts require abstract representations?

Top-Down, using Info. Extraction
IE task: Given a template and a text, find all the information relevant to each slot of the template and fill it in.
Summ-IE task: Given a query, select the best template, fill it in, and generate the contents.
Questions:
1. IE works only for very particular templates; can it scale up?
2. What about information that doesn't fit into any template; is this a generic limitation of IE?
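The Summ-IR task described above (given a query, find the relevant passages) can be sketched with standard IR machinery. The following is a minimal illustration, not the tutorial authors' system: it assumes TF-IDF weighting with a smoothed IDF and cosine similarity, and the names tokenize and rank_passages are hypothetical helpers introduced only for this example. The passages are sentences taken from the news article used in the exercise.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def rank_passages(query, passages):
    """Rank passages by TF-IDF cosine similarity to the query.

    Returns a list of (score, passage_index) pairs, best first.
    The weighting scheme is an assumption made for this sketch.
    """
    docs = [Counter(tokenize(p)) for p in passages]
    n = len(docs)
    # Document frequency: number of passages that contain each term.
    df = Counter(term for doc in docs for term in doc)
    # Smoothed IDF so that every known term keeps a positive weight.
    idf = {term: math.log((1 + n) / (1 + df[term])) + 1.0 for term in df}

    def vector(counts):
        # Terms that occur in no passage are ignored.
        return {t: c * idf[t] for t, c in counts.items() if t in idf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    q = vector(Counter(tokenize(query)))
    scored = [(cosine(q, vector(doc)), i) for i, doc in enumerate(docs)]
    # Highest score first; ties keep the original passage order.
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


passages = [
    "About 90 soldiers have been arrested and face possible death "
    "sentences stemming from a coup attempt in Bophuthatswana.",
    "He said the accused were to appear in court Monday.",
    "Several unsubstantiated reports said he fled to nearby Botswana.",
]
ranking = rank_passages("soldiers arrested coup", passages)
best_index = ranking[0][1]
print(passages[best_index])
```

With only three passages the IDF values are nearly uniform, which illustrates the first question on the slide: statistics that work over large collections carry little information when scaled down to a single document.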

Paradigms: NLP/IE vs. IR/statistics

Toward the Final Answer...
Problem: What if neither IR-like nor IE-like methods work?
Solution: semantic analysis of the text (NLP), using adequate knowledge bases that support inference (AI).
Mrs. Coolidge: "What did the preacher preach about?"
Coolidge: "Sin."
Mrs. Coolidge: "What did he say?"
Coolidge: "He's against it."
Sometimes counting and templates are insufficient, and then you need to do inference to understand.
[Diagram labels: Word counting; Inference]

The Optimal Solution...
Combine strengths of both paradigms.
...use IE/NLP when you have suitable template(s),
...use IR when you don't...
...but how exactly to do it?

A Summarization Machine
[Diagram labels: DOC; QUERY; Extract, Abstract; Indicative, Informative; Generic, Query-oriented; Background, Just the news; 10%, 50%, 100%; Headline, Very Brief, Brief, Long; EXTRACTS, ABSTRACTS, ?; MULTIDOCS; CASE FRAMES, TEMPLATES, CORE CONCEPTS, CORE EVENTS, RELATIONSHIPS, CLAUSE FRAGMENTS, INDEX TERMS]

The Modules of the Summarization Machine
[Diagram labels: EXTRACTION; INTERPRETATION; GENERATION; FILTERING; DOC EXTRACTS; MULTIDOC EXTRACTS; EXTRACTS, ABSTRACTS, ?; CASE FRAMES, TEMPLATES, CORE CONCEPTS, CORE EVENTS, RELATIONSHIPS, CLAUSE FRAGMENTS, INDEX TERMS]

Table of contents
1. Motivation.
2. Genres and types of summaries.
3. Approaches and paradigms.
4. Summarization methods (& exercise).
   Topic Extraction.
   Interpretation.
   Generation.
5. Evaluating summaries.
6. The future.

Overview of Extraction Methods
Position in the text
  lead method; optimal position policy
  title/heading method
Cue phrases in sentences
Word frequencies throughout the text
Cohesion: links among words
  word co-occurrence
  coreference
  lexical chains
Discourse structure of the text
Information Extraction: parsing and analysis

Note
The recall and precision figures reported here reflect the ability of various methods to match human performance on the task of identifying the sentences/clauses that are important in texts.
They rely on evaluations using six corpora: (Edmundson, 68; Kupiec et al., 95; Teufel and Moens, 97; Marcu, 97; Jing et al., 98; SUMMAC, 98).

Position-Based Method (1)
Claim: Important sentences occur at the beginning (and/or end) of texts.
Lead method: just take first sentence(s)!
Experiments:
  In 85% of 200 individual paragraphs the topic sentences occurred in initial position and in 7% in final position (Baxendale, 58).
  Only 13% of the paragraphs of contemporary writers start with topic sentences (Donlan, 80).

Position-Based Method (2)
(Edmundson, 68) 52% recall & precision in combination with title (25% lead baseline)
(Kupiec et al., 95) 33% recall & precision (24% lead baseline)
(Teufel and Moens, 97) 32% recall and precision (28% lead baseline)

(Edmundson, 68) the best individual method
(Kupiec et al., 95) the best individual method
(Teufel and Moens, 97) increased performance by 10% when combined with the cue-based method

Column labels: Individual contribution; Cumulative contribution

Optimum Position Policy (OPP)
Claim: Important sentences are located at positions that are genre-dependent; these positions can be determined automatically through training (Lin and Hovy, 97).
Corpus: 13000 newspaper articles (ZIFF corpus).

Step 1: For each article, determine overlap between sentences and the index terms for the article.
Step 2: Determine a partial ordering over the locations where sentences containing important words occur: Optimal Position Policy (OPP).

OPP (cont.)
OPP for ZIFF corpus: (T) > (P2,S1) > (P3,S1) > (P2,S2) > {(P4,S1), (P5,S1), (P3,S2)}
(T=title; P=paragraph; S=sentence)
OPP for Wall Street Journal: (T) > (P1,S1) > ...
Results: testing corpus of 2900 articles: Recall=35%, Precision=38%.
Results: 10%-extracts cover 91% of the salient words.

Title-Based Method (1)
Claim: Words in titles and headings are positively relevant to summarization.
Shown to be statistically valid at 99% level of significance (Edmundson, 68).
Empirically shown to be useful in summarization systems.

Title-Based Method (2)
(Edmundson, 68) 40% recall & precision (25% lead baseline)
(Teufel and Moens, 97) 21.7% recall & precision (28% lead baseline)

(Edmundson, 68) increased performance by 8% when combined with the title- and cue-based methods.
(Teufel and Moens, 97) increased performance by 3% when combined with cue-, location-, position-, and word-frequency-based methods.
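The two OPP training steps described earlier (measure the overlap between each sentence and the article's index terms, then order the positions) can be sketched as follows. This is a rough illustration under stated assumptions, not the procedure of Lin and Hovy: the overlap measure (fraction of index terms found in the sentence), the averaging over articles, and the alphabetical tie-breaking are all choices made for this example, and the function names and the two toy articles are hypothetical. A full ordering is produced here, whereas the slides describe a partial ordering.

```python
import re
from collections import defaultdict


def words(text):
    """Return the set of lowercase alphabetic tokens in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def positions(article):
    """Yield (label, text) for the title and for every sentence."""
    yield "T", article["title"]
    for p, paragraph in enumerate(article["paragraphs"], start=1):
        for s, sentence in enumerate(paragraph, start=1):
            yield f"P{p},S{s}", sentence


def train_opp(articles):
    """Return position labels ordered by average index-term overlap."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for article in articles:
        terms = {t.lower() for t in article["index_terms"]}
        if not terms:
            continue
        for label, text in positions(article):
            # Step 1: overlap between this sentence and the index terms.
            totals[label] += len(words(text) & terms) / len(terms)
            counts[label] += 1
    # Step 2: order positions by average overlap, best first.
    average = {label: totals[label] / counts[label] for label in totals}
    return sorted(average, key=lambda label: (-average[label], label))


def extract(article, policy, k):
    """Return the text found at the k best-ranked positions of the article."""
    by_label = dict(positions(article))
    return [by_label[label] for label in policy if label in by_label][:k]


# Toy training data, invented purely for illustration.
articles = [
    {
        "title": "Coup attempt fails",
        "paragraphs": [
            ["Soldiers were arrested after the coup.",
             "Officials held a news conference."],
            ["The weather was mild."],
        ],
        "index_terms": ["coup", "soldiers", "arrested"],
    },
    {
        "title": "Coup leaders charged",
        "paragraphs": [
            ["The soldiers face treason charges.",
             "A court date was set."],
            ["Reporters waited outside."],
        ],
        "index_terms": ["coup", "soldiers", "treason"],
    },
]

policy = train_opp(articles)
summary = extract(articles[0], policy, 2)
print(policy)
print(summary)
```

On this toy data the first sentence of the first paragraph outranks the title, which differs from the ZIFF and Wall Street Journal policies reported on the slides; the learned ordering depends entirely on the training corpus, which is the genre-dependence the OPP claim refers to.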
