GOST R ISO 24614-1-2013

Language resource management. Word segmentation of written texts. Part 1: Basic concepts and general principles

ISO 24614-1:2010 Language resource management. Word segmentation of written texts. Part 1: Basic concepts and general principles (IDT)

Foreword

2 Submitted by Technical Committee for Standardization TC 55.
3 Approved and put into effect in 2013 by Order No. 1386-st.
4 This standard is identical to the international standard ISO 24614-1:2010 (Language resource management. Word segmentation of written texts. Part 1: Basic concepts and general principles).

The rules for applying this standard are set out in GOST R 1.0-2012 (Section 8). Official website: gost.ru. 2014.

Date of introduction: 2015-01-01

1 Scope

This standard sets out basic concepts and general principles for segmenting written texts into word segmentation units (WSU).
Fields in which word segmentation is applied include computer-aided translation (CAT) and natural language processing (NLP).

2 Terms and definitions

2.1 abbreviation
[ISO 1087-1:2000]

2.2 affix

2.3 agglutination
[ISO 24613:2008]

2.4 borrowing

2.5 bound morpheme
Cross-reference: morpheme (2.18).
EXAMPLE 1 Chinese: 伟 is a bound morpheme that occurs in words such as 伟大, 伟人 and 雄伟.
EXAMPLE 2 Korean: -e ("to"), as in hakkyo-e.
[ISO 24613:2008]

2.6 compound
NOTE 1 See 3.10 of ISO 24613:2008.

2.7 compounding
[ISO 24613:2008]

2.8 derivation
[ISO 24613:2008]

2.9 free morpheme
EXAMPLE In goodness, good is a free morpheme, whereas -ness is a bound morpheme.

2.10 homograph

2.11 inflection

2.12 lemma
EXAMPLE find, finds, found and finding share the lemma find.
[ISO 24613:2008]

2.13 lemmatization
EXAMPLE found is reduced to find.
NOTE See also 2.19 of ISO 1087-2:2000 and 3.14 of ISO 30042:2008.

2.14 lexeme
NOTE 2 See ISO 24613.

2.15 lexicalization
EXAMPLE laugh, apple pie, kick the bucket.

2.16 lexicon

2.17 morph
EXAMPLE The English plural morpheme -s is realized by the morphs -s, -en and -NULL (as in boys, oxen and sheep). Thus boys is analysed as boy + -s, while oxen is analysed as ox + -en rather than ox + -s.

2.18 morpheme
[ISO 24613:2008]

2.19 multiword expression (MWE)
[ISO 24613:2008]

2.20 phrasal compound
EXAMPLE apple pie, formed from apple and pie.

2.21 reduplication

2.22 stem
[ISO 24613:2008]

2.23 word
[ISO 24613:2008]

2.24 word form
EXAMPLE find, finds, found and finding are word forms of find.

2.25 word segmentation

2.26 word segmentation unit (WSU)
NOTE Examples of WSUs include H2O and F16.

2.27 word structure

2.28 word compound
EXAMPLE Hotdog, ice-cream, blackboard.

3

3.1