Tapta4IPC: helping translation of IPC definitions
Bruno Pouliquen (Bruno.Pouliquen@wipo.int)
25 Feb 2013, IPC workshop
Translation assistant for patent titles and abstracts in PATENTSCOPE - potential use in translating IPC definitions
Collaboration

Introduction
- Our system prepares the data for Moses, applies some post-processing (filtering, pruning, binarization, optimization) and offers a web interface for translation.
- Tapta framework steps: clean, re-clean, train-model, post-filter, prune, binarize, optimize, publish

Introduction: Tapta
- In WIPO, as part of Patentscope (English, French, German, Chinese, Japanese), e.g. http://patentscope.wipo.int/translate/simpleTranslate.jsf?id=JP75694586&langpair=jaen (automatic translation of a patent application only available in Japanese)
- In United Nations (English from/into Arabic, French, Spanish, Russian and Chinese)

Technical workflow
- Bitexts aligned at sentence level

IPC context
- Gather data:
  - Get existing definitions
  - Add IPC schema (XML on WIPO website)
  - Add a "few" texts from patents
- "Learn" translation model
- Translate new texts

Get existing data, build parallel texts
- Sources: IPC schema, patent texts, existing definitions
- Example pairs:
  - Wheel guards / Couvre-roues
  - WO/2013/014517: (EN) TYRE FOR VEHICLE WHEELS / (FR) PNEUMATIQUE POUR ROUES DE VÉHICULE
- Result: bitext (training material)

How well does it work?
- Automatic evaluation: BLEU score
- Principle: similarity of n-grams between evaluated and reference sentences
- On IPC definitions, English-French: BLEU = 48% (without patent data: 44%)
- Good quality, but needs human post-editing

Tapta4IPC prototype (1)
- Live demo using:
  - http://patentscope.wipo.int/translateUN/translateIPC.jsf
  - http://fulty3.wipo.int:8080/Wtapta/translateIPC.jsf

Tapta4IPC prototype (2)

Conclusion / future work
- This is a prototype, but the quality already looks acceptable
- Human evaluation?
- Better integrate the tool in PCA6TRANSDEF?
- Other languages?

Tapta4IPC in various languages
- Tapta4IPC should work reasonably well on the following languages (we have built some language-specific tools and we have patent corpora): German, Japanese, Korean, Spanish, Dutch, Portuguese, Chinese, Russian
- More challenging:
  - Czech, Slovak, Polish (many word forms, training corpus?)
  - Estonian (even more word forms, would in theory require more training corpus)
- Other languages: Arabic, Italian, Danish, Swedish etc.

Thank you for your attention!
Merci pour votre attention!
感谢您的关注
Grazie per la vostra attenzione!
Gracias por su atención!
Vielen Dank für Ihre Aufmerksamkeit!
Obrigado pela vossa atenção!
Dziękuję bardzo za Państwa uwagę!
Děkujeme za Vaši pozornost!
Ďakujem ti veľmi pekne za tvoju pozornosť
Tänan tähelepanu eest!
Tak for Jeres opmærksomhed!
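
Supplementary sketch: BLEU principle
The evaluation slide describes BLEU as the similarity of n-grams between evaluated and reference sentences. The following is a minimal illustration of that principle, assuming a single reference, whitespace tokenization, n-grams up to length 4 and no smoothing. It is a simplified sentence-level variant, not the scoring tool used to obtain the 48% and 44% figures, which are corpus-level scores. The function names `ngrams` and `bleu` and the candidate sentence starting with "pneu" are hypothetical and chosen only for the example; the reference sentence is the French patent title shown in the slides.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) occurring in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: one reference, no smoothing."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clipped matches: a candidate n-gram is credited at most as many
        # times as it occurs in the reference.
        matches = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or matches == 0:
            return 0.0  # without smoothing, one empty order gives a zero score
        log_precisions.append(math.log(matches / total))
    # Brevity penalty: candidates shorter than the reference are penalized.
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


reference = "pneumatique pour roues de véhicule"   # title from the slides
candidate = "pneu pour roues de véhicule"          # hypothetical system output
score = bleu(candidate, reference)
print(f"BLEU = {score:.1%}")
```

An output identical to the reference scores 1.0 (100%), and an output sharing no words with it scores 0.0. Scores on single short sentences are unstable, which is one reason figures such as those on the evaluation slide are normally computed over a whole test set.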