Designation: E2364 - 04 (Reapproved 2010)

Standard Guide to Speech Recognition Technology Products in Health Care

This standard is issued under the fixed designation E2364; the number immediately following the designation indicates the year of original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.

1. Scope

1.1 This guide identifies system types and describes various features of speech recognition technology (SRT) products used to create the healthcare record. This will assist users (health information professionals, medical report originators, administrators, medical transcriptionists, speech recognition medical transcription editors (SRMTEs), system integrators, support personnel, trainers, and others) to make informed decisions relating to the design and utilization of SRT systems.

1.2 This guide does not address the following items:
1.2.1 System and data (voice and text) security.
1.2.2 Administrative processes such as authentication of the document, productivity measurements, etc.

2. Referenced Documents

2.1 ASTM Standards:
E1902 Specification for Management of the Confidentiality and Security of Dictation, Transcription, and Transcribed Health Records
E1985 Guide for User Authentication and Authorization
E2084 Specification for Authentication of Healthcare Information Using Digital Signatures
E2184 Specification for Healthcare Document Formats
E2185 Specification for Transferring Digital Voice Data Between Independent Digital Dictation Systems and Workstations
E2344 Guide for Data Capture through the Dictation Process

2.2 Other Documents:
Resource Interchange File Format (RIFF) Standard

3. Terminology

3.1 Definitions:
3.1.1 acoustic model, n - phoneme map of user.
3.1.2 authentication, n - the process of confirming authorship of an entry or of a document, for example, by verifying with a written signature, identifiable initials, computer key, or other methods.
3.1.3 author, n - person responsible for content of text file.
3.1.4 back-end system, n - delayed processing for document completion.
3.1.5 compound file, n - a file containing recorded voice with its transcribed text.
3.1.6 context, n - a long list of vocabulary words and phrases used for the particular subject matter, with their spellings and pronunciations, statistical information about usage of each word alone and in combination. For example, the context may include the number of times that "right," "Wright," "turn right," "right turn," "right hand," and "Mr. Wright" occur in a body of text. It also includes grammar and style information. Language model, lexicon, topic, and vocabulary are terms that are all used synonymously with context.
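The statistical usage information described in 3.1.6 can be pictured as plain word and word-pair counts over a body of text. The following Python fragment is a minimal, non-normative sketch; the sample sentence and variable names are illustrative assumptions only.

```python
from collections import Counter

# Illustrative only: count how often each word and each adjacent word pair
# (for example, "right" versus "turn right") occurs in a body of text.
text = "turn right at the light and ask for Mr. Wright in the right hand office"
words = text.lower().split()

unigram_counts = Counter(words)                 # single-word usage
bigram_counts = Counter(zip(words, words[1:]))  # word-pair usage

print(unigram_counts["right"])                  # occurrences of "right" alone
print(bigram_counts[("turn", "right")])         # occurrences of "turn right"
```

A production SRT context would also carry the spellings, pronunciations, grammar, and style information noted in the definition.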
3.1.7 digital signature, n - data associated with, or a cryptographic transformation of, a data unit that allows a recipient to prove the source and integrity of the data unit and protect against forgery, for example, by the recipient.
3.1.8 edit, v - to review the document while listening to the originator's recorded voice and reading the associated transcribed text (compound file), checking for recognition errors and correcting document formatting and other inconsistencies. When the SRMTE is not the originator, the SRMTE may need to flag the document for originator/author clarification of unclear content or intent.
3.1.9 encryption, n - the process of transforming plain text (readable) into cipher text (unreadable) for the purpose of security and privacy.
3.1.10 front-end system, n - a system incorporating real-time recognition, which may include real-time self-editing by the originator.
3.1.11 language model, n - context specific to medical specialty, user, or practice setting.
3.1.12 lossless compression, n - a lossless compression reduces the amount of data required to represent the original voice file but has no impact on sound quality. The original file can be replicated precisely at any time.
3.1.13 lossy compression, n - a lossy compression loses some information, resulting in degradation of the sound quality inherent in the original voice file and an inability to precisely regenerate that original file.
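The distinction drawn in 3.1.12 and 3.1.13 can be demonstrated with any general-purpose lossless codec: the decompressed bytes are identical to the original. A minimal, non-normative Python sketch, in which a synthetic byte string stands in for the contents of a recorded voice file:

```python
import zlib

# Illustrative only: a lossless codec reproduces the original voice data exactly.
# The byte string below is a stand-in for the contents of a recorded voice file.
original_voice_bytes = bytes(range(256)) * 200

compressed = zlib.compress(original_voice_bytes, 9)
restored = zlib.decompress(compressed)

assert restored == original_voice_bytes   # lossless: bit-for-bit identical
print(len(original_voice_bytes), "->", len(compressed), "bytes")
```

A lossy codec would not pass the equality check above; the original file could not be regenerated from its compressed form.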
3.1.14 microphone, n - an instrument whereby sound waves are caused to generate or modulate an electric current usually for the purpose of transmitting or recording sound (as speech or music).
3.1.15 microphone element, n - the component within the microphone that does the actual conversion from sound waves to electrical signals.
3.1.16 natural language processing, n - method used in artificial intelligence to process and derive interpretation of human language.
3.1.17 networked system, n - system connected to a network.
3.1.18 "normal" dictation, n - routine phrases or paragraphs.
3.1.19 originator, n - person who provides oral input or dictation, not necessarily the person responsible for the content.
3.1.20 phoneme, n - smallest unit of sound in a spoken language.
3.1.21 prompts, n - reminders provided in order to complete a task.
3.1.22 real-time recognition, n - simultaneous speech-to-text transcription.
3.1.23 RecO - speech recognition error.
3.1.24 RIFF file, n - Resource Interchange File Format (RIFF) is self-descriptive; that is, the voice file format is defined within the file.
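Because a RIFF file describes its own layout, a reader can discover the audio parameters from the file itself. The following non-normative Python sketch builds a small WAV file (a RIFF form) in memory with the standard library and then reads the channel count and sample rate back out of the "fmt " chunk; the specific audio parameters are arbitrary examples.

```python
import io
import struct
import wave

# Illustrative only: build a tiny in-memory WAV file (a RIFF form), then read
# the audio parameters back out of the file's own chunk structure.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)                      # mono
    w.setsampwidth(2)                      # 16-bit samples
    w.setframerate(11025)                  # 11.025 kHz
    w.writeframes(b"\x00\x00" * 1024)      # silence as placeholder audio

data = buf.getvalue()
assert data[0:4] == b"RIFF" and data[8:12] == b"WAVE"

pos = 12                                   # walk the chunks after the RIFF header
while pos < len(data):
    chunk_id, chunk_size = struct.unpack_from("<4sI", data, pos)
    if chunk_id == b"fmt ":                # format chunk: the file describes itself
        channels, sample_rate = struct.unpack_from("<HI", data, pos + 10)
        print("channels:", channels, "sample rate:", sample_rate)
        break
    pos += 8 + chunk_size + (chunk_size & 1)   # chunk data is word-aligned
```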
3.1.25 speech recognition, n - computerized transcription of speech to text.
3.1.26 speech recognition medical transcription editor, n - medical transcriptionist who edits compound files and/or the SRT language model.
3.1.27 SRT engine, n - speech recognition processor.
3.1.28 standalone system, n - system not connected to a network.
3.1.29 synchronization, v - having voice and text matched such as in a point-and-play manner.
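Point-and-play behavior depends on the SRT system keeping, for each recognized word, the offset in the voice file at which it was spoken, so that selecting a word in the text can cue playback to the matching moment. A minimal, non-normative sketch of such an alignment structure; the words and timings are invented for illustration.

```python
# Illustrative only: each recognized word carries the audio offset (in seconds)
# at which it was spoken, so selecting a word can cue playback to that point.
alignment = [
    (0.00, "patient"),
    (0.42, "denies"),
    (0.81, "chest"),
    (1.10, "pain"),
]

def playback_offset(word_index: int) -> float:
    """Return the audio position at which to resume playback for a selected word."""
    return alignment[word_index][0]

print(playback_offset(2))   # cue the recording to the word "chest"
```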
3.1.30 text file, n - a file that contains a text message.
3.1.31 voice enrollment, n - the process whereby a user reads aloud selected text so the SRT software can map or record the user's speech sound pattern (phonemes).
3.1.32 voice file, n - digitalized audio message representing voice input.
3.1.33 voice macros, n - stored keystrokes that are activated by a voice command.
3.1.34 WAV, n - voice file format.

3.2 Acronyms:
3.2.1 MT - medical transcriptionist
3.2.2 SRMTE - speech recognition medical transcription editor
3.2.3 RIFF - resource interchange file format
3.2.4 SRT - speech recognition technology

4. Significance and Use

4.1 This guide is intended to provide general guidelines toward the design and utilization of SRT products used for healthcare documentation. It is intended to recommend the essential elements required of SRT systems in healthcare.
4.2 This guide will not identify specific products or make recommendations regarding specific vendors or their products or services.

4.3 A well-edited SRT document may result in improved quality over current methods of documentation (that is, handwritten notes) and improved productivity over traditional dictation and transcription.
4.3.1 Faster turnaround times.
4.3.2 Legible documentation over handwriting has many advantages:
4.3.2.1 Improved patient care communication.
4.3.2.2 Enhanced patient safety.
4.3.2.3 Reduced malpractice risks.
4.3.2.4 Facilitation of appropriate reimbursement.
4.3.3 For the medical transcriptionist and/or SRMTE, decreased repetitive stress injuries, such as neck, arm, wrist, and heel pain.
4.3.4 Facilitation of cost controls related to document completion.
4.3.5 Better utilization of medical language skills of MTs, as productivity is not limited by keyboarding skills.
5. Speech Recognition Technology Systems

5.1 Speech recognition technology (SRT) is designed to capture voice and transcribe that speech into text. This can be done by a single user working at a standalone computer or by a large group of users working on a network. Another method is processing a pre-recorded digital voice file through an SRT system, with the resulting text and/or SRT engine being edited by the SRMTE.

5.2 Speech recognition technology system workflow.

5.2.1 Front-end speech recognition process involves the following (see the sketch after this list):
5.2.1.1 Recording the voice.
5.2.1.2 SRT transcription of the voice file to text.
5.2.1.3 Editing may be done by the originator and/or SRMTE.
5.2.1.4 Compound file may be saved as an option.
5.2.1.5 Text file can be printed, archived, transmitted, or integrated into an electronic health record.
5.2.1.6 Update the SRT context for RecOs and new terminology.
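In outline, the front-end steps in 5.2.1 form a loop in which recognition happens while the originator speaks and corrections are applied immediately. The following non-normative Python sketch assumes a hypothetical recognize_utterance() function standing in for a real SRT engine; it is not a prescribed interface.

```python
# Illustrative only: outline of the front-end (real-time) workflow in 5.2.1.
# recognize_utterance() is a hypothetical stand-in for a real SRT engine.

def recognize_utterance(audio_chunk: bytes) -> str:
    return "patient seen in clinic today"          # placeholder recognition result

def front_end_session(audio_chunks, edit):
    draft = []
    for chunk in audio_chunks:                     # 5.2.1.1 record the voice
        draft.append(recognize_utterance(chunk))   # 5.2.1.2 transcribe to text
    text = edit(" ".join(draft))                   # 5.2.1.3 originator/SRMTE edits
    # 5.2.1.4 the voice and text could be kept together as a compound file
    # 5.2.1.6 corrections would also be fed back into the SRT context
    return text                                    # 5.2.1.5 print, archive, or transmit

print(front_end_session([b"\x00" * 320], lambda t: t.capitalize() + "."))
```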
5.2.2 Back-end speech recognition process involves the following (see the sketch after this list):
5.2.2.1 Recording the voice.
5.2.2.2 Transmitting the voice file to the speech recognition engine.
5.2.2.3 SRT transcription of the voice file to text.
5.2.2.4 Saving the voice and text as a compound file.
5.2.2.5 Routing the compound file to the SRMTE.
5.2.2.6 Editing done by the SRMTE.
5.2.2.7 Saving the text file.
5.2.2.8 Returning the edited text file to the originator for authentication.
5.2.2.9 SRMTE updates the SRT context for RecOs and new terminology.
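In the back-end workflow of 5.2.2, the voice file, the recognized draft, and the edited text travel together as a compound file (3.1.5) from the recognition engine to the SRMTE and back to the originator. The following non-normative Python sketch illustrates that flow; the CompoundFile class and recognize() function are assumptions for illustration, not a prescribed interface.

```python
from dataclasses import dataclass

# Illustrative only: outline of the back-end workflow in 5.2.2.
# CompoundFile and recognize() are hypothetical, not a prescribed interface.

@dataclass
class CompoundFile:
    originator_id: str
    voice: bytes                  # recorded dictation (5.2.2.1)
    draft_text: str = ""          # SRT engine output (5.2.2.3)
    edited_text: str = ""         # SRMTE-corrected text (5.2.2.6)
    authenticated: bool = False   # originator sign-off (5.2.2.8)

def recognize(voice: bytes) -> str:
    return "no acute cardiopulmonary process"      # placeholder SRT engine output

def back_end_pipeline(originator_id: str, voice: bytes, srmte_edit) -> CompoundFile:
    doc = CompoundFile(originator_id, voice)       # 5.2.2.2 voice file reaches the engine
    doc.draft_text = recognize(doc.voice)          # 5.2.2.3 transcribe to text
    doc.edited_text = srmte_edit(doc.draft_text)   # 5.2.2.5-5.2.2.6 route to SRMTE and edit
    doc.authenticated = True                       # 5.2.2.8 originator authenticates the return
    return doc                                     # 5.2.2.7 save the text file

doc = back_end_pipeline("originator_01", b"\x00" * 320, lambda t: t.capitalize() + ".")
print(doc.edited_text)
```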
5.2.3 Standalone SRT System:
5.2.3.1 Only one person at a time can use a standalone system.
5.2.3.2 Context is limited by the hard drive space.
5.2.3.3 Editing is done locally, at the point of input, either by the originator or by the SRMTE.
5.2.3.4 Input devices.
(1) Noise-canceling SRT microphones.
(2) Handheld digital recorders.
(3) Digital dictation systems.
(4) Telephones.
5.2.3.5 The following scenarios are offered to give the reader examples of how these systems work. They are not intended to represent every possible scenario for these systems.
(1) A radiologist (originator) dictates into a microphone connected to a personal computer running an SRT program. The voice is translated to text in real time. The originator edits the text and/or the SRT context.
(2) A family practitioner dictates into a personal computer throughout the day. Each compound file is saved and then, using the same computer, the SRMTE edits the text, listening to the recorded voice as necessary for clarification. The SRMTE may also be responsible for editing the SRT context.
(3) A group of cardiologists dictate into handheld digital recording devices throughout the day. The voice files are transmitted from the recorders to a computer and recognized by the SRT engine, using the cardiology context and each physician's acoustic model. Once recognized, each text file is edited by the SRMTE. The SRMTE may also be responsible for editing the SRT context.

5.2.4 Networked SRT System:
5.2.4.1 On a networked system, all files containing recorded dictation (voice files) are transmitted to a server, where the files are queued up for recognition. The compound files are then routed to the SRMTE for editing.
5.2.4.2 A networked system is designed to allow multiple originators and SRMTEs to work simultaneously. The voice files are recognized on a server or at the workstation(s) and the resulting compound files are routed to the SRMTE for editing.
5.2.4.3 Contexts (see the sketch after this list).
(1) The networked system may be programmed for a single medical specialty or subspecialty, such as radiology, pathology, family practice, physical therapy, or emergency medicine.
(2) A networked system may also be programmed with many contexts or language models so originators from many different medical specialties can use it to improve speech recognition accuracy.
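The behavior described in 5.2.4.1 through 5.2.4.3, where the recognition server uses the originator's identification and specialty to load the matching context and acoustic model, can be pictured as a simple table lookup. A non-normative Python sketch; every identifier and name below is invented for illustration.

```python
# Illustrative only: selecting a context and acoustic model per originator,
# as described in 5.2.4.1-5.2.4.3. All identifiers below are invented.
ORIGINATOR_SPECIALTY = {
    "originator_01": "radiology",
    "originator_02": "cardiology",
}
SPECIALTY_CONTEXT = {
    "radiology": "radiology_context",
    "cardiology": "cardiology_context",
}

def select_models(originator_id: str) -> tuple[str, str]:
    specialty = ORIGINATOR_SPECIALTY[originator_id]
    context = SPECIALTY_CONTEXT[specialty]          # specialty language model (3.1.11)
    acoustic_model = f"acoustic_{originator_id}"    # per-user phoneme map (3.1.1)
    return context, acoustic_model

print(select_models("originator_02"))   # -> ('cardiology_context', 'acoustic_originator_02')
```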
5.2.4.4 Editing may be done in the same facility, or the compound files may be sent to a remote SRMTE.
5.2.4.5 Input devices.
(1) Noise-canceling SRT microphones.
(2) Handheld digital recorders.
(3) Digital dictation systems.
(4) Telephone.
5.2.4.6 The following scenarios are offered to give the reader examples of how these systems work. They are not intended to represent every possible scenario for these systems.
(1) Six radiologists simultaneously dictate at individual workstations. Each voice file is routed to a recognition server, or the processing may take place on each workstation, with information regarding the originator's specialty and identification, allowing the recognition server to load the corresponding acoustic model and context. The voice file is processed by the SRT engine and the resulting compound file (voice and text files) is routed to the SRMTE for editing. The SRMTE may also be responsible for editing the SRT context.
(2) A hospital has 300 healthcare providers dictating into portable handheld digital recording devices from the hospital and several remote satellite clinics, or dictation may take place on individual workstations. The voice files are encrypted and securely transmitted to the digital dictation system of a contracted transcription company. Each voice file is routed to a recognition server, or the processing may take place on workstations, with information regarding the originator's specialty and identification, allowing the recognition server to load the corresponding acoustic model and context. The voice file is processed by the SRT engine and the resulting compound file (voice and text files) is routed to the SRMTE for editing. SRMTEs working both in the office and remotely receive recognized compound files via encrypted Internet transmissions. The editing is performed on standalone computers and the encrypted text files are returned. The SRMTE may also be responsible for editing the SRT context.
6. Training

6.1 Originators:
6.1.1 Voice enrollment and proper position of microphone and proper placement of microphone element.
6.1.2 Build customized language model.
6.1.3 Build "normal" dictations per user.
6.1.4 Develop skill sets.
6.1.4.1 Proper correction technique for a RecO.
6.1.4.2 Navigation/mobility skills for moving around in the document.
6.1.4.3 Editing skills specific to SRT products.
6.1.4.4 Editing language model.

6.2 Speech Recognition Medical Transcription Editor:
6.2.1 Voice enrollment and proper position of microphone and proper placement of microphone element.
6.2.2 Build customized language model.
6.2.3 Build "normal" dictations per user.
6.2.4 Develop skill sets.
6.2.4.1 Proper correction technique for a RecO.
6.2.4.2 Navigation/mobility skills for moving around in the document.
6.2.4.3 Editing skills specific to SRT products.
6.2.4.4 Editing language model.
6.2.4.5 Start and stop audio file.
6.2.4.6 Identify a RecO.

7. Realities of Speech Recognition Technology

7.1 Originators with good dictation habits will more likely be successful using SRT. See Guide E2344.
7.2 Originators with exceptionally heavy guttural accents may have more challenges. However, speakers of English a