September 2009, Volume 1: Issue 4

Dear Reader,

As medical record systems are changing with health care reform, we are witnessing the evolution not only for the nation as a whole, but in the trenches of each health provider and facility. For many, the change is daunting.

In the past, charts were handwritten, which gave way to handwritten templates. Now one has a choice to make: traditional dictation services, computer-generated voice recognition systems, or some combination of both.

Understandably, the physician or provider does not want to rush headlong into a costly mistake.

Therefore, this issue is dedicated to a comparison and discussion of the two, followed by an analysis that may make the appropriate decision a little easier for each one of us.

Transcription versus Speech Recognition Programs
 
Slaying the Dragon

The following article on Transcription versus Speech Recognition was submitted by a guest author, Andy Braverman, who specializes in Transcription.

Mentioning "Transcription versus Speech Recognition" sounds like it will be the next Heavyweight Boxing Match at Madison Square Garden.

But in fact, Speech Recognition does not work without Transcription!


Speech Recognition (hereafter referred to as "SR") is a process whereby a computer program performs the same task as a Transcriptionist: turning audio dictation into typed text. Instead of "listening" to the audio dictation, SR extracts mathematical (frequency and amplitude) characteristics for each spoken word and then chooses the word from its vocabulary list that most closely matches those characteristics.

In fact, most advanced SR systems do their best-fit word matching not simply by looking at one word at a time, but by looking at the words that surround each word...this is called "context modeling or language modeling". And to further enhance their accuracy, SR systems usually limit their vocabulary list to the words that are expected to be spoken in a particular application (like radiology versus pathology in a medical application, or like criminal versus civil law in a legal application)...this pared-down vocabulary list is called their "lexicon". And one more thing that SR systems do to further enhance their accuracy is to use a "voice model", that is, to take into consideration an individual's unique pronunciation.
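The best-fit matching described above can be sketched in a few lines of Python. This is only a toy illustration, not how any commercial SR product is implemented: the candidate words, the acoustic scores, and the context scores are all made-up numbers, and the function names are hypothetical. It shows how a word that loses on sound alone can still win once the surrounding words are taken into account.

```python
from itertools import product

# All scores are made-up illustrative numbers, not output from a real SR engine.
# Each slot lists candidate words from the lexicon with a hypothetical
# acoustic match score (higher means a closer match to the audio).
ACOUSTIC_CANDIDATES = [
    {"the": 0.9, "a": 0.1},
    {"write": 0.55, "right": 0.45},  # sound alone slightly favors the wrong word
    {"lung": 0.8, "long": 0.2},
]

# Hypothetical context scores: how plausible is the second word right after the first?
CONTEXT = {
    ("the", "right"): 0.30,
    ("right", "lung"): 0.40,
    ("the", "write"): 0.01,
    ("write", "lung"): 0.001,
}
UNSEEN = 0.001  # score assumed for word pairs not listed above


def acoustic_only(candidates):
    """Pick the best-sounding word in each slot, ignoring its neighbors."""
    return [max(slot, key=slot.get) for slot in candidates]


def with_context(candidates, context, unseen=UNSEEN):
    """Pick the word sequence with the best combined acoustic and context score."""
    best_words, best_score = None, -1.0
    for words in product(*candidates):
        score = 1.0
        for slot, word in zip(candidates, words):
            score *= slot[word]
        for pair in zip(words, words[1:]):
            score *= context.get(pair, unseen)
        if score > best_score:
            best_words, best_score = list(words), score
    return best_words


print(acoustic_only(ACOUSTIC_CANDIDATES))
print(with_context(ACOUSTIC_CANDIDATES, CONTEXT))
```

With these invented scores, matching on sound alone picks "the write lung", while adding context picks "the right lung". Trying every combination is only workable for a toy example this small.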

Long ago, a few decades ago in fact, SR started off being "Speaker Dependent". That meant that before someone could expect reasonable results from SR, they first had to spend an hour or more reading a specially prepared script...that was called "training". A user would have to carefully read that script, sometimes more than once, in order for the SR system to build the individual's "voice model". That script was specially crafted to make the user say words that contained the few hundred unique sounds (utterances) that allowed the SR system to best understand that person's unique pronunciation.

But it was quickly realized that no SR user, especially a busy professional, wanted to spend the time necessary to train the system to his or her voice. It was a very tedious and time-consuming prerequisite that had to be done before the SR system could be used. So to overcome this problem, SR systems started to tout themselves as "Speaker Independent". They didn't accomplish this through any new technical breakthrough, but by moving the "training" to a background task. How, you may ask, did they accomplish this? Well, simply by sacrificing accuracy for a few weeks while using a "feedback loop" (which consisted of a Transcriptionist correcting the SR output) to build the individual's "voice model" over time. Instead of an hour or so of "training" so that the SR system could learn a user's pronunciation, the SR system learned over weeks of best-guessing what the user said, and then seeing what the user actually said by noting the corrections made by the Transcriptionist.

Now we come to a very important thing you should know about what accuracy rates you can expect from SR systems and the reason why SR alone does not work without the support of Transcription.

As we've all seen, in the futuristic world of Star Trek, the Enterprise computer would never misinterpret anything anyone said to it, unlike the famous example from Microsoft's Bill Gates of an errant SR system interpreting "Recognize Speech" as "Wreck a Nice Beach". But in fact, that's just the kind of mistake that even the most sophisticated SR products continue to struggle with today.

SR accuracy rates vary from product to product, but typically start around 80% and increase with usage to around 90%. Some SR manufacturers may say that their system is even more accurate, but they all will agree, even if they won't admit it, that their SR systems are not 100% accurate and will never...it's worth repeating...will NEVER be!

And that's the reason why SR alone does not work without Transcription...because SR isn't 100% accurate, and therefore requires a Transcriptionist to correct its output!

The SR system's feedback loop is called by some manufacturers the "Correction Editor". This sounds like a software utility, but it is in fact a skilled typist...the Transcriptionist. A Transcriptionist is the person who fixes the output of the SR system. After the SR system does the best job it can of typing what it thinks it heard, and because we know there may be one or more wrong words in each document, the Transcriptionist must listen to the entire audio dictation while visually proofing the SR-typed document in order to type over and correct the wrong words. These corrections are the feedback loop that helps the SR system "learn" how each individual pronounces words. While that feedback loop helps improve SR accuracy from around 80% to the low-to-mid 90s...that is pretty much the best overall accuracy that can be expected.

Even if SR systems were 99% accurate, that would mean that for every 100 spoken words (which is barely a paragraph or two) one word will be wrong! If, for example, an SR system misses the prefix "non" in non-malignant...without the Transcriptionist being in the loop to correct that mistake, that simple mistake would make for a very bad day for the patient, the doctor, and the hospital.
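To see why one wrong word per hundred matters, here is a rough calculation of how likely it is that a dictation of a given length contains at least one wrong word. It is a sketch under a simplifying assumption that is not part of the article: every word is misrecognized independently with the same probability. The function name is illustrative only.

```python
def chance_of_at_least_one_error(words, accuracy):
    """Probability that a dictation of `words` words contains one or more wrong words.

    Assumes each word is recognized independently with the same accuracy,
    which is a simplification.
    """
    return 1.0 - accuracy ** words


for words in (100, 300, 500):
    p = chance_of_at_least_one_error(words, 0.99)
    print(f"{words} words at 99% accuracy: {p:.0%} chance of at least one wrong word")
```

Under that assumption, even a short 100-word dictation has roughly a 63% chance of containing a mistake somewhere, and the chance only grows with length. That is why the whole document has to be proofed.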

Over the decades I've designed dictation products and SR systems, and have sat in many "Transcription Solution" meetings where a hospital's Transcription Department feared that SR technology was going to put them out of business...that it was going to replace them with computers that have no need for food, rest, or a paycheck. But I've never seen that happen. That is because the hospital still needs its transcriptionists to proof and fix the output of the SR system.

Every SR Transcriptionist I've met has said "...while I sit at the keyboard, with my fingers-at-the-ready, ready to correct the SR system's mistakes, I could have just as quickly and easily typed the entire document myself!"

And that brings into focus a very important consideration when debating whether to implement an SR system...whether it will be cost-effective for the transcription of dictation.

From what I've described thus far, I think you'll agree that in most cases the answer will be "no". In most dictation applications, it is not cost effective to implement an SR system...because you still need your transcription staff to correct the SR system's mistakes.

Ahhh, but you say, they only have to type a few words instead of the whole document...that is true, but they still have to listen to the entire document...and since they can type as fast as they listen, they could have simply typed the whole thing themselves. In fact, by implementing an SR system, you've just added the (usually high) cost of the SR system to the transcription costs you already have. And with the same number of transcriptionists listening to all of your dictations, you have no net gain in efficiency brought about by the use of the (imperfect) SR system.

Some SR salespeople will say, "...but you can get rid of your skilled MTs (medical transcriptionists) and replace them with lower-paid, lower-skilled typists." But that doesn't work in reality. An MT is an MT because they are keenly familiar with the medical terminology being used. A lower-skilled typist will spend many times longer than a skilled MT proofing work, and will constantly interrupt other typists to ask for help determining what a doctor has said. Instead of efficiently proofing and correcting documents, a lower-skilled typist's head will be buried in the PDR, constantly trying to look up which word they thought the doctor said. If you're willing to, and can afford to, let them struggle through a few years of hard-earned experience, they might actually end up becoming reasonably proficient MTs after a very long period of apprenticeship.

And just about every SR salesperson will say, as they have for the two decades I've been involved in this discussion of "Transcription versus Speech Recognition", "Just one more generation of faster computers and SR will become the Holy Grail." But as described earlier, SR is a very complicated process of analyzing the audio file, which results at best in an imperfect best guess of what was said.

To put the complexity of the SR task into perspective, you just have to look at the size of a typical SR system's lexicon (the vocabulary list of words for a particular application). In a very effective yet limited application of SR, such as in an Airline Reservation System, the lexicon contains only about 100 words. A few words like Flight number, arrival, departure, the names of airlines and cities, and the numbers 0 through 9 are sufficient for a person to ask an SR based Airline Reservation System if their flight will depart on time. But in an application like Radiology, or Pathology, or in a legal or business application, the typical lexicon can contain 20,000 to 40,000 words! And forget the ubiquitous "talking typewriter" application that we all wish we had...where you can speak on any subject not restricted to a particular application and have the machine type a perfect document for you...because for a "Conversational English" application, the size of the lexicon grows to 1,000,000 words and beyond!

With that simple comparison of the size of the lexicon required for particular applications, it becomes evident why it's such a daunting task, and why SR's typed results are less than perfect.

As a design engineer and one who has spent nearly two decades in the dictation and transcription industry, I am a fan of SR technology. But I am also a practical person when it comes to product marketing. As such, I have found that today's SR technology, and what we can expect it to become in the future, falls short of being a cost effective technology to implement in a typical dictation environment.

Because accuracy expectation is less than 100%, and because an SR implementation cannot completely take the human element (the Transcriptionist) out of the loop...this shows that today's SR technology is generally not a more cost effective solution (to turning dictation into documents) versus simply using traditional Transcription techniques alone.

___________________________________________
Mini Bio: Andy Braverman is the President and Owner of Apptec Corporation. Andy has been involved in the design and marketing of dictation and transcription systems for nearly two decades...one decade of which was devoted to designing next-generation dictation and transcription products for Philips Speech Processing of Vienna, Austria. In the past decade, Andy has devoted his talents to bringing to market feature-rich and cost-effective dictation and transcription products for medical, legal, and general business applications. His company, Apptec Corporation, based on Long Island, is also involved in developing custom products to suit its clients' specific needs, from software development to circuit design. If you have a question for Andy, or a problem that needs solving, he invites you to contact him at 1-631-828-1245 or at Andy@DigiTelStore.com. See his latest adventures in the field of Speech Processing at www.DigiTelStore.com


Take a Dragon to Your Office
 
Easy to Tame

Dragon Medical Software, the most widely incorporated speech recognition system in medicine today, is currently used by more than 50,000 health care providers in the United States, for charting as part of the electronic medical record reform strategy.

Nuance, the maker of Dragon Medical, claims that the system is "up to 99% accurate out-of-the-box", including medical lexicons for 80 specialties and subspecialties. This translates to roughly five errors within a two-page document, though the software does learn from its errors and its accuracy increases over time.

Used in conjunction with appropriate macros to re-use often-dictated text, Dragon Medical hastens the dictation process, saving time per patient as providers navigate through their electronic medical records (EMR). Judicious use of customized macros (templates) can replace a 500-word dictation with only 200 words, thereby reducing the expected misrecognitions by three.
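The error counts quoted above can be checked with a little arithmetic. The sketch below assumes about 250 words per typed page (an assumption, not a figure from Nuance) and treats the quoted 99% as a per-word accuracy; the function and variable names are illustrative only.

```python
def expected_errors(words, accuracy):
    """Expected number of misrecognized words in a dictation of `words` words."""
    return round(words * (1.0 - accuracy), 2)


WORDS_PER_PAGE = 250  # assumption: a typical typed page
ACCURACY = 0.99       # the "up to 99%" figure, taken as per-word accuracy

two_page_errors = expected_errors(2 * WORDS_PER_PAGE, ACCURACY)
full_dictation = expected_errors(500, ACCURACY)  # dictating everything
with_macros = expected_errors(200, ACCURACY)     # macros supply the rest
errors_avoided = round(full_dictation - with_macros, 2)

print(two_page_errors, errors_avoided)
```

On these assumptions, a two-page document carries about five expected errors, and cutting a 500-word dictation to 200 spoken words removes about three of them.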

Dragon Medical works with the applications used by most people, including AOL, Microsoft Word, and Internet Explorer. It supports Mozilla Firefox and Thunderbird as well.

Dragon may be used with handheld digital recorders but also supports cordless or array microphones, which are usually included with ordered packages. With the Macros, text and graphic dictation shortcuts are possible.

Custom vocabularies may be formulated and the system can be formatted and edited with voice commands.

Dragon Medical is licensed per physician. The price varies with the number of providers covered, from $1,199.00 down to $1,039.99 per physician (for over 625 physicians).


How the Comparison Relates to You
 
Deep Probe into Analysis between 2 EMR Systems

The most obvious problem with voice recognition (VR) systems is that each user must have a training session so that the program can recognize his or her voice pattern. Training takes approximately 30 minutes on average, counting both the dictation time and the time the computer needs to process the speech.

VR systems are highly impractical in settings where there are many "transient" providers as in a large department with rotating interns or residents or with many physicians that are on staff at many hospitals.

In addition to the training for each physician, a separate license must be purchased for each physician.

A place where VR works quite well is in a small controlled department setting such as radiology or emergency rooms since the area is usually confined to a small number of physicians and the terminology tends to be slightly more limited.

Another problem with voice recognition charting can be in the use of templates. Although usage of templates saves time, it is not without its own unique problems.

Typically templates are used for:
*history of present illness
*past medical history
*past surgical history
*allergies
*family history
*medications
*social history
*chief complaint
*physician findings on examination

Any part of the examination that differs from the template is either eliminated or dictated over the default with inserted macros.

But what happens when the physician is in a rush to finish the chart between patients? Glaring errors have been seen where the chart reads "healthy young male" when a gynecologic exam was performed on a woman, or where a complete examination is recorded when only one system was addressed in a patient coming for a follow-up. So when using templates, it is imperative to ensure not only that the right template is used but also that the recorded exam results match the actual exam.

Checking for accuracy in VR systems is done immediately at the time of dictation. This means that the health care provider must be watching the computer monitor and making the corrections directly, which takes slightly more time than merely dictating and sending the dictation off to a transcriber.

On the other hand, once the dictation is finished, the chart is complete, with no waiting for records to return. Turnaround time for transcribed dictation usually varies anywhere from 39 minutes to 48 hours.

So if it is an office where there is time between patients, a small more controlled setting or a site which has a few constant providers, VR is very manageable.

If your work setting is a large facility, you are constantly on the go, and you have no time between patients, relying on a medical transcriptionist to transform your dictations will relieve you of the additional time spent at the computer screen.

In terms of cost, there is no contest between the two: where Voice Recognition systems are practical in your setting, they are far cheaper.

For instance: The Emergency Department of Massachusetts General Hospital figured that it cost approximately $7.50 per chart using a transcription service. With the number of patients seen annually, assuming that each one would be dictated, the transcription cost was projected at $337,500 per year. The initial cost for Dragon was $3,000 for a savings of $334,500 in the first year.
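The arithmetic behind this example can be laid out as follows. The dollar figures come from the example above; the annual chart count is not stated there and is only implied by dividing the projected cost by the cost per chart. The variable names are illustrative.

```python
COST_PER_CHART = 7.50                   # transcription service, dollars per chart
PROJECTED_TRANSCRIPTION_COST = 337_500  # dollars per year
DRAGON_INITIAL_COST = 3_000             # dollars, first year

# Chart volume implied by the two transcription figures (not stated in the example)
implied_charts_per_year = PROJECTED_TRANSCRIPTION_COST / COST_PER_CHART

first_year_savings = PROJECTED_TRANSCRIPTION_COST - DRAGON_INITIAL_COST

print(f"Implied charts per year: {implied_charts_per_year:,.0f}")
print(f"First-year savings: ${first_year_savings:,}")
```

This compares purchase price against transcription fees only. It does not put a dollar value on the extra time the provider spends correcting the text at the screen, which was noted earlier.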

Fortunately, this is not an all-or-none decision, just as there is seldom a totally ideal setting. There is no reason why you cannot use a combination of both systems in your practice while you are getting comfortable with the new requirements.


Subscribers and Business Friends
 

For my subscribers, colleagues and friends: You can copy any content in this newsletter for your own use as long as the following accompanies it and the link is live:
Reprinted by permission of Internet copywriter Barbara Hales. For more information on innovations and tips, subscribe to the Medical Strategist at:
http://www.TheWriteTreatment.com

If you would like to contribute your news about a product or event as well as your thoughts and comments, please email me at: Barbara@TheWriteTreatment.com.

Send me the lead of your website article and your URL. It may be published here so that your colleagues can link to the "whole story".



The Medical Strategist was founded in 2009 with the following established goals:
*Help guide you into a plan of action for your business
*Keep you in the loop on changes within the healthcare field and how they impact your practice
*Deliver pertinent information and new regulations directly affecting you, the practitioner
*Identify barriers and how to navigate around them
*Act as a liaison between you, the provider, and IT companies, pharmaceutical companies and governmental agencies

For Your Health and Wealth,


Barbara Hales
The Write Treatment

Phone: 516-647-3002