September 2009, Volume 1: Issue 4
|
Dear Reader,
As medical record systems
change with health care reform, we are
witnessing that evolution not only across the
nation as a whole, but in the trenches at
each health provider and facility. For many,
the change is daunting.
In the past, handwritten charts
gave way to handwritten templates. Now, one
has a choice to make: traditional dictation
services, computer-based voice
recognition systems, or some combination of
both.
Understandably, the physician or provider
does not want to rush headlong into a costly
mistake.
Therefore, this issue is dedicated to
comparing and discussing the two,
followed by an analysis that may make
choosing the appropriate option a
little easier for each of us.
|
Transcription versus Speech
Recognition Programs
|
|
Slaying the Dragon
The following
article on Transcription versus
Speech Recognition was submitted by
a guest author, Andy Braverman, who
specializes in Transcription.
Mentioning "Transcription versus
Speech Recognition" sounds like it
will be the next Heavyweight Boxing
Match at Madison Square Garden.
But in fact, Speech Recognition does
not work without Transcription!
Speech Recognition (hereafter
referred to as "SR") is a process
whereby a computer program performs
the task a Transcriptionist performs:
turning spoken dictation into text.
Instead of "listening" to the audio
dictation, SR extracts mathematical
(frequency and amplitude)
characteristics for each spoken word
and then chooses the word from its
vocabulary list that most closely
matches those characteristics.
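To make that "closest match" idea concrete, here is a minimal sketch in Python. It is only an illustration: the three-word vocabulary, the feature numbers, and the function name closest_word are all invented for this example, and real SR products use far richer acoustic features than one frequency and one amplitude value per word.

```python
import math

# Hypothetical (frequency, amplitude) features for each vocabulary word.
# These numbers are invented purely for illustration.
VOCABULARY = {
    "malignant": (220.0, 0.80),
    "benign": (180.0, 0.55),
    "normal": (140.0, 0.60),
}


def closest_word(features, vocabulary=VOCABULARY):
    """Return the vocabulary word whose stored features are nearest the input."""
    return min(vocabulary, key=lambda word: math.dist(features, vocabulary[word]))


# A spoken word measured at 185 Hz and 0.50 amplitude is nearest to "benign".
print(closest_word((185.0, 0.50)))
```

Notice that the program always returns some word, even when nothing in the list is a good match. That is one way a confident-looking wrong word ends up in a document.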
In fact, most advanced SR systems do
their best-fit word matching not
simply by looking at one word at a
time, but by looking at the words
that surround each word...this is
called "context modeling" or "language
modeling". And to further enhance
their accuracy, SR systems usually
limit their vocabulary list to the
words that are expected to be spoken
in a particular application (like
radiology versus pathology in a
medical application, or like
criminal versus civil law in a legal
application)...this pared-down
vocabulary list is called their
"lexicon". And one more thing that
SR systems do to further enhance
their accuracy is to use a "voice
model", that is, to take into
consideration an individual's unique
pronunciation.
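A rough sketch of how context can overrule the raw sound match follows. Everything in it is an assumption made for illustration: the candidate words, the acoustic scores, the word-pair probabilities, and the function name pick_word are hypothetical, and commercial SR language models are considerably more elaborate.

```python
# Hypothetical acoustic scores: how well each candidate matches the audio alone.
ACOUSTIC_SCORE = {"pleasure": 0.55, "pressure": 0.45}

# Hypothetical probabilities of a word following the previous word,
# as they might be estimated from documents in a medical lexicon.
BIGRAM_PROB = {
    ("blood", "pressure"): 0.30,
    ("blood", "pleasure"): 0.001,
}


def pick_word(previous_word, acoustic_scores, bigram_prob, floor=1e-6):
    """Choose the candidate with the best combined acoustic and context score."""
    def combined(word):
        context = bigram_prob.get((previous_word, word), floor)
        return acoustic_scores[word] * context

    return max(acoustic_scores, key=combined)


# On sound alone "pleasure" scores higher, but after "blood" the context wins.
print(max(ACOUSTIC_SCORE, key=ACOUSTIC_SCORE.get))
print(pick_word("blood", ACOUSTIC_SCORE, BIGRAM_PROB))
```

The same mechanism explains why a pared-down lexicon helps: fewer candidate words and sharper word-pair statistics leave fewer plausible wrong answers.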
Long ago, a few decades ago in fact,
SR started off being "Speaker
Dependent". That meant that before
someone could expect reasonable
results from SR, they first had to
spend an hour or more reading a
specially prepared script...that was
called "training". A user would have
to carefully read that script,
sometimes more than once, in order
for the SR system to build the
individual's "voice model". That
script was specially crafted to make
the user say words that contained
the few hundred unique sounds
(utterances) that allowed the SR
system to best understand that
person's unique pronunciation.
But it was quickly realized that no
SR user, especially a busy
professional, wanted to spend the
time necessary to train the system
to his or her voice. It was a very
tedious and time-consuming
prerequisite that had to be done
before the SR system could be used.
So to overcome this problem, SR
systems started to tout themselves
as "Speaker Independent". They
didn't accomplish this through any
new technical breakthrough, but by
moving the "training" to a
background task. How, you may ask,
did they accomplish this? Well,
simply by sacrificing accuracy for a
few weeks while using a "feedback
loop" (which consisted of a
Transcriptionist correcting the SR
output) to build the individual's
"voice model" over time. Instead of
an hour or so of "training" so that
the SR system could learn a user's
pronunciation, the SR system learned
over weeks of best-guessing what the
user said, and then seeing what the
user actually said by noting the
corrections made by the
Transcriptionist.
Now we come to a very important
thing you should know about what
accuracy rates you can expect from
SR systems and the reason why SR
alone does not work without the
support of Transcription.
As we've all seen, in the futuristic
world of Star Trek, the Enterprise
Computer would never misinterpret
anything anyone said to it, unlike
the famous example from Microsoft's
Bill Gates of an errant SR system
interpreting someone saying
"Recognize Speech" as "Wreck a Nice
Beach". But in fact, that's just the
kind of mistake that even the most
sophisticated SR products continue
to struggle with today.
SR accuracy rates vary from product
to product, but typically start
around 80% and increase with use
to around 90%. Some SR manufacturers
may say that their system is even
more accurate, but they all will
agree, even if they won't admit it,
that their SR systems are not 100%
accurate and will never...it's worth
repeating...will NEVER be!
And that's the reason why SR
alone does not work without
Transcription...because SR isn't 100%
accurate, which therefore requires a
Transcriptionist to correct its
output!
The SR system's feedback loop is
called by some manufacturers the
"Correction Editor". This sounds
like it is a software utility, but
in fact is a skilled typist...the
Transcriptionist. A Transcriptionist
is the person that fixes the output
of the SR system. After the SR
system does the best job it can of
typing what it thinks it
heard...because we know there may be
one or more wrong words in each
document, the Transcriptionist must
listen to the entire audio dictation
while visually proofing the SR-typed
document in order to type-over and
correct the wrong words. It's this
correction that the Transcriptionist
makes that is the feedback loop that
helps the SR system to "learn" how
each individual pronounces words.
While that feedback loop helps
improve the SR accuracy from around
80% to the low-to-mid 90s...that is
pretty much the best overall
accuracy that can be expected.
Even if SR systems were 99%
accurate, that would mean that for
every 100 spoken words (which is
barely a paragraph or two) one word
will be wrong! If, for example,
an SR system misses the prefix "non"
in non-malignant...without the
Transcriptionist being in the loop
to correct that mistake, that simple
mistake would make for a very bad
day for the patient, the doctor, and
the hospital.
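The arithmetic behind that claim is easy to check. The short Python sketch below assumes, for simplicity, that every word has the same chance of being misrecognized; the function name expected_errors is made up for this illustration, and the accuracy levels are the ones quoted in this article.

```python
def expected_errors(word_count, accuracy):
    """Expected number of wrong words, assuming a uniform per-word error rate."""
    return word_count * (1.0 - accuracy)


# Accuracy levels mentioned in this article, applied to 100 spoken words.
for accuracy in (0.80, 0.90, 0.99):
    errors = expected_errors(100, accuracy)
    print(f"{accuracy:.0%} accurate: about {errors:.0f} wrong words per 100")
```

Even at the top of that range the count never reaches zero, and nothing in the arithmetic says which word will be the wrong one.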
Over the decades I've designed
dictation products and SR systems,
and have sat in many "Transcription
Solution" meetings where a
hospital's Transcription Department
feared that SR technology was going
to put them out of business...that
it was going to replace them with
computers that have no need for
food, rest, or a paycheck. But
I have never once seen that
happen. That is because the hospital
still needs its transcriptionists
to proof and fix the output of the
SR system.
Every SR Transcriptionist I've
met has said "...while I sit at the
keyboard, with my
fingers-at-the-ready, ready to
correct the SR system's mistakes, I
could have just as quickly and
easily typed the entire document
myself!"
And that brings into focus a very
important consideration when
debating whether to implement an SR
system...and that is to consider if
it will be cost-effective to
implement an SR system for the
transcription of dictation.
From what I've described thus far, I
think you'll agree that in most
cases the answer will be "no". In
most dictation applications, it is
not cost effective to implement an
SR system...because you still need
your transcription staff to correct
the SR system's mistakes.
Ahhh, but you say, they only have to
type a few words instead of the
whole document...that is true, but
they still have to listen to the
entire document...and since they can
type as fast as they listen, they
could have simply typed the whole
thing themselves. In fact by
implementing an SR system, you've
just added the (usually high) cost of
the SR system to the transcription
costs you already have. And with the
same number of transcriptionists
listening to all of your dictations,
you have no net gain in efficiency
brought about by the use of the
(imperfect) SR system.
Some SR salespeople will say,
"...but you can get rid of your
skilled MTs (medical
transcriptionists) and replace them
with lower paid, lower skilled
typists." But that doesn't work in
reality. An MT is an MT because they
are keenly familiar with the medical
terminology being used. A lower
skilled typist will spend many
times longer than a skilled MT
proofing work, and will constantly
interrupt other typists to ask
them to listen and help
determine what a doctor has said.
Instead of efficiently proofing and
correcting documents, a lower
skilled typist's head will be buried
in the PDR, constantly trying to
look up which word they thought the
doctor said. If you're willing to
and can afford to let them struggle
through a few years of hard-earned
experience, they might actually end
up becoming reasonably proficient
MTs after a very long period of
apprenticeship.
And just about every SR salesperson
will say, as they have for the two
decades I've been involved with this
discussion of "Transcription versus
Speech Recognition", "just one
more generation of faster computers
and SR will become the Holy Grail."
As described earlier, SR is a very
complicated process of analyzing the
audio file, which results at best in
an imperfect best-guess of what was
said.
To put the complexity of the SR task
into perspective, you just have to
look at the size of a typical SR
system's lexicon (the vocabulary
list of words for a particular
application). In a very effective
yet limited application of SR, such
as in an Airline Reservation System,
the lexicon contains only about 100
words. A few words like flight
number, arrival, departure, the
names of airlines and cities, and
the numbers 0 through 9 are
sufficient for a person to ask an
SR-based Airline Reservation System if
their flight will depart on time.
But in an application like
Radiology, or Pathology, or in a
legal or business application, the
typical lexicon can contain 20,000
to 40,000 words! And forget the
ubiquitous "talking typewriter"
application that we all wish we
had...where you can speak on any
subject not restricted to a
particular application and have the
machine type a perfect document for
you...because for a
"Conversational English"
application, the size of the lexicon
grows to 1,000,000 words and beyond!
With that simple comparison of
the size of the lexicon required for
particular applications, it becomes
evident why it's such a daunting
task, and why SR's typed results are
less than perfect.
As a design engineer and one who has
spent nearly two decades in the
dictation and transcription
industry, I am a fan of SR
technology. But I am also a
practical person when it comes to
product marketing. As such, I
have found that today's SR
technology, and what we can expect
it to become in the future, falls
short of being a cost effective
technology to implement in a typical
dictation environment.
Because expected accuracy is less
than 100%, and because an SR
implementation cannot completely
take the human element (the
Transcriptionist) out of the
loop, today's SR
technology is generally not a more
cost-effective way of turning
dictation into documents than
simply using traditional
Transcription techniques alone.
___________________________________________
Mini Bio: Andy Braverman is
the President and Owner of Apptec
Corporation. Andy has been involved
in the design and marketing of
dictation and transcription systems
for nearly two decades...one decade
of which was devoted to designing
next-generation dictation and
transcription products for Philips
Speech Processing of Vienna,
Austria. In the past decade, Andy
has devoted his talents to bringing
to market feature-rich and
cost-effective dictation and
transcription products for
medical, legal, and general business
applications. His company, Apptec
Corporation, based on Long Island, is
also involved in developing custom
products to suit its clients'
specific needs, from software
development to circuit design. If
you have a question for Andy, or a
problem that needs solving, he
invites you to contact him at
1-631-828-1245 or at
Andy@DigiTelStore.com. See his
latest adventures in the field of
Speech Processing at
www.DigiTelStore.com
|
|
Take a Dragon to Your Office
|
|
Easy to Tame
Dragon Medical
Software, the most widely
adopted speech recognition
system in medicine today, is
currently used by more than 50,000
health care providers in the United
States for charting as part of the
electronic medical record reform
strategy.
Nuance, the maker of Dragon Medical,
claims that the system is "up to 99%
accurate out-of-the-box" and includes
medical lexicons for 80 specialties
and subspecialties. At 99% accuracy,
that translates to about five errors
within a two-page document, though
the software does learn from its
errors and its accuracy increases
over time.
Used in conjunction with appropriate
macros to re-use often dictated
text, Dragon Medical hastens the
dictation process, saving time per
patient as providers navigate
through their electronic medical
records (EMR). Judicious use of
customized macros (templates) can
cut a 500-word dictation to
only 200 words, thereby reducing
expected misrecognitions by three.
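For readers who want to see where "three" comes from, here is the sum as a small Python sketch. It assumes the 99% out-of-the-box accuracy figure quoted above applies evenly to every dictated word; the function name expected_misrecognitions is invented for this example.

```python
def expected_misrecognitions(words, accuracy_percent=99):
    """Expected wrong words at a given accuracy, assuming a uniform error rate."""
    return words * (100 - accuracy_percent) / 100


full_dictation = expected_misrecognitions(500)  # 5.0 expected errors
with_macros = expected_misrecognitions(200)     # 2.0 expected errors

print(full_dictation - with_macros)  # 3.0 fewer expected errors
```

Text supplied by a macro is not dictated at all, so in this simple model it contributes no recognition errors of its own.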
Dragon Medical works with the
applications used by most people,
including AOL, Microsoft Word, and
Internet Explorer. It supports
Mozilla Firefox and Thunderbird as
well.
Dragon may be used with handheld
digital recorders but also supports
cordless or array microphones, which
are usually included with ordered
packages. With macros, text and
graphic dictation shortcuts are
possible.
Custom vocabularies may be
created, and documents can be
formatted and edited with voice
commands.
The price of a Dragon Medical
license is charged per
physician and varies
depending on how many providers you
are purchasing licenses for,
from $1,199.00 down to $1,039.99 (for
over 625 physicians).
|
|
How the Comparison Relates to
You
|
|
Deep Probe into Analysis between 2
EMR Systems
The most obvious
problem with voice recognition (VR)
systems is that each user must have
a training session so that the voice
pattern can be recognized by the
program. The average training time
is approximately 30 minutes,
including dictation time and time for
the computer to process the speech.
VR systems are highly impractical in
settings where there are many
"transient" providers, as in a large
department with rotating interns or
residents, or with many physicians
who are on staff at many hospitals.
In addition to the training for each
physician, a separate license
must be purchased for
each physician.
A place where VR works quite well is
in a small controlled department
setting such as radiology or
emergency rooms since the area is
usually confined to a small number
of physicians and the terminology
tends to be slightly more limited.
Another problem with voice
recognition charting can be in the
use of templates. Although usage of
templates saves time, it is not
without its own unique problems.
Typically templates are used for:
*history of present illness
*past medical history
*past surgical history
*allergies
*family history
*medications
*social history
*chief complaint
*physician findings on examination
Any part of the examination that
differs from the template is either
eliminated or dictated over the
default with inserted macros.
But what happens when the physician
is in a rush to finish the chart
between patients? Glaring errors
have been seen where the chart
reads "healthy young male" when a
gynecologic exam was performed on a
woman, or where a complete examination is
recorded when only one system was
addressed in a patient coming for a
follow-up. So when using templates,
it is imperative to ensure not
only that the right template is used but
also that the recorded results match
the actual exam.
Checking for accuracy in VR systems
is done immediately at the time of
dictation. This means that
the health care provider
must be watching the computer
monitor and making the corrections
directly. This takes
slightly more time than merely
dictating and sending it off to a
transcriber.
On the other hand, once dictation is
finished, the chart is complete, with no
waiting for records to return. Turnaround
time for transcribed dictation usually
varies anywhere from 39 minutes
to 48 hours.
So if it is an office where there is
time between patients, a small, more
controlled setting, or a site that
has a few constant providers, VR is
very manageable.
If your work setting is a large
facility, you are constantly on the
go, and you have no time between
patients, relying on a medical
transcriptionist to transcribe your
dictations while you are on the go
will relieve you of the additional
time spent at the computer screen.
In terms of cost, Voice Recognition
wins easily over transcription,
provided it is practical in
your setting.
For instance: The Emergency
Department of Massachusetts General
Hospital figured that it cost
approximately $7.50 per chart using
a transcription service. With the
number of patients seen annually,
assuming that each one would be
dictated, the transcription cost was
projected at $337,500 per year. The
initial cost for Dragon was $3,000
for a savings of $334,500 in the
first year.
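The figures in that example can be worked through in a few lines of Python. This is only a check on the numbers quoted above: the annual chart count is not stated in the example, so it is derived here from the other two figures, and the variable names are made up for the illustration.

```python
# Figures quoted in the example above.
COST_PER_CHART = 7.50                # dollars per transcribed chart
ANNUAL_TRANSCRIPTION_COST = 337_500  # projected dollars per year
DRAGON_INITIAL_COST = 3_000          # dollars, initial cost only

# Derived, not quoted: the number of charts per year those figures imply.
charts_per_year = ANNUAL_TRANSCRIPTION_COST / COST_PER_CHART

first_year_savings = ANNUAL_TRANSCRIPTION_COST - DRAGON_INITIAL_COST

print(f"Implied charts per year: {charts_per_year:,.0f}")
print(f"First-year savings: ${first_year_savings:,}")
```

The comparison counts only the two costs quoted. To adapt it to your own setting, substitute your own per-chart rate, chart volume, and license cost.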
Fortunately, this is not an
all-or-none decision, just as there is
seldom a totally ideal setting.
There is no reason why you cannot
use a combination of both systems in
your practice while you are getting
comfortable with the new
requirements.
|
|
Subscribers and Business Friends
|
|
|
For my
subscribers, colleagues and friends:
You can copy any content in this
newsletter for your own use as long
as the following accompanies it and
the link is live:
Reprinted by permission of Internet
copywriter Barbara Hales. For more
information on innovations and tips,
subscribe to the Medical Strategist
at:
http://www.TheWriteTreatment.com
If you would like to contribute your
news about a product or event as
well as your thoughts and comments,
please email me at:
Barbara@TheWriteTreatment.com.
Send me the lead of your website
article and your URL. It may be
published here so that your
colleagues can link to the "whole
story".
|
|
|
The Medical Strategist was founded in
2009 with the following established goals:
*Help guide you into a plan of action for your business
*Keep you in the loop on changes within the healthcare
field and how they impact your practice
*Deliver pertinent information and new regulations
directly affecting you, the practitioner
*Identify barriers and how to navigate around them
*Act as a liaison between you, the provider, and IT
companies, pharmaceutical companies and governmental
agencies
For Your Health and Wealth,
Barbara Hales
The Write Treatment
Phone: 516-647-3002