Language does not come
naturally to machines. Unlike humans, computers cannot easily distinguish
between, say, a river bank and a savings bank. Satire and jokes? Algorithms
have great trouble with that. Irony? Wordplay? Cultural context? Forget it.
That human edge in decoding
what things mean is what a computer scientist turned entrepreneur, Luis von
Ahn, is betting on. His start-up, Duolingo, which opened to the public on
Tuesday, proposes to put armies of language learners to work translating text on
the Web.
For the learners, Duolingo
offers basic lessons, followed by sentences to translate, one at a time, from
simple to more difficult. For online content providers wanting translations,
Duolingo offers, for now at least, free labor. Because it is still in its early
days, there are no independent assessments available of how accurate or
efficient it can be.
The site has been available by
invitation only for the last five months and is now limited to English,
Spanish, French and German. People and companies can submit their content to
Duolingo for translation, a service the company may begin to charge for. To provide
content for its lessons, Duolingo can also harness whatever text is not under
copyright or is released under a liberal Creative Commons license. Users vote
for the best translations, providing some measure of quality control.
“You’re learning a language
and at the same time, helping to translate the Web,” Mr. von Ahn said. “You’re
learning by doing.”
Google Translate, by contrast,
relies entirely on machines to do the work — and while it usually captures the
essence of a piece of text, it can sometimes produce bewildering passages.
Google leverages vast amounts of data to produce its output, feeding its
translation engine with texts that have been translated into multiple
languages, including United Nations proceedings, which are then used to train
its machines.
Mr. von Ahn, by contrast, is
leveraging what he hopes will be crowds flocking to Duolingo for free language
lessons.
Crowdsourcing is at the heart
of Mr. von Ahn’s ambitions. His last enterprise, ReCaptcha, makes use of those
wavy letters and numbers that Web users transcribe every day on sites to ensure
that they are not robots trying to break in. Mr. von Ahn gathered those
squiggles from digitized images of old manuscripts, books and newspapers —
including The New York Times. Every time they transcribe the wavy words, Web
users provide free help in transcribing fading texts that are hard for a
machine to read. Google bought his start-up in 2009.
Mr. von Ahn, an associate
professor at Carnegie Mellon University in Pittsburgh, where Duolingo is based,
came up with the translation idea when he noticed that friends and relatives in
his native Guatemala had far less content available to them online if they did
not know English. The Web, Mr. von Ahn argued, is inferior in Spanish. “It’s
got much less information. I see people struggling with that a lot,” he said.
“They don’t get the information we take for granted.”
Human and machine translation
can work in different scenarios, said Alon Lavie, another Carnegie Mellon
professor who has a machine translation company called Safaba, aimed at
corporate clients. When businesses need to translate large amounts of text into
multiple languages, machine translation can be more useful, said Mr. Lavie,
particularly if business confidentiality is at stake.
“Where I think Duolingo’s
crowdsourcing makes a lot of sense is in scenarios where a consumer or
enterprise has a small translation job that needs to be done quickly and
cheaply, and the translation needs to come out at ‘human’ quality — similar to
what a human translator or bilingual speaker would generate,” Mr. Lavie said.
The New York Times has been
experimenting with Duolingo as a potential means to translate its digital
content into other languages, said Marc Frons, the company’s chief information
officer, but has made no commitments to using the service.
Mr. von Ahn is thinking of
taking on Wikipedia as his first translation project.
Wikipedia has more content
available in English — nearly four million articles — than in any other
language. German, French and Dutch follow, with 1.4 million, 1.3 million and 1
million articles, respectively. In other popular languages, Wikipedia content is sparse: in
Spanish, there are only 900,000 articles, and in Swahili, spoken across East
Africa, fewer than 24,000.
A spokesman for the Wikimedia
Foundation, Jay Walsh, said that anyone who wanted to use Wikipedia material
for translation was welcome to do so (it is published under a Creative Commons
license), but that feeding it back into the Wikipedia sites would require “a
conversation” to make sure translations were accurate. “The community that
makes up Wikipedia — they are confronted with the simultaneous challenge of
growth and also quality, making it excellent,” Mr. Walsh said.
For Duolingo to work well, it
needs a huge crowd of learners. The more proficient they become, the greater
the chances of accurate translations. In Duolingo, a large piece of text is
broken into easy and difficult pieces — by a computer, of course — then
parceled out to students at varying levels and put back together, again by a
machine. Mr. von Ahn said that “eventually we intend to charge content
providers either for faster or more accurate translations.”
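The split, parcel-out and reassemble flow described above can be pictured with a short Python sketch. It is only an illustration and not Duolingo's actual system: the function names, the word-count proxy for difficulty, the two learner levels and the vote tallies are all assumptions made for the example.

```python
import re


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def difficulty(sentence, hard_threshold=8):
    # Hypothetical proxy: sentences longer than the threshold count as "hard".
    return "hard" if len(sentence.split()) > hard_threshold else "easy"


def assign(sentences, learners):
    # learners maps a name to "beginner" or "advanced" (assumed levels).
    # Easy sentences go to everyone; hard ones only to advanced learners.
    tasks = {}
    for idx, sentence in enumerate(sentences):
        if difficulty(sentence) == "easy":
            tasks[idx] = sorted(learners)
        else:
            tasks[idx] = sorted(
                name for name, level in learners.items() if level == "advanced"
            )
    return tasks


def pick_best(votes):
    # votes maps each candidate translation to its number of user votes.
    return max(votes, key=votes.get)


def reassemble(votes_by_index):
    # Put the winning translations back together in original sentence order.
    return " ".join(pick_best(votes_by_index[i]) for i in sorted(votes_by_index))
```

In this sketch, `assign` decides who sees which sentence, and `reassemble` takes the vote tallies collected for each sentence and stitches the most popular translations back into one text. A real system would need a far better measure of difficulty and a way to handle ties and low-quality votes.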
Duolingo has raised $3.3
million in venture capital. The actor Ashton Kutcher is among the backers,
along with Union Square Ventures and the business advice author Tim Ferriss.