A simple command-line tool for recording audio



Machine learning and natural language processing are transforming our relationship with our devices by giving them a human voice. People with visual impairments have especially benefited from these technologies, but those who speak languages like my native Odia have largely been left behind by most voicebanks.

When T. Shrinivasan, a Tamil-language Wikipedian, started the Voice-recorder-for-tawictionary, he probably didn’t realize how useful his open source tool would be for users like me. I was searching for a simple tool that would let me record a large number of words in a short time, so that the recordings could be used on Odia Wiktionary, a sister project of Wikipedia and a free dictionary in the Odia language that has translations of words in Odia and other languages.

Shrini’s tool was a magical find. I forked it on GitHub, called it Kathabhidhana (which means "speech dictionary" in Odia), and made a few changes to the code to suit my own setup. The project grew as I ran into new issues and documented the workarounds. Shrini has been very helpful in fixing small bugs and making additions such as previewing an audio recording before saving it.

What does the tool do?

Kathabhidhana is a simple command-line tool that runs in a terminal on either Linux or macOS.

Recording process using Kathabhidhana’s command line tool.

Before starting, you need to download the entire tool, unzip it, and open the file in a text editor to add the list of words you want to record. The tool then uses your computer’s microphone (either the built-in mic or an external one), shows one word at a time, and gives you four seconds to record it (the default; you can change this in the code). Once a word is recorded, the tool saves a temporary audio file in WAV format. You can then preview the recording and either save it or re-record it. When you are happy with the recording, press “Y” to save it and move on to the next word. The tool automatically saves the file in both .wav and .ogg (an open format supported by many open source projects, including Wikimedia Commons).
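The record, preview, and confirm loop described above can be sketched in a few lines of Python. This is only an illustration of the workflow, not Kathabhidhana’s actual code: the function names (`save_wav`, `record_words`, `silent_capture`), the 44,100 Hz sample rate, and the `temp.wav` file name are all assumptions made for the example. The microphone is replaced by a stand-in that returns silence, and the conversion to .ogg is left out because it needs an external encoder.

```python
import wave
from pathlib import Path

SAMPLE_RATE = 44100      # samples per second (assumed value)
SECONDS_PER_WORD = 4     # the default recording window mentioned in the article


def save_wav(path, pcm_bytes, sample_rate=SAMPLE_RATE):
    """Write 16-bit mono PCM data to a WAV file."""
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)          # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)


def record_words(words, capture, confirm, out_dir):
    """Record each word, repeating the take until `confirm` accepts it.

    capture(seconds) -> bytes of 16-bit mono PCM (hypothetical microphone hook)
    confirm(word, wav_path) -> True to keep the take, False to re-record
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for word in words:
        temp_path = out_dir / "temp.wav"   # temporary take, overwritten on re-record
        while True:
            print(f"Say: {word}")
            save_wav(temp_path, capture(SECONDS_PER_WORD))
            if confirm(word, temp_path):
                break
        final_path = out_dir / f"{word}.wav"
        temp_path.replace(final_path)      # keep the accepted take under the word's name
        saved.append(final_path)
    return saved


def silent_capture(seconds):
    """Stand-in for a microphone: returns silence of the requested length."""
    return b"\x00\x00" * (SAMPLE_RATE * seconds)
```

In an interactive session, `confirm` could be something like `lambda word, path: input("Save? [Y/n] ").strip().lower() != "n"`, and `capture` would be swapped for a function that reads from a real audio input library.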

More than 1,700 audio files have been uploaded so far under the CC BY-SA 4.0 license, an open license that allows anyone to use, share, adapt, and distribute the work, even for commercial purposes. Kathabhidhana is proudly released under the GNU General Public License (GPL) version 3, and all the documentation and audio recordings are under CC BY-SA 4.0.

Fellow Wikimedian Prateek Pattanaik worked on creating a workflow that uses a few tools and creates audio recordings in .ogg; it is available for download on the project documentation page.

Though the tool has no complex code and no GUI at all, its simplicity is part of what makes it so promising. For many languages like mine, there are literally no word pronunciations available in an open standard.

The lack of an openly licensed voicebank stops developers from creating a text-to-speech or speech-to-text engine for visually impaired people and others. India, my home country, has over 15 million people with visual disabilities, the largest such population in the world. While there are open source screen readers like NonVisual Desktop Access that use voice synthesizers instead of real human voices, listening to a robotic voice for a long time is not comfortable. Moreover, machine learning and natural language processing do more than help people with accessibility needs; they can also transform the way we interact with our devices. Proprietary personal assistants like Siri, Google Assistant, and Cortana are so popular because they use human voice recordings. With more open source voice-controlled solutions coming, imagine what openly licensed voicebanks in your native language can do.

More resources

  • LinguaLibre (see the source code) is a web tool created by the makers of legacy batch recording software Shtooka Recorder and SWAC Recorder. It is currently in its development and testing phase and available only with a French interface, but will eventually be available in multiple interface languages.
  • Pronuncify (a command-line tool for Linux) and Pronuncify.net (a GUI-based tool for Windows), both designed specifically for Wiktionary, help with batch recording of words.


