"Project Gutenberg" uses neural text-to-speech technology to release 5,000 free audiobooks

Audiobooks have exploded in popularity in recent years because they are easy to consume, but recording them is difficult and expensive. Recently, researchers demonstrated an automated method based on synthesized text-to-speech that solves many of the problems the technology has faced and allows ordinary users to produce audiobooks. Readers can now listen to thousands of classic literary works and other public domain materials for free through Project Gutenberg. Researchers at Microsoft and MIT created the collection by running the books through text-to-speech software.

These texts include works by Shakespeare, Agatha Christie, Jane Austen, Leonardo da Vinci and others. Users can listen on Internet Archive, Spotify, Apple Podcasts and Google Podcasts:

https://marhamilresearch4.blob.core.Windows.net/gutenberg-public/Website/index.html

The code used to build the audiobook collection is available on GitHub:

https://github.com/microsoft/SynapseML

Apple began selling audiobooks narrated by automatic text-to-speech technology in January this year. However, the attempt has been met with skepticism from the literary establishment, which criticizes Apple's business goals, and from voice actors whose work provides training material for the company's artificial intelligence. Project Gutenberg's approach may draw a different reaction because it is open source and has no profit motive.

Project Gutenberg has spent decades building a repository of free literature in text format, but audiobooks can make this material more accessible. Audiobooks help people who drive, multitask, are visually impaired, are learning to read, or are learning a new language.

Producing an audiobook by traditional methods means spending time and money to have someone read the entire book aloud. It is not cost-effective to manually record an audio version of every book worth reading, so text-to-speech technology was a better fit for Project Gutenberg. However, the researchers faced multiple obstacles with their machine learning tools.

The first and most important issue was determining which digital books the software could parse. Project Gutenberg collects materials in a variety of formats, and many of the files contain errors or imperfect scans. So the researchers focused on books stored in HTML format and built a tool to discover which items shared a similar format.
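To make the idea of finding similarly formatted HTML books concrete, here is a minimal sketch in Python. It is not the researchers' tool: it assumes that counting HTML tags gives a rough fingerprint of a book's layout and that cosine similarity against a known-good reference book is a reasonable comparison. The function names and the 0.9 threshold are hypothetical.

```python
import math
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count how often each opening tag appears in a document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def tag_profile(html):
    """Return a tag -> count mapping used as a crude layout fingerprint."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


def cosine_similarity(a, b):
    """Cosine similarity between two tag-count profiles (0.0 to 1.0)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_books(reference_html, candidates, threshold=0.9):
    """Names of candidate books whose layout resembles the reference.

    `threshold` is an assumed cutoff, not a value from the article.
    """
    ref = tag_profile(reference_html)
    return [
        name
        for name, html in candidates.items()
        if cosine_similarity(ref, tag_profile(html)) >= threshold
    ]
```

A book made mostly of headings and paragraphs would score close to a reference novel, while a file dominated by tables would score near zero and be set aside.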

Another problem the researchers addressed was ensuring that the system knew which text to read and which to ignore. Text to skip includes tables of contents, page numbers, footnotes, tables and other extraneous material.
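The read-or-ignore decision can be illustrated with a short Python sketch that walks an HTML book and keeps only text outside skipped elements. This is an illustration under stated assumptions, not the project's actual pipeline: the class names `pagenum`, `footnote` and `toc` and the skipped tag list are hypothetical markers chosen for the example.

```python
from html.parser import HTMLParser

# Hypothetical markers for text a narrator should not read aloud.
SKIP_CLASSES = {"pagenum", "footnote", "toc"}
SKIP_TAGS = {"table", "script", "style"}
# Void elements have no closing tag, so they must not be tracked as open.
VOID_TAGS = {"br", "hr", "img", "meta", "link", "input"}


class ReadableTextExtractor(HTMLParser):
    """Collect text that is not nested inside any skipped element."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.stack = []  # (tag, is_skipped) for each currently open element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        skip = tag in SKIP_TAGS or bool(classes & SKIP_CLASSES)
        self.stack.append((tag, skip))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if any(skip for _, skip in self.stack):
            return
        text = " ".join(data.split())
        if text:
            self.chunks.append(text)


def readable_text(html):
    """Return the narratable text of an HTML fragment as one string."""
    parser = ReadableTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Real books mark such material inconsistently, so a rule list like this would need tuning per group of similarly formatted files.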

Additionally, the result needs to sound close enough to natural human speech. The researchers focused on the voice expressions that work best for nonfiction and narration, but users can also tweak the software to experiment with dramatic readings.
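As an illustration of how a speaking style might be selected, the following Python sketch builds an SSML request with an optional style wrapper. It assumes an SSML-capable engine that supports the `mstts:express-as` extension, as Azure's neural voices do; the article does not say which voices or styles the researchers used, so the voice name `en-US-JennyNeural` and the style `cheerful` are assumptions, and `build_ssml` is a hypothetical helper.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice="en-US-JennyNeural", style=None):
    """Wrap text in SSML, optionally requesting a speaking style.

    The default voice and any style passed in are assumptions for this
    example; available voices and styles depend on the speech service.
    """
    body = escape(text)  # escape &, < and > so the markup stays valid
    if style:
        body = (
            f"<mstts:express-as style={quoteattr(style)}>"
            f"{body}</mstts:express-as>"
        )
    return (
        '<speak version="1.0" xml:lang="en-US" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts">'
        f"<voice name={quoteattr(voice)}>{body}</voice>"
        "</speak>"
    )
```

Calling `build_ssml(paragraph)` yields a plain narration request, while passing a style asks the engine for a more expressive reading of the same text.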

The researchers plan to hold a demonstration that will let users generate audiobooks in their own voices. After recording a few sentences to train the algorithm, each participant can listen to a sample before having the software read the entire book, and will receive a copy of the audiobook via email. Users can also choose from synthesized voices to customize each audiobook.
