Facebook Releases the wav2letter Toolkit for End-to-End Automatic Speech Recognition


According to Leiphone's AI Technology Review, Facebook AI Research has released the wav2letter toolkit, a simple and efficient end-to-end automatic speech recognition (ASR) system that implements the architectures proposed in the two papers Wav2Letter: an End-to-End ConvNet-based Speech Recognition System and Letter-Based Speech Recognition with Gated ConvNets. For anyone who wants to start doing speech recognition with the toolkit right away, Facebook provides pre-trained models on the Librispeech dataset.

Below are the system requirements and the installation instructions for the toolkit, as compiled by Leiphone's AI Technology Review:

Requirements:

OS: MacOS or Linux

Torch: installation instructions follow below

Training on CPU: Intel MKL

Training on GPU: NVIDIA CUDA Toolkit (cuDNN v5.1 for CUDA 8.0)

Reading audio files: Libsndfile

Standard speech features: FFTW
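
On Debian/Ubuntu-style systems, the last two dependencies (Libsndfile and FFTW) can typically be installed from the package manager, for example as below; the exact package names are our assumption and may vary across distributions:

# assumed Debian/Ubuntu package names for Libsndfile and FFTW headers
sudo apt-get install libsndfile1-dev libfftw3-dev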

Installation:

MKL

If you plan to train on CPU, we strongly recommend installing Intel MKL.

Update your .bashrc file as follows:

# We assume Torch will be installed in $HOME/usr.
# Change according to your needs.
export PATH=$HOME/usr/bin:$PATH

# This is to detect MKL during compilation
# but also to make sure it is found at runtime.
INTEL_DIR=/opt/intel/lib/intel64
MKL_DIR=/opt/intel/mkl/lib/intel64
MKL_INC_DIR=/opt/intel/mkl/include

if [ ! -d "$INTEL_DIR" ]; then
    echo "$ warning: INTEL_DIR out of date"
fi
if [ ! -d "$MKL_DIR" ]; then
    echo "$ warning: MKL_DIR out of date"
fi
if [ ! -d "$MKL_INC_DIR" ]; then
    echo "$ warning: MKL_INC_DIR out of date"
fi

# Make sure MKL can be found by Torch.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$INTEL_DIR:$MKL_DIR
export CMAKE_LIBRARY_PATH=$LD_LIBRARY_PATH
export CMAKE_INCLUDE_PATH=$CMAKE_INCLUDE_PATH:$MKL_INC_DIR

LuaJIT and LuaRocks

Run the commands below to install LuaJIT and LuaRocks under $HOME/usr. For a system-wide installation, simply drop the -DCMAKE_INSTALL_PREFIX=$HOME/usr option.

git clone https://github.com/torch/luajit-rocks.git
cd luajit-rocks
mkdir build; cd build
cmake .. -DCMAKE_INSTALL_PREFIX=$HOME/usr -DWITH_LUAJIT21=OFF
make -j 4
make install
cd ../..

In the following, we assume luarocks and luajit are available in $PATH. If you installed them under $HOME/usr, invoke them as ~/usr/bin/luarocks and ~/usr/bin/luajit instead.
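
As a quick, optional sanity check that the toolchain is in place (assuming the $HOME/usr install), both binaries should report their versions:

~/usr/bin/luajit -v           # prints the LuaJIT version
~/usr/bin/luarocks --version  # prints the LuaRocks version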

KenLM Language Model Toolkit

If you plan to use the wav2letter decoder, you will need to install KenLM.

KenLM requires Boost:

# make sure boost is installed (with system/thread/test modules)

# actual command might vary depending on your system

sudo apt-get install libboost-dev libboost-system-dev libboost-thread-dev libboost-test-dev

Once Boost is installed, you can install KenLM:

wget https://kheafield.com/code/kenlm.tar.gz
tar xfvz kenlm.tar.gz
cd kenlm
mkdir build && cd build
cmake .. -DCMAKE_INSTALL_PREFIX=$HOME/usr -DCMAKE_POSITION_INDEPENDENT_CODE=ON
make -j 4
make install
cp -a lib/* ~/usr/lib # libs are not installed by default :(
cd ../..
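
As an optional sanity check, KenLM's command-line tools (such as lmplz and build_binary) should now be present under the install prefix:

# both tools should show up if the install succeeded
ls ~/usr/bin | grep -E 'lmplz|build_binary'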

OpenMPI and TorchMPI

If you plan to use multiple CPUs/GPUs (or multiple machines), you will need to install OpenMPI and TorchMPI.

Disclaimer: we strongly encourage you to recompile OpenMPI yourself. The OpenMPI binaries in standard distributions are built with inconsistent compilation flags, and the right flags are crucial for TorchMPI to compile and run successfully.

First, install OpenMPI:

wget https://www.open-mpi.org/software/ompi/v2.1/downloads/openmpi-2.1.2.tar.bz2
tar xfj openmpi-2.1.2.tar.bz2
cd openmpi-2.1.2; mkdir build; cd build
../configure --prefix=$HOME/usr --enable-mpi-cxx --enable-shared --with-slurm --enable-mpi-thread-multiple --enable-mpi-ext=affinity,cuda --with-cuda=/public/apps/cuda/9.0
make -j 20 all
make install

Note: openmpi-3.0.0.tar.bz2 also works, but you will need to remove --enable-mpi-thread-multiple.
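
For reference, a 3.0.0 build would look roughly like the following; this is a sketch that assumes OpenMPI's usual download URL layout, and note the dropped flag:

# assumed URL layout; double-check on open-mpi.org
wget https://www.open-mpi.org/software/ompi/v3.0/downloads/openmpi-3.0.0.tar.bz2
tar xfj openmpi-3.0.0.tar.bz2
cd openmpi-3.0.0; mkdir build; cd build
# same flags as above, minus --enable-mpi-thread-multiple
../configure --prefix=$HOME/usr --enable-mpi-cxx --enable-shared --with-slurm --enable-mpi-ext=affinity,cuda --with-cuda=/public/apps/cuda/9.0
make -j 20 all
make install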

You can then install TorchMPI:

MPI_CXX_COMPILER=$HOME/usr/bin/mpicxx ~/usr/bin/luarocks install torchmpi

Torch and other Torch packages

luarocks install torch
luarocks install cudnn # for GPU support
luarocks install cunn # for GPU support
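
As a quick, optional check that the packages load (the CUDA packages will only load on a machine with a working GPU setup):

luajit -e "require 'torch'; print('torch ok')"
luajit -e "require 'cudnn'; require 'cunn'; print('cuda ok')" # GPU machines only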

The wav2letter packages

git clone https://github.com/facebookresearch/wav2letter.git
cd wav2letter
cd gtn && luarocks make rocks/gtn-scm-1.rockspec && cd ..
cd speech && luarocks make rocks/speech-scm-1.rockspec && cd ..
cd torchnet-optim && luarocks make rocks/torchnet-optim-scm-1.rockspec && cd ..
cd wav2letter && luarocks make rocks/wav2letter-scm-1.rockspec && cd ..
# Assuming here you got KenLM in $HOME/kenlm
# And only if you plan to use the decoder:
cd beamer && KENLM_INC=$HOME/kenlm luarocks make rocks/beamer-scm-1.rockspec && cd ..

Training wav2letter models

Data preprocessing

The data folder contains a number of scripts for preprocessing various datasets; for now, only scripts for LibriSpeech and TIMIT are provided.

Below is an example of preprocessing the LibriSpeech ASR dataset:

wget http://www.openslr.org/resources/12/dev-clean.tar.gz
tar xfvz dev-clean.tar.gz

# repeat for train-clean-100, train-clean-360, train-other-500, dev-other, test-clean, test-other

luajit ~/wav2letter/data/librispeech/create.lua ~/LibriSpeech ~/librispeech-proc
luajit ~/wav2letter/data/utils/create-sz.lua librispeech-proc/train-clean-100 librispeech-proc/train-clean-360 librispeech-proc/train-other-500 librispeech-proc/dev-clean librispeech-proc/dev-other librispeech-proc/test-clean librispeech-proc/test-other

Training

mkdir experiments
luajit ~/wav2letter/train.lua --train -rundir ~/experiments -runname hello_librispeech -arch ~/wav2letter/arch/librispeech-glu-highdropout -lr 0.1 -lrcrit 0.0005 -gpu 1 -linseg 1 -linlr 0 -linlrcrit 0.005 -onorm target -nthread 6 -dictdir ~/librispeech-proc -datadir ~/librispeech-proc -train train-clean-100+train-clean-360+train-other-500 -valid dev-clean+dev-other -test test-clean+test-other -sqnorm -mfsc -melfloor 1 -surround "|" -replabel 2 -progress -wnorm -normclamp 0.2 -momentum 0.9 -weightdecay 1e-05

Multi-GPU training

Using OpenMPI:

mpirun -n 2 --bind-to none  ~/TorchMPI/scripts/wrap.sh luajit ~/wav2letter/train.lua --train -mpi -gpu 1 ...

Running the decoder (inference)

A little preprocessing is required before running the decoder.

First, create a letter dictionary that includes the special repetition letters used in wav2letter:

cat ~/librispeech-proc/letters.lst >> ~/librispeech-proc/letters-rep.lst && echo "1" >> ~/librispeech-proc/letters-rep.lst && echo "2" >> ~/librispeech-proc/letters-rep.lst
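
The two repetition symbols should now sit at the end of the new dictionary; a quick, optional check:

tail -2 ~/librispeech-proc/letters-rep.lst # should print "1" and "2"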

Next, download a language model and preprocess it. Here we use the pre-trained LibriSpeech language model, but you can also train your own with KenLM. The preprocessing script may emit warnings about incorrectly transcribed words; this is not a problem, as those words are rare.

wget http://www.openslr.org/resources/11/3-gram.pruned.3e-7.arpa.gz

luajit ~/wav2letter/data/utils/convert-arpa.lua ~/3-gram.pruned.3e-7.arpa.gz ~/3-gram.pruned.3e-7.arpa ~/dict.lst -preprocess ~/wav2letter/data/librispeech/preprocess.lua -r 2 -letters letters-rep.lst

Optional: convert the model into KenLM's binary format with build_binary; it will load much faster.

build_binary 3-gram.pruned.3e-7.arpa 3-gram.pruned.3e-7.bin
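
If you do convert it, you should be able to point the decoder's -lm flag at the .bin file instead of the .arpa in the decode command shown further below; KenLM normally auto-detects binary versus text models when loading, though we have not verified this path in wav2letter itself:

# hypothetical variant of the decode command below, using the binary LM
luajit ~/wav2letter/decode.lua ~/experiments/hello_librispeech dev-clean -show -letters ~/librispeech-proc/letters-rep.lst -words ~/dict.lst -lm ~/3-gram.pruned.3e-7.bin -lmweight 3.1639 -beamsize 25000 -beamscore 40 -nthread 10 -smearing max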

Now run test.lua to generate emissions. The script below also reports the letter error rate (LER) and word error rate (WER):

luajit ~/wav2letter/test.lua ~/experiments/hello_librispeech/001_model_dev-clean.bin -progress -show -test dev-clean -save

Once the emissions are stored, the decoder can be run to compute the WER:

luajit ~/wav2letter/decode.lua ~/experiments/hello_librispeech dev-clean -show -letters ~/librispeech-proc/letters-rep.lst -words ~/dict.lst -lm ~/3-gram.pruned.3e-7.arpa -lmweight 3.1639 -beamsize 25000 -beamscore 40 -nthread 10 -smearing max

Pre-trained models:

We provide a fully trained LibriSpeech model:

wget https://s3.amazonaws.com/wav2letter/models/librispeech-glu-highdropout.bin

Note: this model was trained on Facebook's infrastructure, so you need to run test.lua with slightly different parameters:

luajit ~/wav2letter/test.lua ~/librispeech-glu-highdropout.bin -progress -show -test dev-clean -save -datadir ~/librispeech-proc/ -dictdir ~/librispeech-proc/ -gfsai

You can join the wav2letter community:

Facebook: https://www.facebook.com/groups/717232008481207/

Google group: https://groups.google.com/forum/#!forum/wav2letter-users

via: GitHub

Compiled and edited by Leiphone (公众号: 雷锋网) AI Technology Review.

This is a copyrighted Leiphone article; reproduction without authorization is prohibited.