SpeechRE

TTS: the text-to-speech code.

IWSLT: the SpeechRE code; the model is placed in fairseq_modules.

fairseq: a modified copy of the fairseq source code.

Notice:

We implement our model in Speech_RE/IWSLT/fairseq_modules/models/wav2triplet_s2t.py

Our loss function is implemented in Speech_RE/fairseq/fairseq/criterions/label_smoothed_cross_entropy.py

Our cross-modal entity alignment method is implemented in Speech_RE/fairseq/fairseq/transformer.py (Alignment_forward function)

Our configuration file is Speech_RE/IWSLT/config/speechre_tacred_part_part.yaml

The training script for our model is Speech_RE/IWSLT/run_train.sh

Dataset

Configuration information of the dataset synthesized by TTS:

conll04.tgz: https://drive.google.com/file/d/1Q5k3eM6WknfjA2DWo19CyTwZngYVXRUL/view?usp=sharing

re-tacred(dev&test_part).tgz: https://drive.google.com/file/d/1qctG-n_W51zp-hiPDS-XEl7jh_bI1l_-/view?usp=sharing

re-tacred(train_part).tgz: https://drive.google.com/file/d/1ainRqlx4h9_HDFtOq8xasN-OLJDNSbwD/view?usp=sharing

For example, the data of CoNLL04 is organized as:

├── conll04
│   ├── audio
│   │   ├── train
│   │   │   ├── train-0.wav
│   │   │   ├── train-1.wav
│   │   │   ├── train-2.wav
│   │   │   ├── ...
│   │   ├── dev
│   │   │   ├── dev-0.wav
│   │   │   ├── ...
│   │   ├── test
│   │   │   ├── test-0.wav
│   │   │   ├── ...
│   ├── train_conll04.tsv
│   ├── dev_conll04.tsv
│   ├── test_conll04.tsv
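
After unpacking an archive, it can help to confirm that the directory matches the layout above before training. The following is a minimal sketch, not part of this repository: the function name `missing_entries` and the `SPLITS` constant are hypothetical, and it only checks for the three audio folders and the three TSV files shown in the tree.

```python
from pathlib import Path
from typing import List

# Split names taken from the directory tree above.
SPLITS = ("train", "dev", "test")


def missing_entries(root: str, name: str = "conll04") -> List[str]:
    """Return expected paths (relative to <root>/<name>) that do not exist.

    Hypothetical helper: it assumes the layout shown above, i.e.
    audio/<split>/ folders and <split>_<name>.tsv manifest files.
    """
    base = Path(root) / name
    expected = [base / "audio" / split for split in SPLITS]
    expected += [base / f"{split}_{name}.tsv" for split in SPLITS]
    return [p.relative_to(base).as_posix() for p in expected if not p.exists()]


if __name__ == "__main__":
    # "/path/to/datasets" is a placeholder, as in the TSV example.
    for entry in missing_entries("/path/to/datasets"):
        print("missing:", entry)
```

An empty result means the expected folders and manifests are all present; it does not verify the individual .wav files.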

The format of tsv files:

id	audio	duration_ms	n_frames	tgt_text	speaker	tgt_lang
train-0	/path/to/datasets/conll04/audio/train/train-0.wav:0:239828	14989	239828	Radio Reloj Network Havana OrgBased_In	0	en
train-1	/path/to/datasets/conll04/audio/train/train-1.wav:0:64099	4006	64099	Bruno Pusterla Italian Agricultural Confederation Work_For	0	en
...
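
The rows above can be read with the Python standard library. The sketch below is an illustration under stated assumptions, not the loader used by this repository: it assumes the columns are tab-separated, that the audio field has the form `path:offset:n_frames` as in the two example rows, and the names `ManifestRow`, `parse_manifest`, and `implied_sample_rate` are hypothetical. In the example rows, n_frames divided by the duration in seconds comes out at about 16000, which suggests 16 kHz audio.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass
class ManifestRow:
    utt_id: str
    wav_path: str
    offset: int
    n_frames: int
    duration_ms: int
    tgt_text: str
    speaker: str
    tgt_lang: str


def parse_manifest(tsv_text: str) -> List[ManifestRow]:
    """Parse manifest text with the header shown above (tab-separated)."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    rows = []
    for rec in reader:
        # Assumed field form "path:offset:n_frames"; rsplit from the right
        # so that any colon inside the path itself is left untouched.
        wav_path, offset, _frames = rec["audio"].rsplit(":", 2)
        rows.append(
            ManifestRow(
                utt_id=rec["id"],
                wav_path=wav_path,
                offset=int(offset),
                n_frames=int(rec["n_frames"]),
                duration_ms=int(rec["duration_ms"]),
                tgt_text=rec["tgt_text"],
                speaker=rec["speaker"],
                tgt_lang=rec["tgt_lang"],
            )
        )
    return rows


def implied_sample_rate(row: ManifestRow) -> float:
    """Frames per second implied by n_frames and duration_ms."""
    return row.n_frames * 1000 / row.duration_ms


# The two example rows from this README.
SAMPLE = (
    "id\taudio\tduration_ms\tn_frames\ttgt_text\tspeaker\ttgt_lang\n"
    "train-0\t/path/to/datasets/conll04/audio/train/train-0.wav:0:239828\t"
    "14989\t239828\tRadio Reloj Network Havana OrgBased_In\t0\ten\n"
    "train-1\t/path/to/datasets/conll04/audio/train/train-1.wav:0:64099\t"
    "4006\t64099\tBruno Pusterla Italian Agricultural Confederation Work_For\t0\ten\n"
)

if __name__ == "__main__":
    for row in parse_manifest(SAMPLE):
        print(row.utt_id, row.wav_path, row.tgt_text)
```

To read a real manifest, pass the file contents, for example `parse_manifest(open("train_conll04.tsv", encoding="utf-8").read())`.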

Notice: The real dataset we constructed will be released soon.

DeepLearnXMU/SpeechRE-MCAM
