Predicting words belonging to pauses using ROBERTA-LARGE LM

This repository contains code for predicting the words belonging to pauses using RoBERTa-large, a large-scale language model. The fine-tuned models, trained on healthy speech, are available on Hugging Face: https://huggingface.co/Middelz2.
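As a rough illustration of the idea (a minimal sketch, not code taken from this repository), the snippet below uses the Hugging Face transformers fill-mask pipeline with the public roberta-large checkpoint; to reproduce our setup, replace the model name with one of the fine-tuned checkpoints listed on the Middelz2 page. The example sentence is invented.

```python
# Minimal fill-mask sketch: predict candidate words for a pause position.
# "roberta-large" is the public base model; swap in a fine-tuned checkpoint
# from https://huggingface.co/Middelz2 to use the models from this project.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="roberta-large")

# A pause in the transcript is represented by the model's mask token.
sentence = "I went to the <mask> to buy some bread."
for candidate in fill_mask(sentence, top_k=5):
    print(f"{candidate['token_str'].strip()}\t{candidate['score']:.3f}")
```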

Acquiring the dataset

Since the full dataset is too large to be pushed to GitHub and may contain sensitive personal information, the data can be made available on request.

Installing requirements

This project depends on several libraries. To ensure that all requirements are met, first create a new virtual environment, navigate to the repository folder, and install the requirements using the following command:

pip install -r requirements.txt

Files explained

The following files are the most important in this repository.

File	Information
preprocessing.py	Preprocesses the raw (.csv) input sentences into an extra dataframe column containing preprocessed (clean) text.
preprocessing_helper.py	Helper module for preprocessing.py. Contains all additional functions that are used to clean the input sentences (.cha files).
setup_dataframe.py	Setup file that processes raw data (.txt data retrieved from .cha files) into a usable .csv file.
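The exact functions inside these files are not documented here; the sketch below is only a hypothetical illustration of the kind of flow they implement (raw transcript text in, a cleaned dataframe column out). The file names, column names, and the clean_sentence helper are placeholders, not the repository's actual API.

```python
# Hypothetical preprocessing sketch; names and cleaning rules are illustrative only.
import re
import pandas as pd

def clean_sentence(text: str) -> str:
    """Strip CHAT-style (.cha) markers and normalise whitespace."""
    text = re.sub(r"\[[^\]]*\]", " ", text)    # bracketed annotations, e.g. [//]
    text = re.sub(r"[&+@]\S*", " ", text)      # CHAT codes such as &uh or @-suffixes
    text = re.sub(r"\s+", " ", text)
    return text.strip().lower()

df = pd.read_csv("transcripts.csv")            # placeholder path: raw sentences, one per row
df["clean_text"] = df["sentence"].apply(clean_sentence)
df.to_csv("transcripts_clean.csv", index=False)
```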

Jupyter notebooks

The following notebooks were used to fine-tune our RoBERTa model and to make our predictions.

File	Information
ROBERTA_Aphasia_Finetuning.ipynb	The notebook used to fine-tune our RoBERTa-large model.
ROBERTA_Aphasia_Single_[mask]_prediction.ipynb	The notebook used to make single [mask] predictions with the fine-tuned model.
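The notebooks themselves are not reproduced here. As a minimal sketch of what masked-language-model fine-tuning looks like with the transformers Trainer API (roughly the shape of ROBERTA_Aphasia_Finetuning.ipynb, with illustrative file names and hyperparameters rather than the ones we actually used):

```python
# Generic masked-LM fine-tuning sketch; paths and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("roberta-large")
model = AutoModelForMaskedLM.from_pretrained("roberta-large")

# "clean_text" refers to the preprocessed column described above; the csv path is hypothetical.
dataset = load_dataset("csv", data_files={"train": "transcripts_clean.csv"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["clean_text"], truncation=True, max_length=128),
    batched=True, remove_columns=dataset.column_names)

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="roberta-large-finetuned",
                           num_train_epochs=3,
                           per_device_train_batch_size=8,
                           learning_rate=2e-5),
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()
```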
