Code, Data and Demo for paper: Safety Alignment in NLP Tasks: Weakly Aligned Summarization as an In-Context Attack

Project Code and Data

This repository contains the code, data, and demo for the paper Safety Alignment in NLP Tasks: Weakly Aligned Summarization as an In-Context Attack.

Abstract

Recent developments in balancing the usefulness and safety of Large Language Models (LLMs) have raised a critical question: Are mainstream NLP tasks adequately aligned with safety considerations? Our study, focusing on safety-sensitive documents obtained through adversarial attacks, reveals significant disparities in the safety alignment of various NLP tasks. For instance, LLMs can effectively summarize malicious long documents but often refuse to translate them. This discrepancy highlights a previously unidentified vulnerability: attacks exploiting tasks with weaker safety alignment, like summarization, can potentially compromise the integrity of tasks traditionally deemed more robust, such as translation and question-answering (QA). Moreover, the concurrent use of multiple NLP tasks with lesser safety alignment increases the risk of LLMs inadvertently processing harmful content. We demonstrate these vulnerabilities in various safety-aligned LLMs, particularly Llama2 models, Gemini, and GPT-4, indicating an urgent need for strengthening safety alignments across a broad spectrum of NLP tasks.
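To make the attack pattern described above concrete, the sketch below shows one way the two-step in-context setup could be assembled: the model first performs a weakly aligned task (summarization) over a document, and that exchange is then kept in context when a more strongly aligned task (translation) is requested. The prompt wording, the `query_llm` helper, and the specific task pairing are illustrative assumptions for exposition only; they are not taken from the paper or from this repository's scripts.

```python
# Minimal sketch of the "weak task first, strong task second" in-context
# structure described in the abstract. The prompts and the query_llm helper
# are hypothetical placeholders, not the code used in the paper.

from typing import Callable, Dict, List

Message = Dict[str, str]


def weak_then_strong_task(
    document: str,
    query_llm: Callable[[List[Message]], str],
) -> str:
    """Run summarization first, then reuse the resulting dialogue as
    in-context history for a translation request over the same document."""
    history: List[Message] = []

    # Step 1: the weakly aligned task. Per the paper's observation, a
    # summarization request over a long document is less likely to be refused.
    history.append({
        "role": "user",
        "content": f"Summarize the following document:\n\n{document}",
    })
    summary = query_llm(history)
    history.append({"role": "assistant", "content": summary})

    # Step 2: the more strongly aligned task. With the summarization turn
    # already in context, the model is asked to translate the same document,
    # a request it would be more likely to refuse in isolation.
    history.append({
        "role": "user",
        "content": "Now translate the document above into German.",
    })
    return query_llm(history)
```

The sketch only fixes the conversational structure; the actual documents, prompts, and target models (e.g. Llama2, Gemini, GPT-4) used in the experiments are provided by the code and data in this repository.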
