support telechat2 #35415

Status: Open. Wants to merge 51 commits into base: main.

Commits (51)
- 2927aa2 add telechat2 (Dec 18, 2024)
- 4d7d033 Merge branch 'main' of https://github.com/shunxing12345/transformers (Dec 18, 2024)
- 9c12eef fix (Dec 18, 2024)
- aa4c049 Merge branch 'huggingface:main' into main (shunxing12345, Dec 23, 2024)
- 635ab40 fix (Dec 24, 2024)
- 7e48f0c fix (Dec 24, 2024)
- e437827 fix (Dec 24, 2024)
- ef5f8d9 fix (Dec 25, 2024)
- 5b37053 fix (shunxing12345, Dec 30, 2024)
- cc3293d fix (shunxing12345, Dec 30, 2024)
- fa79011 fix (shunxing12345, Dec 30, 2024)
- 6370c62 fix (shunxing12345, Dec 30, 2024)
- 775062f fix (shunxing12345, Dec 30, 2024)
- 1fae6d0 fix (shunxing12345, Dec 30, 2024)
- 8992e8b fix (shunxing12345, Dec 30, 2024)
- 5f40249 fix (shunxing12345, Dec 30, 2024)
- a11b9a7 fix (shunxing12345, Dec 30, 2024)
- 9a8151c fix (shunxing12345, Dec 30, 2024)
- 1cca3c1 fix (shunxing12345, Dec 31, 2024)
- 55166ec fix (shunxing12345, Dec 31, 2024)
- 8fed463 fix (shunxing12345, Dec 31, 2024)
- 5878ac8 fix (shunxing12345, Dec 31, 2024)
- 2713ddb fix (shunxing12345, Dec 31, 2024)
- 914a9ca fix (shunxing12345, Jan 2, 2025)
- 7437cb1 fix (shunxing12345, Jan 2, 2025)
- 8860862 fix (shunxing12345, Jan 2, 2025)
- 45e1722 fix (shunxing12345, Jan 2, 2025)
- ab49203 Merge branch 'main' into main (shunxing12345, Jan 2, 2025)
- 10aa6dd Merge branch 'main' into fix_conflicts (shunxing12345, Jan 10, 2025)
- 3fed9c6 fix_conflicts (shunxing12345, Jan 10, 2025)
- ea67624 fix conflicts (shunxing12345, Jan 10, 2025)
- 69665fc fix (shunxing12345, Jan 11, 2025)
- 98c4def fix (shunxing12345, Jan 11, 2025)
- 9755b7a fix (shunxing12345, Jan 11, 2025)
- 2b7cff7 fix (shunxing12345, Jan 11, 2025)
- 4cff60b fix (shunxing12345, Jan 12, 2025)
- 494e6b1 fix (shunxing12345, Jan 12, 2025)
- 32e75cd change docs (shunxing12345, Jan 14, 2025)
- d023fc3 Merge branch 'fix_conflicts' of https://github.com/shunxing12345/tran… (shunxing12345, Jan 14, 2025)
- 8844c32 Merge branch 'main' into fix_conflicts (shunxing12345, Jan 14, 2025)
- f96bdf4 fix (shunxing12345, Jan 14, 2025)
- 6dcfed0 Merge branch 'fix_conflicts' of https://github.com/shunxing12345/tran… (shunxing12345, Jan 14, 2025)
- 4673ee4 fix (shunxing12345, Jan 14, 2025)
- ad04c29 fix (shunxing12345, Jan 14, 2025)
- f28eeaa fix (shunxing12345, Jan 14, 2025)
- e4a74b1 add weight convert (shunxing12345, Jan 17, 2025)
- cf62a9c fix (shunxing12345, Jan 17, 2025)
- 08a4342 fix (shunxing12345, Jan 17, 2025)
- 68050fd fix (shunxing12345, Jan 17, 2025)
- e6a5665 fix (shunxing12345, Jan 17, 2025)
- bc1e1d3 fix (shunxing12345, Jan 17, 2025)
2 changes: 2 additions & 0 deletions docs/source/en/_toctree.yml
@@ -604,6 +604,8 @@
title: T5v1.1
- local: model_doc/tapex
title: TAPEX
- local: model_doc/telechat2
title: TeleChat2
- local: model_doc/transfo-xl
title: Transformer XL
- local: model_doc/ul2
1 change: 1 addition & 0 deletions docs/source/en/index.md
@@ -329,6 +329,7 @@ Flax), PyTorch, and/or TensorFlow.
| [Table Transformer](model_doc/table-transformer) | ✅ | ❌ | ❌ |
| [TAPAS](model_doc/tapas) | ✅ | ✅ | ❌ |
| [TAPEX](model_doc/tapex) | ✅ | ✅ | ✅ |
| [TeleChat2](model_doc/telechat2) | ✅ | ❌ | ❌ |
| [TextNet](model_doc/textnet) | ✅ | ❌ | ❌ |
| [Time Series Transformer](model_doc/time_series_transformer) | ✅ | ❌ | ❌ |
| [TimeSformer](model_doc/timesformer) | ✅ | ❌ | ❌ |
80 changes: 80 additions & 0 deletions docs/source/en/model_doc/telechat2.md
@@ -0,0 +1,80 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be rendered properly in your Markdown viewer.

-->

# TeleChat2

## Overview

The TeleChat2 model was proposed in [TELECHAT TECHNICAL REPORT](https://arxiv.org/pdf/2401.03804) by TeleAI.

The abstract from the paper is the following:
*TeleChat is a series of large language models, offering decoder-based language models in various sizes (3B, 7B, and 12B). For each size, we provide both the base pretrained model and the fine-tuned chat model aligned with human preferences. TeleChat leverages a Transformer architecture with features such as SwiGLU activation, advanced attention mechanisms (QKV bias, group query attention), and support for sliding window attention. The models are optimized for bilingual proficiency (English and Chinese) and include an enhanced tokenizer adaptable to diverse natural languages and coding formats.*
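The abstract mentions a SwiGLU activation in the feed-forward block. As a quick illustration, the published SwiGLU formulation can be sketched on a single vector as below; this is a generic sketch of SwiGLU itself, not TeleChat2's implementation, and the weight shapes are illustrative:

```python
import math

def silu(x):
    # SiLU / swish activation: x * sigmoid(x)
    return x * (1.0 / (1.0 + math.exp(-x)))

def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward on one vector, with weights as plain nested lists:
    down( silu(gate @ x) * (up @ x) ). Generic sketch, not TeleChat2's code."""
    def matvec(w, v):
        return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]
    gate = [silu(g) for g in matvec(w_gate, x)]     # gated branch
    up = matvec(w_up, x)                            # linear branch
    hidden = [g * u for g, u in zip(gate, up)]      # elementwise gate
    return matvec(w_down, hidden)                   # project back down
```

With identity-like 1x1 weights, the output reduces to `silu(x)`, which makes the gating behavior easy to check by hand.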

The original code for TeleChat2 can be found [here](https://huggingface.co/Tele-AI/TeleChat2-7B).

## Tips

The example below demonstrates how to run `TeleChat2-7B` for inference, using `apply_chat_template` with the ChatML format for dialogue.

```python
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> device = "cuda" # the device to load the model onto

>>> model = AutoModelForCausalLM.from_pretrained("Tele-AI/TeleChat2-7B", device_map="auto")
>>> tokenizer = AutoTokenizer.from_pretrained("Tele-AI/TeleChat2-7B")

>>> prompt = "Give me a short introduction to large language model."

>>> messages = [{"role": "user", "content": prompt}]

>>> text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

>>> model_inputs = tokenizer([text], return_tensors="pt").to(device)

>>> generated_ids = model.generate(model_inputs.input_ids, max_new_tokens=512, do_sample=True)

>>> generated_ids = [output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)]

>>> response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
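For intuition, `apply_chat_template` renders the message list into a ChatML-style string before tokenization. A minimal sketch of that rendering is below, assuming the standard ChatML tags (`<|im_start|>` / `<|im_end|>`); the tokenizer's bundled chat template is authoritative, and `build_chatml_prompt` is a hypothetical helper for illustration only:

```python
# Illustrative sketch of ChatML-style prompt rendering; the tokenizer's
# chat template is the authoritative source of the real format.
def build_chatml_prompt(messages, add_generation_prompt=True):
    parts = []
    for message in messages:
        # each turn: <|im_start|>role\ncontent<|im_end|>\n
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # open an assistant turn for the model to complete
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

print(build_chatml_prompt([{"role": "user", "content": "hello"}]))
```

Setting `add_generation_prompt=True` is what leaves the prompt ending in an open assistant turn, so `generate` continues as the assistant rather than predicting another user message.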

## TeleChat2Config

[[autodoc]] TeleChat2Config


## TeleChat2Model

[[autodoc]] TeleChat2Model
- forward

## TeleChat2ForCausalLM

[[autodoc]] TeleChat2ForCausalLM
- forward

## TeleChat2ForSequenceClassification

[[autodoc]] TeleChat2ForSequenceClassification
- forward

## TeleChat2ForTokenClassification

[[autodoc]] TeleChat2ForTokenClassification
- forward

## TeleChat2ForQuestionAnswering

[[autodoc]] TeleChat2ForQuestionAnswering
- forward
2 changes: 2 additions & 0 deletions docs/source/en/perf_infer_gpu_one.md
@@ -99,6 +99,7 @@ FlashAttention-2 is currently supported for the following architectures:
* [Qwen2VL](https://huggingface.co/docs/transformers/model_doc/qwen2_vl#transformers.Qwen2VLModel)
* [RAG](https://huggingface.co/docs/transformers/model_doc/rag#transformers.RagModel)
* [SpeechEncoderDecoder](https://huggingface.co/docs/transformers/model_doc/speech_encoder_decoder#transformers.SpeechEncoderDecoderModel)
* [TeleChat2](https://huggingface.co/docs/transformers/model_doc/telechat2)
* [VisionEncoderDecoder](https://huggingface.co/docs/transformers/model_doc/vision_encoder_decoder#transformers.VisionEncoderDecoderModel)
* [VisionTextDualEncoder](https://huggingface.co/docs/transformers/model_doc/vision_text_dual_encoder#transformers.VisionTextDualEncoderModel)
* [Whisper](https://huggingface.co/docs/transformers/model_doc/whisper#transformers.WhisperModel)
@@ -310,6 +311,7 @@ For now, Transformers supports SDPA inference and training for the following architectures:
* [MusicGen Melody](https://huggingface.co/docs/transformers/model_doc/musicgen_melody#transformers.MusicgenMelodyModel)
* [Nemotron](https://huggingface.co/docs/transformers/model_doc/nemotron)
* [SpeechEncoderDecoder](https://huggingface.co/docs/transformers/model_doc/speech_encoder_decoder#transformers.SpeechEncoderDecoderModel)
* [TeleChat2](https://huggingface.co/docs/transformers/model_doc/telechat2)
* [VideoLlava](https://huggingface.co/docs/transformers/model_doc/video_llava)
* [VipLlava](https://huggingface.co/docs/transformers/model_doc/vipllava)
* [VisionEncoderDecoder](https://huggingface.co/docs/transformers/model_doc/vision_encoder_decoder#transformers.VisionEncoderDecoderModel)
20 changes: 20 additions & 0 deletions src/transformers/__init__.py
@@ -797,6 +797,7 @@
"TapasConfig",
"TapasTokenizer",
],
"models.telechat2": ["TeleChat2Config"],
"models.textnet": ["TextNetConfig"],
"models.time_series_transformer": ["TimeSeriesTransformerConfig"],
"models.timesformer": ["TimesformerConfig"],
@@ -3623,6 +3624,16 @@
"load_tf_weights_in_tapas",
]
)
_import_structure["models.telechat2"].extend(
[
"TeleChat2ForCausalLM",
"TeleChat2ForQuestionAnswering",
"TeleChat2ForSequenceClassification",
"TeleChat2ForTokenClassification",
"TeleChat2Model",
"TeleChat2PreTrainedModel",
]
)
_import_structure["models.textnet"].extend(
[
"TextNetBackbone",
@@ -5880,6 +5891,7 @@
TapasConfig,
TapasTokenizer,
)
from .models.telechat2 import TeleChat2Config
from .models.textnet import TextNetConfig
from .models.time_series_transformer import (
TimeSeriesTransformerConfig,
@@ -8247,6 +8259,14 @@
TapasPreTrainedModel,
load_tf_weights_in_tapas,
)
from .models.telechat2 import (
TeleChat2ForCausalLM,
TeleChat2ForQuestionAnswering,
TeleChat2ForSequenceClassification,
TeleChat2ForTokenClassification,
TeleChat2Model,
TeleChat2PreTrainedModel,
)
from .models.textnet import (
TextNetBackbone,
TextNetForImageClassification,
1 change: 1 addition & 0 deletions src/transformers/models/__init__.py
@@ -255,6 +255,7 @@
t5,
table_transformer,
tapas,
telechat2,
textnet,
time_series_transformer,
timesformer,
2 changes: 2 additions & 0 deletions src/transformers/models/auto/configuration_auto.py
@@ -282,6 +282,7 @@
("t5", "T5Config"),
("table-transformer", "TableTransformerConfig"),
("tapas", "TapasConfig"),
("telechat2", "TeleChat2Config"),
("textnet", "TextNetConfig"),
("time_series_transformer", "TimeSeriesTransformerConfig"),
("timesformer", "TimesformerConfig"),
@@ -619,6 +620,7 @@
("table-transformer", "Table Transformer"),
("tapas", "TAPAS"),
("tapex", "TAPEX"),
("telechat2", "TeleChat2"),
("textnet", "TextNet"),
("time_series_transformer", "Time Series Transformer"),
("timesformer", "TimeSformer"),
5 changes: 5 additions & 0 deletions src/transformers/models/auto/modeling_auto.py
@@ -259,6 +259,7 @@
("t5", "T5Model"),
("table-transformer", "TableTransformerModel"),
("tapas", "TapasModel"),
("telechat2", "TeleChat2Model"),
("textnet", "TextNetModel"),
("time_series_transformer", "TimeSeriesTransformerModel"),
("timesformer", "TimesformerModel"),
@@ -564,6 +565,7 @@
("speech_to_text_2", "Speech2Text2ForCausalLM"),
("stablelm", "StableLmForCausalLM"),
("starcoder2", "Starcoder2ForCausalLM"),
("telechat2", "TeleChat2ForCausalLM"),
("transfo-xl", "TransfoXLLMHeadModel"),
("trocr", "TrOCRForCausalLM"),
("whisper", "WhisperForCausalLM"),
@@ -1042,6 +1044,7 @@
("starcoder2", "Starcoder2ForSequenceClassification"),
("t5", "T5ForSequenceClassification"),
("tapas", "TapasForSequenceClassification"),
("telechat2", "TeleChat2ForSequenceClassification"),
("transfo-xl", "TransfoXLForSequenceClassification"),
("umt5", "UMT5ForSequenceClassification"),
("xlm", "XLMForSequenceClassification"),
@@ -1119,6 +1122,7 @@
("splinter", "SplinterForQuestionAnswering"),
("squeezebert", "SqueezeBertForQuestionAnswering"),
("t5", "T5ForQuestionAnswering"),
("telechat2", "TeleChat2ForQuestionAnswering"),
("umt5", "UMT5ForQuestionAnswering"),
("xlm", "XLMForQuestionAnsweringSimple"),
("xlm-roberta", "XLMRobertaForQuestionAnswering"),
@@ -1223,6 +1227,7 @@
("stablelm", "StableLmForTokenClassification"),
("starcoder2", "Starcoder2ForTokenClassification"),
("t5", "T5ForTokenClassification"),
("telechat2", "TeleChat2ForTokenClassification"),
("umt5", "UMT5ForTokenClassification"),
("xlm", "XLMForTokenClassification"),
("xlm-roberta", "XLMRobertaForTokenClassification"),
1 change: 1 addition & 0 deletions src/transformers/models/auto/tokenization_auto.py
@@ -503,6 +503,7 @@
),
("tapas", ("TapasTokenizer", None)),
("tapex", ("TapexTokenizer", None)),
("telechat2", ("LlamaTokenizer", "LlamaTokenizerFast" if is_tokenizers_available() else None)),
("transfo-xl", ("TransfoXLTokenizer", None)),
("tvp", ("BertTokenizer", "BertTokenizerFast" if is_tokenizers_available() else None)),
(
27 changes: 27 additions & 0 deletions src/transformers/models/telechat2/__init__.py
@@ -0,0 +1,27 @@
# Copyright 2024 EleutherAI and The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import TYPE_CHECKING

from ...utils import _LazyModule
from ...utils.import_utils import define_import_structure


if TYPE_CHECKING:
from .configuration_telechat2 import *
from .modeling_telechat2 import *
else:
import sys

_file = globals()["__file__"]
sys.modules[__name__] = _LazyModule(__name__, _file, define_import_structure(_file), module_spec=__spec__)
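The package `__init__.py` above wires the new model into Transformers' lazy-import machinery: `_LazyModule` replaces the package's module object so that submodules are only imported when one of their attributes is first accessed. A minimal sketch of the idea is below; `LazyModule` and its attribute mapping are hypothetical simplifications, not the real `_LazyModule` API:

```python
import importlib
import types

class LazyModule(types.ModuleType):
    """Simplified sketch of lazy loading: each attribute maps to the module
    that defines it, and the import happens only on first access. Hypothetical
    stand-in, not transformers' actual _LazyModule."""

    def __init__(self, name, attr_to_module):
        super().__init__(name)
        self._attr_to_module = attr_to_module  # e.g. {"sqrt": "math"}

    def __getattr__(self, attr):
        # Only called when normal attribute lookup fails.
        if attr not in self._attr_to_module:
            raise AttributeError(f"module {self.__name__!r} has no attribute {attr!r}")
        module = importlib.import_module(self._attr_to_module[attr])
        value = getattr(module, attr)
        setattr(self, attr, value)  # cache so later lookups skip __getattr__
        return value

lazy = LazyModule("demo", {"sqrt": "math"})
print(lazy.sqrt(9.0))  # math is imported only at this point
```

The payoff is that `import transformers` stays fast even as model count grows, since heavy submodules like `modeling_telechat2` load only when actually used.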