EleutherAI / lm-evaluation-harness Public

Issues: EleutherAI/lm-evaluation-harness

344 Open 872 Closed

Issues list

PubMedQA tasks require trust_remote_code=True argument, but argument doesn't work

#2631 opened Jan 18, 2025 by dhp-ks

Add HRM8K (new math benchmark)

#2623 opened Jan 14, 2025 by bzantium

None of the french_bench tasks are working.

asking questions

For asking for clarification / support on library usage.

#2619 opened Jan 10, 2025 by jaslatendresse

Can't find the dataset lighteval/MATH-Hard in the huggingface

bug

Something isn't working.

#2618 opened Jan 10, 2025 by Chenxi622

When I set device=CPU, long-time stuck after running loglikelihood requests.

#2617 opened Jan 9, 2025 by liangy001

Tag for mgsm_cot_native and mgsm_cot_en same?

#2614 opened Jan 7, 2025 by Mugariya

use multiple subset of the same dataset

#2612 opened Jan 7, 2025 by surprisedPikachu007

Question about the evaluation of OpenBookQA

asking questions

For asking for clarification / support on library usage.

#2610 opened Jan 6, 2025 by xumingyu2021

Significant Discrepancy in ARC-Challenge Accuracy: Llama-3.2-3B Official vs lm-evaluation-harness

asking questions

For asking for clarification / support on library usage.

#2605 opened Dec 31, 2024 by luoxuan-cs

Passing a limit doesn't randomly sample, but rather takes dataset[:limit], introducing dataset bias

#2598 opened Dec 27, 2024 by aalpat1

How to resolve the “Too Many Requests” issue encountered when using the OpenAI API?

#2594 opened Dec 24, 2024 by Here1sWqW

Couldn't detect gpu when generation using ray data_parallel_size > 1

#2591 opened Dec 23, 2024 by zhaocaibei123

Evaluating model on MT-Bench and LBPP

#2590 opened Dec 23, 2024 by sorobedio

Strange memory footprint

#2589 opened Dec 22, 2024 by zxgx

Weird results for 70b models

#2584 opened Dec 19, 2024 by BeksultanSagyndyk

How to exactly reproduce the results on the openllm leaderboard?

#2583 opened Dec 19, 2024 by Zilinghan

Repeated Running Scripts During Perplexity Task Execution on Windows

#2581 opened Dec 19, 2024 by zhuyuhua-v

CaseHOLD Task Implementation

#2571 opened Dec 16, 2024 by zolastro

Question: Is there an easy way for me to know all the generation_until tasks?

#2569 opened Dec 14, 2024 by Ki-Seki

reproduce llama 3 evals

good first issue

Good for newcomers

validation

For validation of task implementations.

#2557 opened Dec 10, 2024 by baberabb

fail to reproduce Deepseek-math result

asking questions

For asking for clarification / support on library usage.

validation

For validation of task implementations.

#2555 opened Dec 10, 2024 by zhuqiangLu

Hendrycks Math extraction rule seems too strict

good first issue

Good for newcomers

validation

For validation of task implementations.

#2552 opened Dec 8, 2024 by fzyzcjy

Inconsistent responses for the same case with different limit parameters

#2550 opened Dec 7, 2024 by Starry-Liu1

Inquiry about the feature to continue evaluation after abnormal termination

asking questions

For asking for clarification / support on library usage.

#2548 opened Dec 6, 2024 by minimi-kei

Add Global-MMLU

#2547 opened Dec 6, 2024 by shivalika-singh
