Issues: EleutherAI/lm-evaluation-harness
Issues list
PubMedQA tasks require the trust_remote_code=True argument, but the argument doesn't work
#2631
opened Jan 18, 2025 by
dhp-ks
None of the french_bench tasks are working.
asking questions
For asking for clarification / support on library usage.
#2619
opened Jan 10, 2025 by
jaslatendresse
Can't find the dataset lighteval/MATH-Hard on Hugging Face
bug
Something isn't working.
#2618
opened Jan 10, 2025 by
Chenxi622
When I set device=CPU, the run is stuck for a long time after running loglikelihood requests.
#2617
opened Jan 9, 2025 by
liangy001
Question about the evaluation of OpenBookQA
asking questions
#2610
opened Jan 6, 2025 by
xumingyu2021
Significant Discrepancy in ARC-Challenge Accuracy: Llama-3.2-3B Official vs lm-evaluation-harness
asking questions
#2605
opened Dec 31, 2024 by
luoxuan-cs
Passing a limit doesn't randomly sample, but rather takes dataset[:limit], introducing dataset bias
#2598
opened Dec 27, 2024 by
aalpat1
How to resolve the “Too Many Requests” issue encountered when using the OpenAI API?
#2594
opened Dec 24, 2024 by
Here1sWqW
Couldn't detect GPU during generation when using Ray with data_parallel_size > 1
#2591
opened Dec 23, 2024 by
zhaocaibei123
How to exactly reproduce the results on the openllm leaderboard?
#2583
opened Dec 19, 2024 by
Zilinghan
Repeated Running Scripts During Perplexity Task Execution on Windows
#2581
opened Dec 19, 2024 by
zhuyuhua-v
Question: Is there an easy way for me to know all the generation_until tasks?
#2569
opened Dec 14, 2024 by
Ki-Seki
reproduce llama 3 evals
good first issue
Good for newcomers
validation
For validation of task implementations.
#2557
opened Dec 10, 2024 by
baberabb
fail to reproduce Deepseek-math result
asking questions
validation
#2555
opened Dec 10, 2024 by
zhuqiangLu
Hendrycks Math extraction rule seems too strict
good first issue
validation
#2552
opened Dec 8, 2024 by
fzyzcjy
Inconsistent responses for the same case with different limit parameters
#2550
opened Dec 7, 2024 by
Starry-Liu1
Inquiry about the feature to continue evaluation after abnormal termination
asking questions
#2548
opened Dec 6, 2024 by
minimi-kei