Apply for community grant: Academic project (gpu)
#1 by Blanca - opened
We are building a leaderboard to assess the capacity of LLMs to generate Critical Questions. You can find the details of this task here: Critical Questions Generation: Motivation and Challenges
The benefit of this leaderboard is that it lets us keep the test set private, which avoids data contamination. The validation data and all the resources we used to build it are already public: https://github.com/hitz-zentroa/Benchmarking_CQs-Gen
A GPU is required because the most reliable way to evaluate this task is with a 9B-parameter LLM-as-a-Judge, as shown in the paper "Benchmarking Critical Questions Generation: A Challenging Reasoning Task for Large Language Models": https://arxiv.org/abs/2505.11341
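To illustrate the evaluation setup, here is a minimal sketch of the LLM-as-a-Judge loop. The prompt wording, label set, and parsing are hypothetical placeholders, not the actual rubric from the paper; in the real pipeline the canned response below would be replaced by a generation call to the 9B judge model running on the GPU.

```python
import re

# Illustrative label set (assumed for this sketch, not taken from the paper).
LABELS = ("Useful", "Unhelpful", "Invalid")

def build_judge_prompt(argument: str, question: str) -> str:
    """Compose a prompt asking the judge model whether a generated
    critical question genuinely challenges the given argument."""
    return (
        "You are evaluating critical questions for argumentative texts.\n"
        f"Argument: {argument}\n"
        f"Candidate question: {question}\n"
        f"Label the question as one of: {', '.join(LABELS)}. "
        "Answer with the label only."
    )

def parse_judge_label(judge_output: str) -> str:
    """Extract the first recognised label from the judge's raw output,
    defaulting to 'Invalid' when no label is found."""
    match = re.search(r"\b(Useful|Unhelpful|Invalid)\b", judge_output, re.IGNORECASE)
    return match.group(1).capitalize() if match else "Invalid"

# Canned judge response standing in for a real model generation:
prompt = build_judge_prompt(
    "We should ban cars downtown because pollution is rising.",
    "Is rising pollution actually caused by downtown car traffic?",
)
label = parse_judge_label("The question is Useful: it probes the causal premise.")
```

This keeps the judging logic (prompting and verdict parsing) separate from model inference, so the same harness can be pointed at any judge model.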