TITLE = """<h1 align="center" id="space-title"> KG LLM Leaderboard</h1>"""
INTRODUCTION_TEXT = """
🐨 KG LLM Leaderboard aims to track, rank, and evaluate the performance of released large language models (LLMs) on traditional KBQA/KGQA datasets.
The data on this page comes from a research paper. If you use it, please cite the following source: https://arxiv.org/abs/2303.07992
We compare current state-of-the-art (SOTA) traditional KBQA models, both fine-tuned (FT) and zero-shot (ZS), with LLMs from the GPT family and other non-GPT LLMs.
QALD-9 and LC-QuAD 2.0 are evaluated with F1; all other datasets use accuracy (exact match).
"""
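# Illustrative sketch, not taken from the cited paper: one common way to score a
# single question with the two metrics named in INTRODUCTION_TEXT. It ASSUMES that
# gold and predicted answers are compared as sets of already-normalized strings and
# that "exact match" means the predicted set equals the gold set; the paper's actual
# matching and normalization rules may differ. The helper names answer_f1 and
# exact_match are hypothetical. A dataset-level score would be the mean over questions.


def answer_f1(predicted: set[str], gold: set[str]) -> float:
    """Set-based F1 between predicted and gold answers (assumed definition)."""
    if not predicted or not gold:
        # Two empty sets count as a perfect match; otherwise there can be no overlap.
        return 1.0 if predicted == gold else 0.0
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def exact_match(predicted: set[str], gold: set[str]) -> float:
    """1.0 if the predicted answer set equals the gold set, else 0.0 (assumed definition)."""
    return 1.0 if predicted == gold else 0.0


# Example: predicting {"Paris", "Lyon"} when the gold answer is {"Paris"} gives
# precision 0.5 and recall 1.0, so answer_f1 is 2/3, while exact_match is 0.0.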
LLM_BENCHMARKS_TEXT = """
ChatGPT is a powerful large language model (LLM) whose knowledge covers resources such as Wikipedia, and it can answer natural language questions from that internal knowledge.
As a result, there is growing interest in whether ChatGPT can replace traditional knowledge-based question answering (KBQA) models.
Although some works have analyzed ChatGPT's question answering performance, large-scale, comprehensive testing across various types of complex questions is still lacking, which limits our understanding of the model's weaknesses.
In this paper, we present a framework that follows the black-box testing specifications of CheckList, proposed by Microsoft.
We evaluate ChatGPT and its family of LLMs on eight real-world KB-based complex question answering datasets: six English datasets and two multilingual datasets.
The total number of test cases is approximately 190,000.
"""
CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
CITATION_BUTTON_TEXT = r"""
@article{tan2023evaluation,
  title={Evaluation of ChatGPT as a question answering system for answering complex questions},
  author={Tan, Yiming and Min, Dehai and Li, Yu and Li, Wenbo and Hu, Nan and Chen, Yongrui and Qi, Guilin},
  journal={arXiv preprint arXiv:2303.07992},
  year={2023}
}
"""