July, 2024 | Volume 03 | Issue 03
From Narrow to General: Rethinking Benchmarking for Artificial General Intelligence
Ansh Tiwari
Student Scholar, Computer Science, National P.G. College, Lucknow
Author
Ashmit Dubey
Student Scholar, Computer Science, National P.G. College, Lucknow
Author
Mahesh Kumar Tiwari
Assistant Professor, Computer Science, National P.G. College, Lucknow
Author
Rinku Raheja
Assistant Professor, Computer Science, National P.G. College, Lucknow
Author
DOI: https://doi.org/10.63920/tjths.33005
Keywords: Artificial General Intelligence, Cognitive Skills, Turing test, Evaluation
Publication Date: 20 July, 2024
License:
This work is licensed under a Creative Commons Attribution 4.0 International License
- Share: Copy and redistribute the material
- Adapt: Remix, transform, and build upon the material
- The licensor cannot revoke these freedoms as long as you follow the license terms.
Abstract:
Artificial General Intelligence (AGI) represents the ultimate goal of AI research: to develop systems that possess cognitive abilities similar to those of the human brain and can solve problems effectively across different fields without being trained for a specific type of task. The pursuit of AGI faces numerous obstacles, one of which is the absence of benchmarks comprehensive enough to track progress toward this goal. Current evaluation methods, which evolved largely within narrow AI, do not perform adequately when applied to AGI. This paper offers a historical perspective on AI assessment, with a focus on modern techniques that aim to demonstrate an AI system's ability to reason more broadly. It discusses inherent flaws in existing benchmarks, such as biased sampling, overfitting, and the so-called "AI effect," in which a task stops being regarded as a measure of intelligence once it has been solved. In light of these challenges, the paper proposes criteria for designing better benchmarks that capture the complexity and range of capabilities required for AGI.
