TEJAS Journal of Technologies and Humanitarian Science

ISSN : 2583-5599

Open Access | Quarterly | Peer Reviewed Journal

July, 2024 | Volume 03 | Issue 03


From Narrow to General: Rethinking Benchmarking for Artificial General Intelligence


Ansh Tiwari
Student Scholar, Computer Science, National P.G. College, Lucknow

Author

Ashmit Dubey
Student Scholar, Computer Science, National P.G. College, Lucknow

Author

Mahesh Kumar Tiwari
Assistant Professor, Computer Science, National P.G. College, Lucknow

Author

Rinku Raheja
Assistant Professor, Computer Science, National P.G. College, Lucknow

Author


📌 DOI: https://doi.org/10.63920/tjths.33005

🔑 Keywords: Artificial General Intelligence, Cognitive Skills, Turing test, Evaluation

📅 Publication Date: 20 July, 2024

📜 License:

  • Share — Copy and Redistribute the material
  • Adapt — Remix, Transform, and build upon the material
  • The licensor cannot revoke these freedoms as long as you follow the license terms.

Abstract:

Artificial General Intelligence (AGI) represents the ultimate goal of AI research: to develop systems that possess cognitive characteristics similar to those of the human brain and can effectively solve problems in different fields without being trained for a specific type of work. However, numerous issues attend the pursuit of AGI, one of which is the absence of adequate, comprehensive benchmarks for tracking progress toward this goal. Current evaluation methods, which evolved largely within narrow AI, do not perform adequately when applied to AGI. This paper gives a historical perspective on the significance of AI assessment, with a focus on modern techniques aimed at demonstrating an AI system's ability to think more broadly. It discusses some of the inherent flaws in existing benchmarks, such as biased sampling, overfitting, and the so-called “AI effect,” in which solved tasks are no longer regarded as measures of intelligence. In light of these challenges, the paper seeks to present criteria for forming better benchmarks that reflect the complexity and range of capabilities needed in AGI.


