The concept of software agents has been discussed for many decades, with the vision that agents ‘inhabit some complex dynamic environment, sense and act autonomously in this environment, and by doing so realize a set of goals or tasks for which they are designed’, as defined by Pattie Maes in 1995. An illustrative example is the car navigation system, which estimates a best route according to the preferences of the driver, adapts to traffic news with optimised routing and informs the user which way to go. Only recently has the tremendous development of large language models (LLMs) triggered a new wave of excitement about so-called ‘AI agents’, which are based on an LLM core enhanced with interfaces to tools and with orchestration of software elements, both of which are programmed ex ante. These AI agents inherit two limitations: LLMs are statistical estimators of the most probable ‘next best tokens’, and tools such as interfaces or orchestration have to be programmed traditionally, ie with prior knowledge of which problem has to be solved. There are, however, narratives that such agents can transform business operations and increase productivity by making decisions without humans, optimising processes and adapting instantaneously to new situations. Following a brief review of actual technical implementations, this paper scrutinises this issue from three perspectives: the so-called τ-bench (‘tool–agent–user’, ie a benchmark with retail, airline and telecom customer support systems), first tests with more sophisticated AI agents (such as Anthropic’s project ‘Vend’ in 2025), and real-world tests of agent-like LLM applications for financial services (such as extraction of environmental, social and governance [ESG] parameters from corporate reports).
The paper concludes that the usability of AI agents depends on a quality–resource analysis for every individual use case: there is a difference between a macro-economic analysis of a set of ESG reports and an individual decision on corporate ESG-linked lending. While tailored benchmarks for LLMs ‘fine-tuned’ to solve dedicated problems such as maths word problems are quite impressive, current experience with real-world cases does not support the vision that AI agents will rewrite the rules of business, but instead points to a possible new Solow paradox. Originally, Robert Solow wrote in 1987: ‘You can see the computer age everywhere but in the productivity statistics.’ This paper understands the paradox as analysed by Daron Acemoğlu et al. in 2014: IT-using industries show no additional productivity gains, despite the continuously increasing output of the IT-producing industry, in contrast to the view that IT makes workers redundant and ‘automates’ performance increases. Today, this paradox might reappear for the AI-agent-using versus the AI-agent-producing sectors of the economy. This article is also included in The Business & Management Collection, which can be accessed at https://hstalks.com/business/.
Published in: Journal of AI, robotics & workplace automation.
Volume 4, Issue 3, pp. 265-265
DOI: 10.69554/soiz7505