Text processing is arguably the most basic capability of large artificial intelligence models, and also their strongest. However, researchers at a start-up called PatronusAI found that even the most capable large models today cannot reliably analyze the financial filings that companies submit to the U.S. Securities and Exchange Commission (SEC).
OpenAI's GPT-4-Turbo is arguably the best-performing artificial intelligence model currently on the market. Yet in PatronusAI's latest test, it answered only 79% of the questions about SEC filings correctly.
When asked such questions, artificial intelligence tools generally either fail to answer or "hallucinate", that is, they make up numbers and facts that do not appear in the SEC filings.
Anand Kannappan, co-founder of PatronusAI, said: "Such performance is absolutely unacceptable, and its accuracy has to be much higher to truly start working in an automated and production-ready way."
The findings highlight some of the challenges facing AI models as large companies, especially in regulated industries like finance, seek to incorporate cutting-edge technology into their businesses, whether in customer service or data research.
Since the release of ChatGPT late last year, the ability to quickly extract important numbers and text and analyze financial statements has been regarded as one of the most promising applications of chatbots. SEC filings are filled with important data, and if AI can accurately summarize that data or quickly answer questions about its contents, it could give users an edge in the highly competitive financial industry.
Major investment banks and financial companies are therefore positioning themselves accordingly. Bloomberg, the world's largest financial information company, has released BloombergGPT, a large model built specifically for the financial field. Business school professors have studied whether ChatGPT can analyze financial headlines. JPMorgan Chase is developing an AI-driven automated investment tool. A recent McKinsey forecast said generative AI could bring the banking industry trillions of dollars in annual revenue.
But the entry of artificial intelligence into the financial industry has not been smooth. When Microsoft first launched its Bing chatbot, built on OpenAI's large model, one of its showcase examples was a quick summary of an earnings press release. Observers quickly noticed that the numbers in Microsoft's example were wrong and that some were completely fabricated.
A co-founder of PatronusAI points out that part of the challenge of building large models into actual products is that they are non-deterministic: they are not guaranteed to produce the same output for the same input every time. This means companies need to test them more rigorously to make sure they work correctly, stay on topic, and give reliable results.
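One simple way to picture what such testing involves is to ask a model the same question many times and measure how often it gives the same answer. The sketch below is only an illustration, not PatronusAI's method: `consistency_rate` and `make_fake_model` are hypothetical names, the fake model is a random stand-in for a real model call, and its answers are invented placeholder values.

```python
import random
from collections import Counter
from typing import Callable


def consistency_rate(ask: Callable[[str], str], question: str, runs: int = 20) -> float:
    """Return the fraction of runs that produced the most common answer."""
    # Normalize whitespace and case so trivially different strings match.
    answers = Counter(ask(question).strip().lower() for _ in range(runs))
    most_common_count = answers.most_common(1)[0][1]
    return most_common_count / runs


def make_fake_model(seed: int) -> Callable[[str], str]:
    """Hypothetical stand-in for a real model call (assumption, not a real API)."""
    rng = random.Random(seed)
    # Invented placeholder answers, including one refusal.
    candidates = [
        "$4.2 billion",
        "$4.2 billion",
        "$4.2 billion",
        "$3.9 billion",
        "I cannot answer that.",
    ]

    def ask(question: str) -> str:
        # Ignore the question and pick an answer at random,
        # mimicking a model that is not deterministic.
        return rng.choice(candidates)

    return ask


if __name__ == "__main__":
    ask = make_fake_model(seed=0)
    rate = consistency_rate(ask, "What was total revenue in fiscal 2022?")
    print(f"Most common answer returned in {rate:.0%} of runs")
```

A rate well below 100% on a question with a single factual answer would flag the kind of instability described above. Consistency alone does not show that the most common answer is correct, so a check like this would sit alongside a comparison against the filing itself.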
PatronusAI tested four large models: OpenAI's GPT-4 and GPT-4-Turbo, Anthropic's Claude2, and Meta's Llama2. After running the tests, the two co-founders of PatronusAI were surprised by how poorly the models performed.
Rebecca Qian of PatronusAI noted: “It’s surprising how often large models refuse to answer questions, with very high rejection rates, even when the answers are in context, even for questions that an average person can answer.”
However, the company also believes that if artificial intelligence continues to advance, large models like GPT will have huge potential to help people in the financial industry, whether analysts or investors.
An OpenAI representative noted that the company's usage guidelines prohibit using OpenAI models to provide tailored financial advice without a qualified person reviewing the information, and require anyone using OpenAI models in the financial industry to provide a disclaimer. OpenAI's usage policy also states that its models are not fine-tuned to provide financial advice.