A new study has examined the ability of artificial intelligence models to analyze filings from the Securities and Exchange Commission, CNBC reports.
The research, conducted by startup Patronus AI, found shortcomings in large language models (LLMs) similar to the one that powers ChatGPT. Even the best-performing model tested, OpenAI’s GPT-4-Turbo, answered only 79% of questions accurately when the entire filing was provided alongside the question.
The study was said to be an indication of the challenges that still lie ahead for the financial industry and other major companies as they attempt to use AI models to enhance their operations.
“That type of performance rate is just absolutely unacceptable,” Patronus AI co-founder Anand Kannappan was quoted as saying. “It has to be much much higher for it to really work in an automated and production-ready way.”
Among the concerns the researchers highlighted was the models’ lack of reliability: they often refused to answer questions or provided inaccurate information not present in the SEC filings.
Financial firms have been increasingly incorporating artificial intelligence into their practices. Forbes has previously reported that 85% of financial institutions currently use artificial intelligence in some capacity, including for detecting fraud and predicting cash flow events. Bloomberg LP created its own AI model for financial data, and JPMorgan is developing an AI-powered automated investing tool.
But the Patronus AI study called into question the ability of AI models to derive accurate data. One drawback it identified was the “nondeterministic” nature of LLMs: they do not guarantee consistent output for identical inputs, so companies must test them rigorously to make sure they are receiving reliable results.
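One way a firm might test for this is to send the same question to a model several times and measure how often the answers agree. The sketch below is a minimal illustration in Python, not the method used in the study. The function `consistency_rate` and the two stand-in models are hypothetical names introduced here; a real test would replace them with calls to an actual model.

```python
import random
from collections import Counter
from typing import Callable


def consistency_rate(ask: Callable[[str], str], prompt: str, runs: int = 10) -> float:
    """Return the fraction of runs that produced the most common answer."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    # Normalize whitespace and case so trivial differences are not counted.
    answers = Counter(ask(prompt).strip().lower() for _ in range(runs))
    most_common_count = answers.most_common(1)[0][1]
    return most_common_count / runs


# Hypothetical stand-in for a nondeterministic model: the answer varies by run.
def varying_model(prompt: str) -> str:
    return random.choice(["$4.2 billion", "$4.2 billion", "$4.5 billion"])


# Hypothetical stand-in for a model that always returns the same answer.
def stable_model(prompt: str) -> str:
    return "$4.2 billion"


if __name__ == "__main__":
    question = "What was total revenue in fiscal 2022?"
    print("stable: ", consistency_rate(stable_model, question))
    print("varying:", consistency_rate(varying_model, question))
```

A rate of 1.0 means every run agreed. Anything lower signals that the same question can yield different answers, which is the behavior that would call for human review.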
As part of the study, Patronus AI developed a set of more than 10,000 questions and answers drawn from SEC filings from major publicly traded companies. Some questions required light math or reasoning; some AI models performed well, while others struggled to provide accurate answers. The researchers said the test was designed to produce a minimum performance standard for use of AI in the financial sector.
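Scoring a question-and-answer test of this kind could look roughly like the Python sketch below. It assumes simple exact-match grading and treats a missing answer as a refusal. Both are assumptions made for illustration, since the article does not describe how the researchers graded responses, and the names `Result` and `score` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Result:
    expected: str            # reference answer drawn from the filing
    answer: Optional[str]    # model's answer; None means it refused


def score(results: List[Result]) -> Dict[str, float]:
    """Split results into correct, refused, and wrong, as fractions of the total."""
    total = len(results)
    if total == 0:
        raise ValueError("no results to score")
    refused = sum(1 for r in results if r.answer is None)
    # Assumption: exact match after trimming whitespace and lowercasing.
    correct = sum(
        1
        for r in results
        if r.answer is not None
        and r.answer.strip().lower() == r.expected.strip().lower()
    )
    wrong = total - correct - refused
    return {
        "accuracy": correct / total,
        "refusal_rate": refused / total,
        "error_rate": wrong / total,
    }


if __name__ == "__main__":
    sample = [
        Result("$4.2 billion", "$4.2 billion"),
        Result("12%", "12%"),
        Result("2021", None),            # refusal
        Result("$310 million", "$301 million"),  # wrong answer
    ]
    print(score(sample))
```

Tracking refusals separately from wrong answers matters here because the study flagged both as distinct reliability problems.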
“There just is no margin for error that’s acceptable, because, especially in regulated industries, even if the model gets the answer wrong 1 out of 20 times, that’s still not high enough accuracy,” said Patronus AI co-founder Rebecca Qian.
She noted one surprising finding was how often the models refused to answer the question posed, “even when the answer is within the context and a human would be able to answer it.”
Despite the deficiencies that were revealed, the Patronus AI co-founders said that the future of AI’s utilization in the financial industry still appears promising.
“Models will continue to get better over time,” said Kannappan. “We’re very hopeful that in the long term, a lot of this can be automated. But today, you will definitely need to have at least a human in the loop to help support and guide whatever workflow you have.”