AI Buyers Guide for HR: Unmasking the Limits of ChatGPT (#2 in Series)

by Frank P. Ginac, CTO at TalentGuard

Welcome to the next installment in my series, AI Buyers Guide for Human Resources (HR) Professionals. This is article number 2 in the series. My objective for this series is to arm HR professionals responsible for selecting, deploying, and managing AI-based HR Tech solutions in the enterprise with the knowledge they need to perform these tasks confidently. The information shared here is not just of value to HR professionals; it generally applies to any buyer of AI-based software. I hope you find the information helpful, and I welcome your feedback and comments. I would greatly appreciate it if you’d share this article and the others in the series with your network.

This article discusses the capabilities and limitations of Large Language Models (LLMs) like ChatGPT, highlighting that while they are powerful tools for certain tasks, they are not universal AI solutions. It addresses common misconceptions about LLMs, such as their use as a “source of truth,” and illustrates this with examples from legal and medical fields where reliance on LLMs led to inaccuracies. The article emphasizes the need for a balanced approach to AI application, combining LLMs’ strengths with human expertise and other AI techniques for effective and responsible use, particularly in fields like HR Tech.

ChatGPT continues to dominate AI news. With hundreds of millions of users at last count, it seems that everyone has jumped on the LLM bandwagon. The media paints it as a universal AI that can be applied to solve just about any problem it’s tasked with. While it is impressive, and I expect it will continue to improve as users find more and more creative ways to apply it, it can’t (and never will) solve every kind of task that the vast array of AI/ML approaches and algorithms can solve today.

Of deep concern to practitioners and researchers is the misconception of its capabilities and its use as a “source of truth.” Consider the lawyers who used ChatGPT to write a legal brief that included six fabricated citations. After being sanctioned by the court, the law firm issued the statement, “We made a good faith mistake in failing to believe that a piece of technology could be making up cases out of whole cloth.”[1]

More alarming are concerns about the use of LLMs in medical diagnoses, the creation of treatment plans, and the like. Consider a recent research paper published in the Journal of Medical Internet Research that evaluated the accuracy of ChatGPT’s diagnoses and found that “ChatGPT achieved an overall accuracy of 71.7% across 36 clinical vignettes.”[2]

Deep Learning Neural Networks and LLMs perform very well at certain tasks, in some cases much better than other AI approaches. But they have their limits, as we’ve seen in the cases above. Why are they good at some tasks and not so good at others, and why are they not a universal AI solution? Let’s start with a definition: LLM stands for “Large Language Model,” which in AI parlance is a machine learning model (a model trained with data) designed to complete a sequence. Tasked with completing the sequence “Jack and Jill went…”, a well-trained LLM will respond with “…up the hill to fetch a pail of water.” LLMs can be trained to complete any sequence, from the domains of natural languages to programming languages to symbolic languages and more. In the broadest sense, a language is any system of symbols, letters, numerals, and rules used to transmit information. Think about the above examples of applications of LLMs gone wrong. Does it now make sense why LLMs are not well suited to all tasks? To be fair, ChatGPT is much more than an LLM: it is an LLM that has undergone further fine-tuning to learn how to follow instructions. At its core, however, it is still an LLM, with all of the limitations of LLMs.
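To make the sequence-completion idea concrete, here is a minimal sketch using the open-source Hugging Face transformers library and the small GPT-2 model (my choice for illustration; the article’s examples involve ChatGPT and gpt-4). Note what the model does and does not do: it predicts a plausible continuation of the prompt, and nothing in it checks whether that continuation is true.

```python
# A minimal sketch of sequence completion with an open model (GPT-2).
# Illustrative only; the continuation will vary from run to run.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

result = generator("Jack and Jill went", max_new_tokens=15, num_return_sequences=1)
# The model returns the most plausible-looking continuation it learned
# from its training data, with no notion of factual correctness.
print(result[0]["generated_text"])
```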

In my world, HR Tech, I recently spoke to a very senior analyst who covers the HR Tech industry and who posited that the only AI of value is “big data” AI, that is, Deep Learning Neural Networks and LLMs that require vast amounts of training data, data readily available only to a small handful of very large companies with access to enormous computing resources. I hear more and more VCs, analysts, CEOs, and other business decision-makers making the same ill-informed claim. When it comes to deciding which approach or algorithm to use to solve a particular problem, or whether AI should be applied at all, I turn to the following advice from experts in the field:

“Any problem that can be solved by your in-house expert in a 10–30 minute telephone call can be developed as an expert system [classic AI approaches/algorithms].”[3]

“If a typical person can do a mental task with less than one second of thought, we can probably automate it using AI [supervised machine learning, specifically, function approximation] either now or soon… [Such] tasks are ripe for automation. However, they often fit into a larger context or business process; figuring out these linkages to the rest of your business is also important.”[4]

Ginac’s corollary to Andrew Ng’s “one second of thought” proposition is that anything requiring longer than a few seconds of thought is unlikely to be automated with supervised machine learning, including Neural Networks/Deep Learning, at least not with today’s approaches.

When we think about ways to apply AI/ML at TalentGuard, these two rules weigh heavily in what we choose to automate and how we automate it. Recently, we embarked on a project to automate updating thousands of out-of-date learning references in one of our data sets. For example, given a learning reference in the form of a book, with an author, description, publisher, date of publication, and ISBN, that was published more than 5 years ago, is there a recently published book that covers the same material?
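As a rough illustration of the shape of the problem, here is a minimal sketch of such a reference record and the staleness test; the field names are hypothetical, not TalentGuard’s actual schema.

```python
# A sketch of the kind of record we needed to refresh. Field names are
# illustrative assumptions, not the production schema.
from dataclasses import dataclass
from datetime import date

@dataclass
class BookReference:
    title: str
    author: str
    description: str
    publisher: str
    published: date
    isbn: str

def is_out_of_date(ref: BookReference, today: date, max_age_years: int = 5) -> bool:
    """Flag references published more than max_age_years ago."""
    return (today.year - ref.published.year) > max_age_years
```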

As one of my former AI professors at Georgia Tech is fond of saying, “Do the easy thing first!” The easy thing to do, at least to try, was to see how OpenAI’s gpt-4 model would do given the task of “generating” a bit over 3,000 book references. Iterating over the dataset with a carefully crafted prompt, the model faithfully generated a book title, a description, a publisher, a date of publication within the past 5 years, and even an ISBN for each one! Given our understanding of LLMs, we, of course, did not blindly trust the results and proceeded to verify each reference generated by the model. It turned out that only 10% of the books it generated were real books.
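For a sense of what that loop looked like, here is a minimal sketch using OpenAI’s Python client; the prompt wording, the helper name, and the sample input are my own illustrative assumptions, not our production code. The key point is that the model will return a complete-looking reference whether or not the book actually exists.

```python
# A sketch of asking gpt-4 for an updated book reference. Prompt wording
# and helper names are assumptions for illustration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def suggest_replacement(old_ref: dict) -> str:
    prompt = (
        "Suggest a book published within the last 5 years that covers the "
        f"same material as:\nTitle: {old_ref['title']}\n"
        f"Description: {old_ref['description']}\n"
        "Reply with title, author, publisher, publication date, and ISBN."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    # Looks authoritative, but nothing here guarantees the book is real.
    return response.choices[0].message.content

print(suggest_replacement({
    "title": "Learning Python",
    "description": "An introduction to programming in Python.",
}))
```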

This phenomenon is known as model “hallucination,” and it is precisely why LLMs should not be used as a source of truth, even for tasks at which they perform quite well. Does that mean LLMs are completely useless for this particular task? Not entirely, it turns out. The gpt-4 model does a reasonably good job of inference. For example, given the summary of a book, it can identify subjects the book is likely to cover. Now, with a title, a description, and a list of subjects, we can “ground” the model with data from source-of-truth databases, e.g., the Library of Congress catalog or Google Books, and apply other non-LLM AI techniques to confirm the similarity of the original to the updated reference. With this approach, we were able to update over 90% of our out-of-date references with automation.
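Here is a minimal sketch of that grounding step, assuming the public Google Books volumes endpoint as the source-of-truth catalog. The similarity check below is a crude lexical overlap, a deliberately simple stand-in for the non-LLM techniques mentioned above.

```python
# A sketch of grounding a generated reference against a real catalog.
# Uses the public Google Books API; the threshold is an assumption.
import requests

def find_in_google_books(title: str, author: str) -> list[dict]:
    """Look the candidate up in a source-of-truth catalog."""
    resp = requests.get(
        "https://www.googleapis.com/books/v1/volumes",
        params={"q": f'intitle:"{title}" inauthor:"{author}"'},
        timeout=10,
    )
    resp.raise_for_status()
    return [item["volumeInfo"] for item in resp.json().get("items", [])]

def jaccard(a: str, b: str) -> float:
    """Crude lexical similarity between two descriptions."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def confirm(old_description: str, candidate: dict, threshold: float = 0.2) -> bool:
    """Accept a generated reference only if a real catalog entry exists
    and its description is sufficiently similar to the original's."""
    matches = find_in_google_books(candidate["title"], candidate["author"])
    return any(
        jaccard(old_description, info.get("description", "")) >= threshold
        for info in matches
    )
```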

The journey of applying AI and LLMs like ChatGPT in the HR Tech landscape at TalentGuard underscores a critical lesson: the effective use of AI requires a blend of technological understanding and practical wisdom. LLMs, despite their advanced capabilities and versatility, are not infallible or universally applicable tools. They excel in generating and inferring information within their trained domain but fall short when it comes to discerning fact from fiction, or in tasks that require deep, contextual understanding or ethical judgment. This limitation, often manifested as model hallucination, necessitates a cautious and informed approach in their application.

The key lies in leveraging AI as a tool to augment human intelligence and expertise, rather than as a standalone solution. By combining the inference abilities of LLMs with data verification from reliable sources and complementary AI technologies, we can harness the power of these models more effectively and responsibly. It’s about finding the right balance: utilizing the strengths of AI to enhance our capabilities while being acutely aware of its limitations and ensuring that its application is grounded in reality. As we continue to explore the frontiers of AI in various fields, this mindful approach will be crucial in unlocking the true potential of these technologies in a way that is both innovative and ethically sound.

[1] Reuters. (2023, June 22). New York Lawyers Sanctioned for Using Fake ChatGPT Cases in Legal Brief. https://www.reuters.com/legal/new-york-lawyers-sanctioned-using-fake-chatgpt-cases-legal-brief-2023-06-22/

[2] Rao, A., Pang, M., Kim, J., Kamineni, M., Lie, W., Prasad, A. K., Landman, A., Dreyer, K., & Succi, M. D. (2023). Assessing the Utility of ChatGPT Throughout the Entire Clinical Workflow: Development and Usability Study. Journal of Medical Internet Research, 25, e48659. https://www.jmir.org/2023/1/e48659/

[3] Firebaugh, M. W. (1989). Artificial Intelligence: A Knowledge-Based Approach.

[4] Ng, A. (2016, November). What Artificial Intelligence Can and Can’t Do Right Now. Harvard Business Review. https://hbr.org/2016/11/what-artificial-intelligence-can-and-cant-do-right-now
