But how accurate is it?

Dawid Naude
July 12, 2024

Is it accurate? Along with “Is my data safe?”, this is the most common question about AI. We’ve all had mild brushes with inaccuracy, possibly even outright hallucination, from AI models.

So how accurate is AI? Let’s get a bit more specific.

General AI Assistants and Narrow AI Applications

First, let’s categorise AI into two buckets: general AI assistants (like ChatGPT, MS Copilot, Google Gemini), and narrow AI applications, like a custom tool that checks whether a new contract contains terms that don’t comply with your standard policy. A narrow application has been grounded on your own data and business rules.

General AI assistants are ‘roughly right’. They’re great at trends, summaries and advice. If I were to ask a group of sales executives what a best-practice sales process looks like, I’d get a different answer from each one. None of them are wrong; the advice is just generalised, and each is coming at it from their own viewpoint. You didn’t ask a precise question like “what’s the temperature right now”, you asked “how’s the weather”, and got answers like “It’s cool”, “pleasant”, “chilly but sunny”. All correct, but different.

Compare this to a question that requires a precise answer: “What regulations am I required to comply with when selling in the EU?” This doesn’t have a general answer; you need a list of specific clauses, and you’ll likely need specific answers to follow-up questions too. Or “What are the visa application requirements for India?”

General AI assistants like ChatGPT are pretty good at most things, as long as the task doesn’t require precision. Take the India visa example: if you were to ask ChatGPT right now, it gets it wrong, suggesting items that are no longer required. The nature of the training process is that it draws on petabytes of information, and part of that is blogs, archived pages, and perhaps even an old attachment buried somewhere on the government website, alongside what’s currently published on the site. It draws on all of this information.

We can see new user experience patterns emerging in Google Gemini, where it offers the option to “verify with Google” and runs a quick Google search to validate the information. This increases accuracy somewhat, but not completely. The more you can ground AI in deliberate, specific sources, the higher the level of precision.

The more precise an answer you need, the more specific a solution you should seek, or create your own. Platforms like Redline, Josef Q and Harvey are specialist narrow applications for lawyers. Most of them still use GPT-4 as the underlying model, but the user experience and technology have been tweaked, fine-tuned and grounded in a much more precise process.
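To make ‘grounding’ a little more concrete, here’s a minimal sketch in Python of the pattern behind these narrow tools: fetch the relevant clauses from your own policy, then constrain the model to answer only from them. The policy data and keyword retrieval are illustrative assumptions, not any particular product’s implementation; a real product would use embedding-based search and pass the prompt to its LLM of choice.

    # A minimal sketch of "grounding": the model may only answer from the
    # sources we hand it. The clause data and keyword matching below are
    # made up for illustration.
    POLICY_CLAUSES = {
        "payment terms": "Invoices are payable within 30 days of receipt.",
        "liability": "Liability is capped at fees paid in the prior 12 months.",
        "termination": "Either party may terminate with 60 days' written notice.",
    }

    def retrieve(question: str) -> list[str]:
        """Return only the clauses whose topic appears in the question."""
        q = question.lower()
        return [clause for topic, clause in POLICY_CLAUSES.items() if topic in q]

    def grounded_prompt(question: str) -> str:
        clauses = retrieve(question)
        context = "\n".join("- " + c for c in clauses) or "- (no matching clauses)"
        # Pinning the model to the supplied sources, and telling it to refuse
        # rather than guess, is what buys the extra precision.
        return (
            "Answer using ONLY the policy clauses below. If they don't cover "
            "the question, say you don't know.\n"
            "Policy clauses:\n" + context + "\n\nQuestion: " + question
        )

    print(grounded_prompt("What are our standard payment terms?"))

The refusal instruction is the important design choice: a grounded tool that says “I don’t know” when your documents are silent is far more trustworthy than one that fills the gap from its training data.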

Roughly Right

Where is a 20% difference of opinion between two people OK?

The things I’d recommend people use ChatGPT, Copilot or Gemini for without hesitation are:

  • Creating a first draft of marketing material
  • Critiquing a contract (vs creating one)
  • Advising on ways to improve your sales process
  • Understanding themes and highlights in an annual report
  • Summarising a document
  • Summarising a big spreadsheet and distilling themes
  • Deeply understanding a topic you’re unfamiliar with
  • Recommending approaches to a problem
  • Simplifying complex documents
  • ‘Guidance’ instead of a formal process
  • Simulating challenging interactions like sales negotiations: “give me 15 difficult questions they’re likely to ask”
  • Asking for ways to improve something
  • Consolidating pros and cons and creating comparison tables

Then there are things where more precision is necessary, which will require further review or a specific solution. AI can do the first draft, but don’t rely on the output as-is; it will have problems:

  • Complex financial modelling
  • Listing specific steps in order for a scientific process
  • Creating an HR policy that’s meant to meet legislation
  • Creating a call script for regulated industries
  • Creating a non-disclosure agreement
  • Responding to a customer complaint where you may be liable for financial compensation imposed by a regulator
  • Interpreting your specific obligations according to law (80% accuracy here isn’t enough)
  • Calculating financials, like “what is Amazon’s marketing revenue for the past 4 quarters” (surprisingly, it gets this wrong)

There are solutions to all of the above, starting with complex prompt engineering, but the best approach is to choose or build a narrow AI solution for what you’re trying to do.

There’s a ton of potential. Start tinkering in the ‘roughly right’ domain, then identify where you need a specific solution and where the business case stacks up, and find or build the right approach.

The good news is that there are 1,000+ narrow AI solutions on websites like theresanaiforthat.com. Note that the vast majority are, honestly… very average: a pretty user interface on top of some complex prompt engineering, without any clear grounding in a custom data set.

Most of these startups will die, so you’ll need to try a few before figuring out whether a custom solution is the right approach. As with everything in AI, test-and-learn and trial-and-error are part of the process.

See you next week.

Dawid

