3. Assess precision vs recall — Design the reward function
I will skip as much statistical jargon as possible to keep this explanation simple. In short, precision and recall are statistical measures of the relevance of the results an algorithm returns.
When designing for AI, the model will have to be tuned for either precision or recall, and that choice determines how the model's errors show up. This tuning is called designing the reward function, and it should (again) be a collaborative effort between UX, Product, and Engineering. The decisions made in this step are key to a successful AI deployment and will dramatically affect the final experience for your users.
Before diving into the concepts of precision and recall, let me recap what Type I (false positive) and Type II (false negative) errors are in statistics.
Imagine you have an AI service that runs cancer diagnoses. The AI model predicts whether or not a person has cancer. Models like this are called "binary classifiers", and I will use one as a simple example of how algorithms can be right or wrong.
When a binary classifier makes a prediction (cancer or no cancer), there are only four possible outcomes:
- True positives. The model correctly predicts a positive outcome: a person has cancer and the AI correctly predicts that they have cancer.
- True negatives. The model correctly predicts a negative outcome: a person does not have cancer and the AI correctly predicts that they are cancer-free.
- False positives. The model incorrectly predicts a positive outcome: a person does not have cancer but the AI predicts that they do.
- False negatives. The model incorrectly predicts a negative outcome: a person has cancer but the AI wrongly predicts that they don't.
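The four outcomes above can be tallied directly from a set of labels and predictions. Here is a minimal Python sketch; the `actual` and `predicted` lists are hypothetical data invented for illustration:

```python
# Ground truth: 1 = has cancer, 0 = cancer-free (hypothetical data)
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
# Hypothetical predictions from a binary classifier for the same eight people
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

# Tally the four possible outcomes
tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)  # true positives
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)  # true negatives
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # false positive (Type I error)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # false negative (Type II error)

print(tp, tn, fp, fn)  # → 3 3 1 1
```

Every prediction lands in exactly one of the four buckets, so the counts always sum to the number of people evaluated.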
In this case, what is worse for the user? Being wrongly diagnosed with cancer while actually being free of it, or having cancer while the system wrongly declares the person cancer-free? This duality is the difference between tuning the system for precision or for recall, so trade-offs are necessarily involved in this process.
Now that Type I and Type II errors are clear, let us dive into the concepts of precision and recall.
- Optimizing for Precision means the AI model returns only the answers it is highly confident about, at the cost of missing some borderline positive cases (people who have cancer but are classified as cancer-free). The higher the precision, the more confident you can be that any result the model returns is correct. The tradeoff is that you will increase the number of false negatives by excluding possibly relevant results. The model will flag only clear-cut cancer cases, but it will miss some diagnoses: it won't find all the correct answers, only the obvious ones.
- Optimizing for Recall means the AI model returns every right answer it can find, even if it also returns a few wrong ones (people who don't have cancer but are classified as having it). The higher the recall, the more confident you can be that all the relevant results are included somewhere in the output. The tradeoff is that you will increase the number of false positives by including possibly irrelevant results. The model will flag the real cancer cases, but also a few wrong cancer diagnoses: it finds all the correct answers plus some incorrect ones.
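The trade-off described above follows directly from the standard definitions: precision = TP / (TP + FP) and recall = TP / (TP + FN). A short Python sketch makes the tension visible; the outcome counts are hypothetical:

```python
def precision(tp, fp):
    """Of everything the model flagged as cancer, what fraction really was cancer?"""
    return tp / (tp + fp)

def recall(tp, fn):
    """Of all the real cancer cases, what fraction did the model catch?"""
    return tp / (tp + fn)

# Hypothetical counts from a binary cancer classifier
tp, fp, fn = 3, 1, 1
print(precision(tp, fp))  # → 0.75
print(recall(tp, fn))     # → 0.75

# Making the model more conservative typically trades recall for precision:
# fewer false positives (fp drops), but more false negatives (fn grows).
tp2, fp2, fn2 = 2, 0, 2
print(precision(tp2, fp2))  # → 1.0
print(recall(tp2, fn2))     # → 0.5
```

Notice that false positives only appear in the precision formula and false negatives only in the recall formula, which is exactly why tuning for one metric tends to hurt the other.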