How This AI Tech Company Boosted Engagement on Text Messages Using a Language Model
Greetings 👋, this is Paradigm where we study real stories of how AI was used and fortunes were made!
In story #1, we learn:
💡 How a language model (BERT) was used to classify user text messages
⚒️ A practical tip to overcome the lack of training data
📢 Bonus - A hint from our guest on how to stay on top of the newest discoveries in AI
On to the full story:
In this story, I’m in conversation with my friend CD Athuraliya, who is the Co-founder and Lead ML Engineer at ConscientAI, an engineering and research firm for all things AI and ML.
He is one of the most knowledgeable people I know in the field, with years of practical experience building AI applications.
Hello CD, tell us about this use case
This was basically a sentence classification task on SMS text messages for one of our clients. They had already built an automated messaging platform, but they wanted to extend it with the capability to personalize messages for each user, ultimately to increase user engagement with the platform.
How did you first approach the project?
From the start, we explored different Natural Language Processing (NLP) models and techniques that might fit the solution. This was back in 2018, and right around the kick-off of this project, Bidirectional Encoder Representations from Transformers (BERT) was released by Google. We soon realized the superior capabilities of this model and started experimenting with it.
What about the data?
Our client already had a dataset collected from the existing platform.
Did you have to do any data cleaning or enhancements?
Of course. We went through the data carefully and observed the nuances of the language users had used in past conversations. This gave us an understanding of the kinds of text the model might find harder to process, such as sarcasm or satire.
Also, the tagging of sentences was done by an external team, and we synced with them from time to time to make sure the labeling was done the way we wanted.
Take us through your experimentation phase up to the final deliverable
Sure. We started with the pre-trained BERT model as our base and fine-tuned the top layers with our own data. BERT had really good documentation and extensions even at that time, but we had to implement an additional data processor: we took the one for the SST-2 corpus and adapted a custom version of it for our dataset.
We also had to build helper classes for online predictions and model chaining since that was not available with the original implementation.
With these components, we were able to test out our initial solution.
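To make the data-processor step concrete, here is a minimal sketch of what an SST-2-style processor adapted for SMS data might look like. The class and label names are illustrative, not the client's actual implementation; the structure loosely mirrors the `DataProcessor` pattern in Google's original BERT repository, which reads tab-separated `sentence<TAB>label` files.

```python
import csv
from dataclasses import dataclass
from typing import List


@dataclass
class InputExample:
    """A single labeled sentence, mirroring the InputExample in Google's BERT repo."""
    guid: str
    text: str
    label: str


class SmsProcessor:
    """Hypothetical processor modeled on the repo's SST-2 processor:
    reads tab-separated `sentence<TAB>label` files into InputExamples."""

    def get_labels(self) -> List[str]:
        # Illustrative label set; the client's real classes are not public.
        return ["greeting", "question", "complaint", "other"]

    def get_examples(self, tsv_path: str, set_type: str) -> List[InputExample]:
        examples = []
        with open(tsv_path, newline="", encoding="utf-8") as f:
            reader = csv.reader(f, delimiter="\t")
            for i, row in enumerate(reader):
                if i == 0:  # skip the header row, as the SST-2 processor does
                    continue
                examples.append(
                    InputExample(guid=f"{set_type}-{i}", text=row[0], label=row[1])
                )
        return examples
```

The examples produced this way would then feed the usual BERT tokenization and fine-tuning pipeline.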
But soon we realized that there was an imbalance in our training data. Since it was a custom-built dataset, some of the classes had a very low number of data points compared to others.
The way we approached this challenge was interesting. We managed to implement a chain of similar models performing binary classification on ‘merged classes’.
This allowed us to combine data points and have more balanced classes during the training process of each model.
This model-chaining technique made it to the final solution that we took to production.
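The control flow of that chaining idea can be sketched in a few lines: each stage is a binary classifier trained to separate one class from all remaining classes merged together, and the first stage that fires determines the label. The toy keyword "models" below are stand-ins purely to show the mechanics; in the real system each stage was a fine-tuned BERT binary classifier.

```python
from typing import Callable, Sequence, Tuple

# Each stage pairs a class label with a binary model trained to separate that
# class from all remaining classes merged together, so every stage sees a more
# balanced positive/negative split than a flat multi-class model would.
Stage = Tuple[str, Callable[[str], bool]]


def chain_predict(stages: Sequence[Stage], fallback: str, text: str) -> str:
    """Run the chained binary classifiers in order; the first positive wins."""
    for label, model in stages:
        if model(text):
            return label
    return fallback  # none of the binary models fired


# Toy stand-in "models" (keyword rules), just to demonstrate the control flow.
stages = [
    ("complaint", lambda t: "refund" in t.lower()),
    ("question", lambda t: t.strip().endswith("?")),
]

print(chain_predict(stages, "other", "Can I get a refund?"))  # complaint: first match wins
print(chain_predict(stages, "other", "Where is my order?"))   # question
print(chain_predict(stages, "other", "Thanks!"))              # other
```

A design note: the ordering of stages matters, since an earlier stage shadows later ones for any text both would accept.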
Amazing! Tell us details about the tech stack and system design
Languages - Python
Frameworks - TensorFlow
Compute - Local GPU for training
Serving - Bespoke script
Infrastructure - AWS cloud
Version Control - GitHub
System design
We had to manage the model's inputs and outputs in queues, which made sure that none of the messages were missed. We didn't expose the model through endpoints in this case; instead, we built a bespoke script to consume records from a queue, process them, and produce the predictions to another queue.
These queues were maintained in RabbitMQ.
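A minimal sketch of that consume-predict-produce loop, assuming a generic queue interface: Python's stdlib `queue.Queue` stands in for the RabbitMQ queues (which a production script would typically reach through a client library such as pika), and `predict` stands in for the BERT model chain.

```python
import queue


def serve(inputs, outputs, predict, sentinel=None):
    """Consume records from one queue, run the model, publish predictions to another.

    Sketch of the bespoke serving script described above. Because messages sit
    in a durable queue until processed, none are lost if the worker restarts.
    """
    while True:
        msg = inputs.get()
        if msg is sentinel:      # shutdown signal, used only in this sketch
            inputs.task_done()
            break
        outputs.put((msg, predict(msg)))
        inputs.task_done()       # ack: this message is now safely processed


# Usage with stdlib queues and a stub model:
q_in, q_out = queue.Queue(), queue.Queue()
for m in ["hi there", "I need a refund"]:
    q_in.put(m)
q_in.put(None)  # sentinel to stop the loop
serve(q_in, q_out, lambda t: "complaint" if "refund" in t else "other")
```

With RabbitMQ, the `get`/`put` calls would become channel consume/publish operations, and the ack would be an AMQP acknowledgment rather than `task_done`.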
How did you monitor the solution after deployment?
Since our client had an existing rule-based system to find responses for user messages, we initially deployed the AI agent in parallel with the existing system.
Next, we carefully monitored the responses produced by both systems and flagged the ones that did not tally with each other. That way, we picked out predictions that might not have made sense and looked into how those scenarios could be mitigated.
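This parallel (shadow) deployment pattern is easy to sketch: run both systems on the same traffic and collect every case where they disagree for review. The function below is an illustrative sketch, with stub callables standing in for the rule-based system and the model.

```python
def flag_disagreements(messages, rule_based, model):
    """Run both systems on the same messages and collect the cases where their
    responses differ, mirroring the shadow deployment described above."""
    flagged = []
    for msg in messages:
        old, new = rule_based(msg), model(msg)
        if old != new:
            flagged.append({"message": msg, "rule_based": old, "model": new})
    return flagged


# Stub systems just to demonstrate: the rule-based system only auto-replies to
# questions, while the (toy) model auto-replies to everything.
rule = lambda m: "auto-reply" if "?" in m else "ack"
model = lambda m: "auto-reply"
for case in flag_disagreements(["hello", "when does it ship?"], rule, model):
    print(case)  # only "hello" is flagged: the two systems disagree on it
```

The flagged cases then become a review queue for engineers, which is what made the gradual cut-over from the rule-based system safe.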
How long did it take to build the whole thing?
It took about 4-6 months. And we managed to go into production in that time period.
What was the team structure for this project?
It was mainly two ML engineers with a project management role.
Maybe the most important question is, what was the impact?
Capturing the final business KPIs was on the client's side, but I'd say the impact we made was delivering a scalable and robust system.
As the client agreed, it would have been very challenging for them to scale it further without an AI-based approach.
🏁 At this stage, I asked the guest a few more questions in general. Here we go.
What are the areas you’re excited about in the current frontier of AI?
If I were to say it concisely, it’s AI Alignment.
Asking the question ‘Why should we have this AI system?’ rather than the ‘what’ and the ‘how’. A better understanding of ‘why’ will ensure the final system aligns with our (human) intentions.
Any ideas or solutions you will pursue if you have time?
I don’t want to give away too much, but I’m currently interested in multi-document summarization.
I believe it has some really good use cases.
Finally, any resource recommendations for learning AI?
One word - Twitter.
For me, Twitter has been the best place to follow the latest news/resources in AI. If you follow the right people/accounts there, you’ll get to know a lot. An example is Andrej Karpathy.
That's it for this one.
Let’s discuss more AI application ideas and system design choices on the Paradigm Discord. Our guests are also there.