How this AI team at Canva helped creators pick the perfect keywords using a Language Model
Greetings 👋, this is Paradigm where we study stories on how fortunes were made with AI.
A lot more new readers this week! Welcome 🙏
With AI reaching ‘escape velocity’ in the last few weeks, I know we all can feel a bit of FOMO (fear of missing out) sometimes. But not to worry, delivering real value with anything takes time and patience. So let’s build smart, not fast!
That said, on to story #3, where we learn:
💡 How Canva tackled keyword suggestions for all the different languages on its graphic design platform
🪄 Some unconventional tricks used during the fine-tuning of a language model
⚒️ The MLOps tools and framework used at Canva for AI deployments
On to the full story:
In this story, I’m in conversation with my friend Sachin Abeywardana, who is currently a Senior Machine Learning Engineer at Canva, which is the famous graphic design platform we all use.
Sachin has years of experience applying AI to business use cases and also holds a Ph.D. in Machine Learning from the University of Sydney.
Hello Sachin, tell us about this particular use case
So, at Canva we have external creators designing and publishing their work on the platform. One of the most important aspects of improving their discoverability is the effective use of metadata and keywords on their posts.
We don’t want the users to do ‘keyword-stuffing’, which means excessive use of irrelevant keywords thinking that will improve their exposure. Also, we noticed some users lacked knowledge of what good keywords to use.
We decided to tackle these issues by using a Language Model (LM) that can suggest relevant keywords for a post.
What about the data?
We needed data on existing posts along with relevant keywords that were mapped to them.
Luckily we have a great data engineering team at Canva who took care of the data collection from the platform. All data was sourced internally.
Did you have to do any data cleaning or enhancements?
Not much. We ran a quick exploratory data analysis (EDA) to get an idea of the distribution of keywords in each language.
Additionally, we decided to use only keywords generated by our internal designers, to avoid the noise of stuffed keywords; internal designers have no incentive to stuff keywords.
How did you first approach the project?
The go-to LM at the time was the Bidirectional Encoder Representations from Transformers (BERT). We realized it could serve our needs here.
So, the task was to generate the top N keywords given the user-generated content within a template post as input. In essence, it was a multi-label text classification task.
Our first attempt was to directly use BERT in both English and non-English spaces. But we soon came across a challenge with the latter.
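The first attempt described above can be sketched as a multi-label classifier head on top of a pooled text encoding. This is a minimal illustration, not Canva's actual code: the hidden size, keyword vocabulary size, and the random stand-in for BERT's pooled output are all assumptions.

```python
import torch
import torch.nn as nn

class KeywordClassifier(nn.Module):
    """Multi-label keyword classifier over a pooled text encoding.

    The pooled input stands in for a BERT-style encoder's [batch, hidden]
    representation; the real system would feed BERT's output here.
    """

    def __init__(self, hidden_dim: int, num_keywords: int):
        super().__init__()
        self.head = nn.Linear(hidden_dim, num_keywords)

    def forward(self, pooled: torch.Tensor) -> torch.Tensor:
        # One independent sigmoid per keyword -> multi-label scores.
        return torch.sigmoid(self.head(pooled))

# Illustrative use: rank the top 10 keywords for a batch of two posts.
model = KeywordClassifier(hidden_dim=768, num_keywords=5000)
pooled = torch.randn(2, 768)               # stand-in for BERT pooled output
scores = model(pooled)                     # [2, 5000] keyword probabilities
top_n = scores.topk(k=10, dim=-1).indices  # top 10 keyword ids per post
```

The key property is that each keyword gets an independent probability, so a post can be assigned several keywords at once.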
Take us through your experimentation phase up to the final deliverable
So, one clear issue with lumping all non-English keywords together was the massive prediction space it created.
Canva serves around 50 languages at the moment, and the sum of all keywords across those languages would be very large; this is not ideal for a multi-class classification task. Given the sparsity of the non-English keyword data, it was hard to justify posing this as one huge multi-class classification problem.
Additionally, we observed that the accuracy of non-English keyword predictions was quite low. This was due to the significant imbalance of data between English and non-English examples.
Hence, we had to resort to another approach for the non-English space, where we encode (embed) both keywords and the input text via the LM and rank the keywords based on a similarity metric.
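The ranking step could look something like the sketch below. Note that cosine similarity is my assumption for the metric, since the interview only says "a similarity metric", and the embedding shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def rank_keywords(text_emb: torch.Tensor,
                  keyword_embs: torch.Tensor,
                  top_n: int = 10) -> torch.Tensor:
    """Rank candidate keywords by cosine similarity to the input text.

    text_emb:     [hidden] embedding of the post text from the LM
    keyword_embs: [num_keywords, hidden] embeddings of candidate keywords
    Returns the indices of the top_n most similar keywords.
    """
    text_emb = F.normalize(text_emb, dim=-1)
    keyword_embs = F.normalize(keyword_embs, dim=-1)
    sims = keyword_embs @ text_emb  # [num_keywords] cosine scores
    return sims.topk(k=top_n).indices

# Toy example: the keyword whose embedding points the same way as the
# text embedding should rank first.
text = torch.tensor([1.0, 0.0, 0.0])
keywords = torch.tensor([[0.0, 1.0, 0.0],    # orthogonal
                         [1.0, 0.1, 0.0],    # close match
                         [-1.0, 0.0, 0.0]])  # opposite direction
best = rank_keywords(text, keywords, top_n=1)
```

Because ranking only needs embeddings, new keywords can be added without retraining a classification head, which is what makes this approach attractive for a sparse, many-language keyword space.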
We had to come up with 3 main innovations here when fine-tuning the model:
Altering the batching process so that each batch would only contain one language. Furthermore, we attempted to make subsequent batches come from different languages. This was to ensure that the model would not get good at one language before moving on to the next. We found that this helped with convergence.
We used an Asymmetric loss function which penalizes the model more if it gets a positive class prediction wrong.
We did not fine-tune the word embeddings of the model; the pre-learned embeddings were kept fixed, and the loss only updated the weights of the transformer layers.
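The batching trick in the first point could be implemented as a sampler that yields one language per batch and rotates round-robin across languages, so consecutive batches come from different languages. The grouping and shuffling details below are my assumptions, not Canva's exact implementation.

```python
import random
from collections import defaultdict

def language_batches(examples, batch_size, seed=0):
    """Yield batches of example indices, each batch from a single
    language, rotating round-robin so consecutive batches differ."""
    rng = random.Random(seed)
    by_lang = defaultdict(list)
    for idx, ex in enumerate(examples):
        by_lang[ex["lang"]].append(idx)

    # Split each language's shuffled examples into same-language batches.
    queues = []
    for idxs in by_lang.values():
        rng.shuffle(idxs)
        queues.append([idxs[i:i + batch_size]
                       for i in range(0, len(idxs), batch_size)])

    # Round-robin across the per-language queues until all are empty.
    while any(queues):
        for q in queues:
            if q:
                yield q.pop(0)

examples = [{"lang": "en"}] * 4 + [{"lang": "de"}] * 4 + [{"lang": "ja"}] * 4
batches = list(language_batches(examples, batch_size=2))
```

With the toy data above this yields six batches of two indices each, cycling en, de, ja, en, de, ja, so the model never sees two batches of the same language back to back.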
These tricks helped us to reach an acceptable accuracy given the limitation of quality non-English examples. You can read more about the method used in our Canva Tech Blog.
Amazing! Tell us details about the tech stack and system design
Languages - Python, Java (backend)
Frameworks - PyTorch, PyTorch Lightning
Computation - GPU (on AWS)
Serving - Uvicorn, Flask
Training Pipeline - Argo
Docker Build Tool - Bazel
Code Versioning - GitHub
System design
Canva has a Platform Team that handles all of the engineering aspects when it comes to deployment. Generally, all training tasks are run on Kubernetes with Argo used for orchestration. For model serving, AWS ECS is used.
A Machine Learning Engineer is expected to produce a Jsonnet file with the necessary configuration for training, as well as the service code for serving the model.
How long did it take to build the whole thing?
It took 12 weeks from kick-off to delivering the solution.
What was the team structure for this project?
So, at Canva teams are usually 8-10 people. Each team has an Engineering Lead, a Product Manager, a Machine Learning Engineer, and a combination of backend and frontend developers.
Maybe the most important question is, what was the impact made?
We measured this by looking at the ‘acceptance rate’ of the suggested keywords by the users on an ongoing basis. According to the latest numbers, this was 70% for English and 50% for non-English users.
In my opinion, this is a solid first step up from having no suggested keywords at all.
🏁 At this stage I asked the guest a few more questions in general. Here we go.
What are the areas you’re excited about in the current frontier of AI?
Obviously Generative AI at the moment. The pace at which it is growing is incredible.
Specifically, I have been looking at diffusion-based models. I would like to build a version from scratch in order to really understand the inner workings.
Any ideas or solutions you will pursue if you have time?
I am more of an execution guy than an ideas guy. But given some of my previous experience, I enjoyed building systems that used Reinforcement Learning for trading, specifically for sports betting.
Finally, resource recommendations for learning AI?
I recommend Fast.ai as the best place to start with the fundamentals of AI. Both courses 1 and 2. Apart from that, I have found the HuggingFace tutorials to be pretty useful.
I might also plug my personal ML course on Udemy for anyone interested :)
That's it for this one. Thanks a lot, Sachin!
💬 Let’s discuss more AI application ideas and system design choices on the Paradigm Discord. Our guests are also there.
See you in the next post!