June 10 2024

Generate synthetic data for intent classification


After watching Apple's WWDC 2024, you might find yourself asking this question:


How did Apple fine-tune their 3B parameter model to perform intent classification?


To be honest, we don't know, but we can come up with a few ideas. They almost certainly didn't use production data from Siri, since that would go against their privacy stance. They probably didn't hire workers to write intents by hand either, since that is slow and expensive.


They could have generated the data synthetically using a large model like GPT-4. The "LLM2LLM" paper includes a prompt that generates a user sentence mapping to one of the following intents: AddToPlaylist, BookRestaurant, GetWeather, PlayMusic, RateBook, SearchCreativeWork, SearchScreeningEvent.


Here's the prompt:





Running this in ChatGPT gives the following:




Those results look great and resemble what a user might ask a voice assistant like Siri. Run at scale, the same prompt can yield thousands of datapoints, enough to fine-tune a small model. It's also dirt cheap and close to instant compared with hiring a workforce to write each transcript and label its intent.
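To make "running this at scale" concrete, here is a minimal sketch of the loop. It assumes you have some function that sends a prompt to a model and returns its text reply; `complete`, `generate_sentences`, and `fake_complete` are hypothetical names for illustration, and the prompt string is a placeholder rather than the prompt from the paper. Repeated calls to the same prompt tend to produce repeats, so the sketch drops case-insensitive duplicates.

```python
import itertools
from typing import Callable


def generate_sentences(complete: Callable[[str], str], prompt: str, n: int) -> list[str]:
    """Call the model n times and keep each distinct, non-empty sentence."""
    seen: set[str] = set()
    sentences: list[str] = []
    for _ in range(n):
        # Trim whitespace and any quotes the model wrapped around its reply.
        sentence = complete(prompt).strip().strip('"')
        key = sentence.casefold()  # compare case-insensitively
        if sentence and key not in seen:
            seen.add(key)
            sentences.append(sentence)
    return sentences


# Stand-in for a real model call (an assumption, so the sketch runs offline).
_canned = itertools.cycle(["Play some jazz", "Will it rain tomorrow?", "play some jazz "])


def fake_complete(prompt: str) -> str:
    return next(_canned)


print(generate_sentences(fake_complete, "<your prompt here>", 6))
# prints ['Play some jazz', 'Will it rain tomorrow?']
```

In practice you would swap `fake_complete` for a call to whichever model provider you use, and likely ask for more samples than you need to make up for the duplicates that get dropped.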


On MonteloAI


With MonteloAI, you can generate synthetic data for intent classification in a few minutes.


Step 1: Come up with the format for your fine-tuning dataset


In this example, the format will look like:



We're going to generate many sentences, which we'll then label with the correct intent.
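As a rough illustration of such a format, the sketch below models one datapoint as a sentence plus its intent and writes it out as JSON Lines. The field names `sentence` and `intent`, the `Example` class, and the choice of JSONL are assumptions made for this example, not necessarily what the platform produces.

```python
import json
from dataclasses import asdict, dataclass

# The seven intents named earlier in the article.
INTENTS = (
    "AddToPlaylist", "BookRestaurant", "GetWeather", "PlayMusic",
    "RateBook", "SearchCreativeWork", "SearchScreeningEvent",
)


@dataclass(frozen=True)
class Example:
    sentence: str  # what the user says
    intent: str    # one of INTENTS

    def __post_init__(self) -> None:
        # Reject labels outside the known set early.
        if self.intent not in INTENTS:
            raise ValueError(f"unknown intent: {self.intent!r}")


def to_jsonl(examples: list[Example]) -> str:
    """Serialize examples as one JSON object per line."""
    return "\n".join(json.dumps(asdict(e)) for e in examples)


print(to_jsonl([Example("Will it rain in Paris tomorrow?", "GetWeather")]))
# prints {"sentence": "Will it rain in Paris tomorrow?", "intent": "GetWeather"}
```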


Step 2: Configure the sentence variable


Then we'll configure how to generate each sentence using the prompt from above.



Step 3: Generate


That's it! Finally, configure the number of samples, the train/test split ratio, and the dataset name, and we'll start generating the dataset for you.
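If you want to see what a train/test split ratio means in code, here is a minimal sketch of a seeded shuffle-and-split. The function name `train_test_split` and its parameters are hypothetical and written for this example; they are not the platform's implementation.

```python
import random


def train_test_split(rows: list, test_ratio: float = 0.2, seed: int = 0) -> tuple[list, list]:
    """Shuffle a copy of rows and split it into (train, test)."""
    if not 0.0 <= test_ratio <= 1.0:
        raise ValueError("test_ratio must be between 0 and 1")
    shuffled = rows[:]  # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # a fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


train, test = train_test_split(list(range(10)), test_ratio=0.2)
print(len(train), len(test))
# prints 8 2
```

Fixing the seed means the same datapoints land in the test set on every run, which keeps evaluation numbers comparable between fine-tuning runs.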


Result


For each sample, Montelo will generate a sentence, then feed it into the initial prompt for classification. A sample datapoint from the final dataset would look something like:



You can then download the dataset or use it to fine-tune a model on our platform. Enjoy!
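If you build the labeling step yourself, the part where each generated sentence is fed into a classification prompt depends on turning the model's free-text reply into one of the seven intent names. Here is a minimal sketch of that parsing, assuming the reply mentions the intent by name; `parse_intent` is a hypothetical helper, and replies that name zero or several intents are discarded rather than guessed at.

```python
from typing import Optional

# The seven intents named earlier in the article.
INTENTS = (
    "AddToPlaylist", "BookRestaurant", "GetWeather", "PlayMusic",
    "RateBook", "SearchCreativeWork", "SearchScreeningEvent",
)


def parse_intent(reply: str) -> Optional[str]:
    """Map a raw model reply to a known intent name, or None if unclear."""
    text = reply.casefold()
    # Collect every known intent that the reply mentions, ignoring case.
    matches = [name for name in INTENTS if name.casefold() in text]
    # Accept only an unambiguous answer.
    return matches[0] if len(matches) == 1 else None


print(parse_intent("Intent: GetWeather."))
# prints GetWeather
print(parse_intent("PlayMusic or AddToPlaylist"))
# prints None
```

Dropping ambiguous replies costs a few samples but keeps mislabeled rows out of the fine-tuning set.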

Montelo is All You Need.

Start shipping today.


© Copyright 2024, All Rights Reserved by MonteloAI
