Understanding synthetic respondents in market research
Imagine: Your team has just leveraged a large language model (LLM) to develop some exciting new snack concepts. But now you’re faced with a new challenge: Which ideas are worth pursuing, which need a few tweaks, and which should be discarded?
For most innovators, the next step is culling the long list of ideas. This is sometimes achieved by using focus groups, but more often by simply selecting the team favorite(s) before moving on to consumer testing—kicking off your innovation cycle with a game of chance. Thankfully, the latest Generative AI (GenAI) technology—which has already demonstrated great promise for product innovators—now has an answer for that, too.
Synthetic respondents are artificial personas generated by machine learning models to mimic human responses in market research. They can represent target markets, specific demographics, or even consumption profiles. Informed by a diverse set of data sources, these “stand-in consumers” can generate feedback on any number of questions, but for product innovation, their most useful application (for now) is to quickly evaluate and optimize new concepts. This early-stage check holds potential to sort through ideas and accelerate innovation cycles while conserving time and resources for research questions that require real human consumer feedback.
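To make the mechanics concrete, here is a minimal sketch of how a synthetic respondent can be assembled: a persona description conditions an LLM, which is then asked to react to a concept. This is illustrative only; it assumes an OpenAI-style chat API, and the persona, model name, and concept text are all invented for the example.

```python
# Minimal sketch of a synthetic respondent: a persona-conditioned LLM call.
# Assumes the OpenAI Python client; persona and concept are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

persona = (
    "You are a 34-year-old suburban parent who shops weekly for snacks, "
    "is price-sensitive, and prefers low-sugar options."
)

concept = "A single-serve trail mix with dark chocolate and no added sugar."

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[
        {"role": "system", "content": persona},
        {
            "role": "user",
            "content": (
                f"Concept: {concept}\n"
                "On a 1-5 scale, how likely would you be to buy this, and why?"
            ),
        },
    ],
)

print(response.choices[0].message.content)
```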
But leveraging this technology within an effective solution takes more than attaching a shiny interface to an LLM. The overnight rush of research vendors to release non-optimized solutions with bold claims about their capabilities has led many—ourselves included—to raise an eyebrow. At the heart of the issue is that current LLMs excel at producing results that merely seem convincing. Even when these fast-to-market models lack access to the right tools or data for accuracy, they can still generate outputs that pass a gut check. Producing convincing answers is different from providing accurate ones—especially when business decisions rely on data integrity. This discrepancy is also why you might still hesitate to replace your lawyer with ChatGPT.
If you’re intrigued by the promise of GenAI but are evaluating its capabilities with caution, you might be wondering whether synthetic models are a realistic proposition at all. Can a synthetic persona truly convey human consumers’ preferences and opinions? Are there characteristics of best-in-class tools that set some apart from others? How can we ensure these solutions deliver both accuracy and actual business value? With these questions top of mind, we set out to find answers through our own rigorous experimentation.
Building the ultimate synthetic model
Our experimentation in building synthetic models began years ago, utilizing our in-market transactional data for volumetric forecasting and trend detection. With more than 20 years of experience building patented AI and machine learning solutions—not to mention our vast stores of consumer-panelist data—we’ve always been ideally positioned to bring such a tool to market. But even as the recent emergence of GenAI has propelled new possibilities forward, our experimental approach has remained a deliberate one, prioritizing client value over hype. Above all, we’ve been laser-focused on ensuring output accuracy through testing and validation and are unwilling to compromise quality for speed to market. This strategy has been affirmed time and again, not only through our findings but also in conversations with clients who have noted that the early-market technology they adopted did not deliver as promised.
Although there are numerous complexities that go into building the ultimate synthetic respondent solution, we’ve identified three simple and non-proprietary principles that distinguish best-in-class models from their counterparts. It should come as no surprise that each of these begins and ends with real, human-provided consumer data.
Best-in-class synthetic models should test, calibrate, and validate response accuracy across every category
“When it comes to leveraging Generative AI in market research, there are no shortcuts for protecting data quality. As trusted leaders in this industry, our clients expect us to set the benchmark for what constitutes ‘quality,’ and they have been eager participants on this journey.”
— Mark Flynn, SVP Product Leadership, NIQ BASES
The dream of every product development team is to “lift and shift”—that is, build scalable mechanisms that drive effective results across every category, market, and circumstance. In fact, this touted simplicity-of-use is often what makes many of the early-market synthetic models so attractive: Who doesn’t want the ability to type in a few details about consumer demographics and preferences and generate concept feedback for skin care, cleaning solutions, and dog food?
But can such an approach create output that accurately reflects human decision-making?
We know that consumers are predictably irrational when it comes to decision-making, and that value drivers are often shaped by context. What a consumer wants in cereal packaging will be different from their preferences for skin care; their price sensitivity for teeth whitening products will be different than that for chewing gum. Synthetic personas similarly demonstrate varied preferences across categories, but in ways that differ from humans. Model biases can also surface differently from category to category—for example, in our experimentation, we observed that the synthetic respondents seemed to care more about human health than the actual humans did.
Controlling for these factors requires continual adjustment of methodology, instructions, and even prompting order to arrive at truly authentic feedback. Learning as we go, we’ve adapted our testing procedures accordingly and, more importantly, validated that output against feedback from actual human consumers, drawing on our vast experience calibrating consumer-panelist scores to in-market retail sales data.
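As a simplified illustration of what validating synthetic output against human feedback can look like, the sketch below correlates synthetic concept scores with human panel scores, category by category. All numbers are invented for the example; a real calibration would draw on panelist data and in-market sales.

```python
# Illustrative validation step: correlate synthetic scores with human panel
# scores for the same concepts, per category. All numbers are invented.
import numpy as np

scores = {
    # category: (synthetic scores, human panel scores) for the same concepts
    "skin care": ([3.9, 2.1, 4.4, 3.0], [3.6, 2.4, 4.1, 3.2]),
    "dog food":  ([4.2, 3.8, 2.0, 3.5], [2.9, 4.0, 2.2, 3.1]),
}

for category, (synthetic, human) in scores.items():
    r = np.corrcoef(synthetic, human)[0, 1]
    flag = "OK" if r >= 0.8 else "needs recalibration"
    print(f"{category}: r = {r:.2f} ({flag})")
```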
Importantly, we’ve also worked to ensure our methodology prioritizes the protection of our clients’ intellectual property—meaning their proprietary content or data will never be used to train models. While our process might seem tedious to some, we’ve found there are no shortcuts for protecting data quality or our clients’ competitive information. These continual refinements and the ability to leverage our data assets for input and validation give us confidence in the accuracy and utility of our output, resulting in the business outcomes our clients have come to expect from us.
Best-in-class synthetic models should leverage the latest granular data to drive accuracy
“Consumer preferences are never static. They are continually shaped, reshaped, or turned on their head by dynamic factors like the economy, world events, or even the latest TikTok trend. Incorporating recent and relevant preferences is critical for predicting consumer acceptance, whether you’re working with human or synthetic data.”
— Ramon Melgarejo, President, Strategic Analytics and Insights
Think back to Spring 2020. Up to then, did you need multi-packs of disinfecting wipes? Did you ever strategize about your toilet paper supply beyond expecting house guests or finding your favorite brand on sale? The pandemic is perhaps a dramatic but relevant example of how quickly consumer behavior can shift. Even subtle movement in trends over time can lead to feedback variation, which is why we have prioritized prompting models with the latest in-market behavioral data rather than training them on historical preferences.
But not all consumer data is created equal. We have spent decades curating our datasets to ensure they are controlled and calibrated to accurately reflect what consumers are interested in—and actually buying—today. Leveraging these behavioral datasets in our prompts ensures that synthetic feedback is predictive of current and emerging trends instead of reflective of preferences from 5 or 10 years ago. From a time and resources standpoint, this approach also alleviates the need to constantly retrain models as information is updated and technical capabilities evolve, ensuring our platform is always utilizing the latest and greatest data and technology.
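In practice, prompting with the latest data rather than retraining resembles a retrieval-augmented setup: fresh behavioral signals are fetched at query time and placed in the prompt context. The sketch below shows the general shape; fetch_recent_trends and its trend facts are hypothetical placeholders, not a description of NIQ’s actual pipeline.

```python
# Sketch of prompting with current behavioral data instead of retraining.
# fetch_recent_trends() is a hypothetical stand-in for a retrieval step
# against a curated, regularly refreshed dataset.
def fetch_recent_trends(category: str) -> list[str]:
    # In a real system this would query a live, calibrated data store.
    return [
        "Unit sales of low-sugar snacks up 12% year over year (illustrative).",
        "Single-serve formats are gaining share in convenience channels.",
    ]

def build_prompt(persona: str, concept: str, category: str) -> str:
    trends = "\n".join(f"- {t}" for t in fetch_recent_trends(category))
    return (
        f"{persona}\n\n"
        f"Current market context for {category}:\n{trends}\n\n"
        f"Concept: {concept}\n"
        "React to this concept as the consumer described above."
    )

print(build_prompt("You are a price-sensitive snack shopper.",
                   "No-added-sugar trail mix", "snacks"))
```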
If you’re going to invest in market research to narrow down concept ideas, it’s essential that those insights point you in the right direction—otherwise, you’re no better off than simply picking the “team favorite.” By leveraging our extensive and proprietary scope of recent market data to prompt our synthetic personas, we found a clear divergence in output quality that validates our methodology and offers reassurance about our models’ ability to stand the test of time.
Best-in-class synthetic models should place data in context to navigate next steps
“While synthetic feedback generated by AI can offer valuable insights, it’s critical to place this output in context with real-world data. Without that balance, you risk acting on patterns that may lack relevance or overlook the nuances needed for informed decision making.”
— ChatGPT [carefully prompted by our authors]
You input the prompt, ran the query, generated the output … now what?
For most synthetic models currently on the market, functionality ends once the output is generated. Unfortunately, this often leaves users holding the bag on next steps. Just how good is a “good” concept? Can a “bad” idea be optimized with a few tweaks, or should it be discarded? What is the next phase of idea development, and how can it be elevated all the way through launch?
As experts in curating unique datasets, we are building our own database of synthetic respondents. This repository will not only indicate exactly how well a concept performs but also place that performance in comparative context against innovators’ other ideas. By removing this guesswork, innovators can determine which initiative is a better bet, or pinpoint improvements instead of going back to the drawing board.
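One simple way to picture that comparative context: score a new concept, then locate it within the distribution of previously tested concepts in the same category. The sketch below does a percentile lookup against an invented norms table; any real benchmark database would, of course, be proprietary.

```python
# Sketch of placing a concept score in comparative context via percentiles.
# The norms below are invented; a real benchmark would draw on a large
# database of previously tested concepts in the same category.
norms = {
    "snacks": [2.1, 2.8, 3.0, 3.2, 3.4, 3.5, 3.7, 3.9, 4.1, 4.5],
}

def percentile_rank(score: float, category: str) -> float:
    history = sorted(norms[category])
    below = sum(1 for s in history if s <= score)
    return 100.0 * below / len(history)

new_concept_score = 3.8
pct = percentile_rank(new_concept_score, "snacks")
print(f"Concept scores better than {pct:.0f}% of prior snack concepts.")
```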
Developing and launching a successful innovation is a lengthy process that goes well beyond idea vetting. Whether your consumer feedback is synthetic or human, providing that information in context to navigate next steps is crucial and, in our minds, what differentiates a best-in-class tool from an average one. The ability to evaluate your early-stage idea with The Full View™ of NIQ insights can make the difference between a product that simply launches and one that ultimately thrives.
Navigating the AI revolution together
Synthetic respondents are not a replacement for human consumers in market research; they are a supplement to your ideation process when time is of the essence. As game-changing as their potential might be, these models require unique data sets, prompting, and continuous calibration to consistently generate substantiated and actionable output. Synthetic feedback should be trusted only when the supplier has access to data that validates its accuracy. Thankfully, businesses wishing to explore this emerging technology for market research can avoid getting burned, and the skepticism that follows, by asking key questions of their vendors.
The AI revolution continues to disrupt, with both unprecedented opportunities and significant risks. At NIQ, our commitment to innovation is matched by our dedication to rigorous validation and refinement. We invite you to join us in navigating this transformative era, ensuring that together, we can harness the true power of GenAI for the future of consumer insights.
Want to learn more?
Ready to take your innovations to the next level with Generative AI? Contact us to learn about synthetic respondents beta testing at NIQ BASES.
The Promise of AI in Consumer Insights
At NIQ, we are firm believers in the transformative power of AI on our industry and its impact on our clients’ return on innovation. We’re also excited to lead this transformation. We’ve been at the forefront of AI innovation in the industry, with multiple AI-related scientific patents since 2015, a contribution that is orders of magnitude greater than that of most companies claiming to have the recipe for AI success. This search for AI-fueled innovation has paid great dividends for us at NIQ BASES, as we have been leveraging a combination of AI models to guide our clients to more successful new products for decades. Now, these past learnings are foundational as we develop the next generation of AI-powered tools.
Martin Levanti
Martin Levanti is Vice President of Analytics Commercialization at NIQ BASES, where he plays a pivotal role in shaping AI-powered capabilities that enhance research tools and deliver valuable insights to global clients.
Courtenay Verret
Courtenay Verret is Vice President of Global Thought Leadership at NIQ, where she translates consumer intelligence into actionable information through storytelling.