Building Reliable GenAI Solutions for Media Publishing & Advertising Companies

Written by: Michael Muckel (Founder & CTO)
May 6, 2025

Generative AI (GenAI) has made a lasting impact on companies across every segment and industry. While the hype around this foundational technology has cooled somewhat in recent months, most companies are implementing ambitious GenAI applications to improve existing business processes or unlock additional revenue potential.

Media, Publishing and Advertising (summarized here as Media) are cited as one of the key industries already seeing a major impact of GenAI on their business (World Economic Forum 2025 and Deloitte 2023). GenAI technology is already disrupting existing industry structures and offering potential for new entrants and business models.

GenAI can support every stage of content creation, distribution and consumption. And it can speed up innovation through fast prototyping of new ideas. Most media companies have identified use cases that can benefit significantly from automation or augmentation (i.e. supporting human staff with AI). These span the complete value creation pipeline, including content creation, content personalization, automated variant testing, and channel-specific content.

Prototype Gap

But media companies also increasingly realize that developing and operating reliable applications is more challenging than anticipated, and many AI initiatives are struggling badly to meet timelines and budgets. The early, promising results produced by prototypes are often followed by long, intensive phases of raising quality and reliability to a production-ready level. Inaccurate or erroneous results pose an additional legal and reputational risk in an industry that is already very sensitive to these risks. So quality and reliability are a must for any solution to be adopted.

The good news is that leading media and tech companies have shown that GenAI can be used to build effective, reliable solutions. Focusing on a quality management process that is specific to GenAI application development enables companies to deploy new applications fast and with confidence. The strategy is not to replace existing processes but to extend and adapt them to meet the specific needs of GenAI.

This article outlines the potential and challenges for media companies to implement GenAI solutions using a specific example: Article Summarization for Personalization and Channel-Specific Optimization. This article is based on ZENETICS experience working with media companies to deliver reliable, high-quality AI solutions to challenging business problems.

The article guides you through the necessary information to get started with Quality Management for GenAI applications:

  • Understand potential and challenges of GenAI for Media applications

  • Discover a process to test and monitor application quality and reliability

  • Identify steps to get your GenAI development on track

First, let’s look at the general potential for GenAI in the Media domain.

The Potential of GenAI for Media & Publishing

While GenAI holds potential for almost any industry, media is particularly affected by this technology. Media content (text, images, videos, audio) is especially well suited for GenAI technologies like Large Language Models (LLMs) or multimodal models, which combine multiple content formats such as text, images, audio and video. These capabilities make GenAI highly relevant for the media industry to extend and optimize its business models.

The following list shows applications of GenAI in Media that are already commonly applied, but it is far from exhaustive:

Channel-Specific Optimizations: Specific channels pose different requirements for the generated content in terms of content length, media capabilities (size, resolution, video- and audio capabilities) and presentation. For example, Digital-out-of-Home (DooH) is growing in importance to extend media reach but poses strict requirements for content to be displayed effectively.

Personalized Content: Customers and visitors are becoming much more sensitive to the content they consume. Personalization is a well-known strategy for media companies to increase the relevance of their content and cut through the noise of information that customers consume. GenAI can automatically generate personalized content at scale, for customer segments or even for individuals based on their preferences.

Automated Variant Testing: A/B testing, or the more general concept of multivariate testing, is common in media. So far, a bottleneck has been the generation of content to be tested. GenAI can automatically create a diverse set of content variants, and feedback can be collected automatically to optimize the generation process towards even more effective variants.

For our example of Content Summarization, GenAI can be applied to distribute existing content automatically to a variety of channels, making sure that the specific requirements of these channels are respected (e.g. sentence length and content constraints for DooH displays, or text structure optimized for search results), and that the content meets core requirements like personalization and SEO optimization.
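To make such requirements operational, they can be captured as explicit, machine-checkable constraints per channel. The following is a minimal sketch in Python; the channel names and limits are illustrative assumptions, not values from a specific project.

    from dataclasses import dataclass

    @dataclass
    class ChannelRequirements:
        """Constraints a generated summary must satisfy for one output channel."""
        max_characters: int        # hard limit for the complete summary
        max_sentence_chars: int    # keep individual sentences short enough for the display
        single_paragraph: bool     # e.g. DooH tickers cannot render line breaks

    # Illustrative values only -- real limits come from the channel owners.
    CHANNELS = {
        "dooh": ChannelRequirements(max_characters=280, max_sentence_chars=90, single_paragraph=True),
        "web_teaser": ChannelRequirements(max_characters=600, max_sentence_chars=160, single_paragraph=True),
        "newsletter": ChannelRequirements(max_characters=1200, max_sentence_chars=200, single_paragraph=False),
    }

Making the constraints explicit like this is what later allows them to be checked automatically during testing and monitoring.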

In summary, GenAI can power all of these use cases, when companies find ways to manage the resulting challenges and risks that are at the core of GenAI. Let’s now take a look at what makes applying GenAI for such use cases challenging.

Challenges Building Reliable GenAI Applications

While GenAI makes it easy to achieve initial results fast and with reasonable investment, companies realize that getting their solutions to a level of reliability where they can confidently offer them to users and customers in production is more challenging than anticipated. This complexity stems from the very nature of GenAI.

Probabilistic Core of GenAI: The impressive, human-like performance of GenAI comes from the complexity of the underlying models. GenAI models (like Large Language Models) are highly complex machine learning models that use probabilities to generate results. While this enables impressive results, it also makes it challenging to ensure that applications generate reliable, reproducible output, and it can introduce errors that are unacceptable in a business context, such as hallucinations or grammatical and logical errors.

Incorrect Results: GenAI can produce impressive results. But the reality is that it also struggles with basic tasks that are a must for media companies. Spelling mistakes are a constant risk, especially for names, locations, events, and other types of information that might not be represented in the model’s training data.

Inaccurate Information: In the context of GenAI, hallucinations describe generated information that is not backed by the source content and may simply be wrong. Taking the summary of a sports article as an example, GenAI often changes specific facts (e.g. scores, results) or adds subjective statements like exaggerations that are not supported by the original article.

Violated Channel-Specific Requirements: As discussed above, channels (like websites, video content, DooH displays) have specific requirements to make sure that users can consume the content in the best possible way. Unfortunately, GenAI applications often do not respect the specific instructions well enough to generate content that meets these requirements.

Inconsistent Brand Voice: Finally, media companies spend significant time to understand their audiences and produce content that matches these audiences. In today’s competitive media landscape, this is a must to keep consumers close to their brands. Generating content that matches the tonality and specifics of the audience is possible but still not reliable enough, leading to content that does not fit into the general tonality of the brand.

It is important to remember that today’s GenAI models are general-purpose models, trained on immense datasets from different domains and sources. These datasets are often several months old and therefore do not include the latest news, which often makes them difficult to use in a media context.

For our selected use case of Content Summarization, a number of risks exist that often lead companies to delay or even abandon the application of GenAI.

  • Incorrectly spelled names, locations and events: Especially for local or highly specific news, the models might not have encountered enough examples and therefore output similar-looking but incorrect words.

  • Incorrect results: Generated summaries can contain hallucinations stemming from content similar to the article being processed. Left unchecked, the consequences range from merely humorous output to severe business risks.

  • Biased results: LLMs are likely to extend texts with interpretations that might not be intended. This can range from statements about sports results (e.g. describing a 7:1 scoreline as a close victory) to interpretations of political events.

  • Policies: Media companies have specific policies for addressing their audience. These might include gender-specific or gender-neutral formulations, or general neutrality when presenting sensitive information. LLMs struggle to follow these instructions consistently at scale, resulting in frequent policy violations.

  • Spelling errors: While common spelling mistakes are rare for large models (e.g. OpenAI ChatGPT), smaller or fine-tuned models are prone to producing mistakes at a significant rate.

  • Invalid structure: Teams often provide detailed instructions to the models to generate results in a required structure. But the probabilistic nature of the models often produces results that fail to follow these instructions, generating text that e.g. exceeds the specified length, violates the structural layout, or contains additional formatting (such as quotes) that requires adjustments in post-processing steps.

These lists are only examples of how GenAI applications can stray from the intended results. The key insight is that these things happen, and they will eventually happen to your application as well. This means that addressing the inherent risks is a must when working with this powerful technology.

Defining Your Quality Management Strategy

The good news is that the challenges of GenAI can be overcome with the right approach. Companies can build on their existing expertise in building traditional digital solutions and extend their processes to cover the specific quality risks that GenAI brings with it. This can create a lasting competitive advantage on both the cost and the revenue side, improving bottom-line performance.

The GenAI Application Lifecycle

Quality Management for GenAI needs to cover the complete lifecycle of an application, which consists of two main stages:

Development & Testing (before a release): This includes extensive, automated and manual testing to ensure that the version of the application handles the main use cases and boundary cases well. A success factor for this stage is the collaboration between domain experts and engineering teams to cover not only technical but also highly domain specific test cases.

Monitoring (after a release): Once a new version has been released to production, continuous monitoring and evaluation of the generated content ensures that invalid results are flagged and prevented from distribution. Inappropriate content can result from changes in the source data (new data sources for articles) or from untested cases (new categories of articles). If evaluating every generated result is not feasible due to technical complexity or cost, sampling strategies can approximate the measurement well enough.
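As a rough illustration of such a sampling strategy, the sketch below evaluates a fixed fraction of production outputs and flags failures for review. The sampling rate, the evaluation function and the flagging hook are assumptions for illustration, not a prescribed implementation.

    import random

    SAMPLE_RATE = 0.05  # evaluate roughly 5% of production outputs (illustrative)

    def monitor_summary(article: str, summary: str, evaluate, flag) -> None:
        """Randomly sample generated summaries and flag failed evaluations.

        `evaluate` is assumed to return a dict mapping check names to booleans;
        `flag` forwards failures to whatever review queue or alerting the team uses.
        """
        if random.random() > SAMPLE_RATE:
            return
        results = evaluate(article, summary)
        failed = [name for name, ok in results.items() if not ok]
        if failed:
            flag(article=article, summary=summary, failed_checks=failed)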

The GenAI-Ready Quality Management Process

Now that the basic foundation for testing is available, we can focus on identifying the specific test criteria. The process for implementing the quality management strategy is defined by four basic steps:


1. Define: The first and most important step in the process is to define which aspects of quality are most important and which specific risks need to be addressed. Every use case has different perspectives and requirements, and by defining the criteria that identify both positive and negative cases of generated content, teams build the deep understanding required for the following steps. In almost every case, this step includes experts from domains like editorial, product, sales, and operations.

2. Map: Once the criteria are documented, they can be mapped to specific tactics in the quality management strategy. This can include automated reference testing (against a reference dataset, also called golden records) or evaluations that automatically check specific aspects of the content (a minimal sketch of such a reference test follows after this list). Modern quality management solutions like ZENETICS support you in defining the technical evaluation strategy.

3. Measure: Testing for quality and reliability needs to be a repeatable process that is triggered whenever changes to the application (prompt, code) or the data context (source data) are applied. Integrating the tests into the continuous testing process (continuous integration) is often a valuable investment to ensure that teams always understand the level of quality and reliability throughout the application lifecycle.

4. Improve: Once issues are identified by the test process, developers and domain experts can run root cause analysis and identify solutions to fix the issues. Once changes are applied, teams can add specific test cases to ensure that following changes do not introduce the same or similar issues again.
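To illustrate the Map and Measure steps, the sketch below shows how a golden-record reference test could be wired into a CI run with pytest. The file path, the summarize function and the similarity threshold are assumptions for illustration; a dedicated evaluation solution such as ZENETICS would typically provide more robust evaluators.

    import json
    from difflib import SequenceMatcher

    from app.summarizer import summarize  # hypothetical application entry point under test

    def load_golden_records(path: str = "tests/golden_records.jsonl"):
        """Each line holds a source article and an approved reference summary."""
        with open(path, encoding="utf-8") as f:
            return [json.loads(line) for line in f]

    def similarity(a: str, b: str) -> float:
        return SequenceMatcher(None, a, b).ratio()

    def test_summaries_stay_close_to_golden_records():
        for record in load_golden_records():
            generated = summarize(record["article"])
            # Threshold is illustrative; tune it against known-good examples.
            assert similarity(generated, record["reference_summary"]) > 0.6

Running such tests on every change to prompts, code or source data keeps the measured quality level visible throughout the application lifecycle.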

In addition, persisting test processes and test results can help you tremendously when your applications are due for an audit or other forms of inspection. In an increasingly regulated space, this can become a major time saver for your tech and audit teams.

The GenAI-Ready Quality Management Process in Action

Let’s look at how the GenAI Quality Management process can be applied to the use case of article summarization. The table below shows six criteria that are relevant for determining the quality of summarized versions of a news article and how these criteria are mapped to specific, technical evaluators.

Criteria | Description | Evaluation Strategy
Content Length | Minimum and maximum word and character counts for both the complete text and individual sentences to ensure that the text is displayed correctly on a wide range of DooH displays. | Content Length Evaluator
Content Structure | Ensure that the summary is presented as a single paragraph without any unnecessary formatting characters (e.g. quotes, newlines). | Reference Evaluator
Factual Accuracy | Identify statements in the summarized content that do not have a corresponding statement in the source article. | Hallucination Evaluator
Spelling | Apply standard spellchecking to detect misspelled words. | Spelling Evaluator
Entity Spelling | Compare entities (persons, locations, events) from the generated results to the original article to ensure that no misspellings are present. | Entity Matching Evaluator
Editorial Guidelines | Analyze the style of the article to meet the brand guidelines (e.g. gender neutrality). | G-Eval Evaluator (custom criteria)
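As an example of what such an evaluator can look like in practice, the snippet below sketches a minimal content-length check along the lines of the first table row. The word and character limits are illustrative assumptions; a production evaluator (for instance in ZENETICS) would cover more cases and channels.

    import re

    def evaluate_content_length(summary: str,
                                max_chars: int = 280,
                                max_sentence_chars: int = 90,
                                max_words: int = 50) -> dict:
        """Check a generated summary against illustrative DooH length limits."""
        sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", summary) if s.strip()]
        return {
            "total_chars_ok": len(summary) <= max_chars,
            "word_count_ok": len(summary.split()) <= max_words,
            "sentence_length_ok": all(len(s) <= max_sentence_chars for s in sentences),
        }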

The ecosystem for developing, testing and operating a portfolio of GenAI applications is still small but is growing. Solutions like ZENETICS help you to get started with predefined processes and evaluations. This enables you to focus on your business domain and its challenges, rather than on the technical details.

Conclusion

This article can offer only a glimpse of what the future looks like for Media, Publishing and Advertising with the rise of GenAI. Fully automating many of the use cases is still challenging given the complexity and limited reliability of GenAI in general. Still, media companies already leverage the benefits of GenAI by augmenting their organizations and core steps in the value creation process with AI capabilities. Humans will continue to play a critical role in Media, and a strategy that makes organizations more effective will free up capacity for innovation and truly outstanding work.

The GenAI-ready process of Define-Map-Measure-Improve outlined in this article is already widely used across the industry and has proven to be effective. Developing and introducing GenAI in the Media value chain does not require a completely new approach, but rather adjustments that make existing processes fit for GenAI.

Interested to Learn More About LLM-Testing?

ZENETICS is one of the leading solutions for testing complex AI applications. Schedule a meeting to learn more about how to set up an effective LLM testing strategy and how ZENETICS can help you with that.