How AI Provenance Standards Meet Global Regulations
The development of generative AI has presented a challenge to authenticity, making it increasingly difficult to distinguish human-created content from synthetic output. In response, global policymakers, particularly in China and California, are quickly enacting legislation aimed at increasing transparency and traceability in digital media. In this article, we address the latest developments in China’s content-labeling policy and California’s newest legislation regulating AI companies.
For businesses operating internationally, navigating this patchwork of requirements is critical. The key to successful compliance is investing in technical frameworks that prioritize verifiable data provenance, allowing companies to meet today’s mandates while preparing for tomorrow’s demands.
China and California AI Regulations and Scope
Regulatory environments are demanding that companies take responsibility for the content their AI systems create. Both China and California have enacted specific requirements focused on labeling AI-generated content, though their scope and structure differ significantly.
China’s AI Labeling Regulations
China’s regulatory framework relies on two major components: the Labeling Measures for Content Generated by Artificial Intelligence (the “Measures”) and the mandatory national standard, Cybersecurity Technology—Labeling Methods for Content Generated by Artificial Intelligence (the “Methods”). These rules build upon earlier regulations and aim to enhance the traceability and transparency of AI-generated content, mitigating the risks associated with deepfakes and ensuring the authenticity of public information. These Measures are set to take effect on September 1, 2025.
Scope
The Chinese regulations primarily apply to internet information service providers (Service Providers), including those offering AI content generation and online content dissemination services. This also includes service providers located outside China, if they target the public in Mainland China.
China mandates two types of labels for AI-generated synthetic content:
Explicit Labels (Perceptible): These labels are perceptible to users, displayed as text, sound, images, or other forms. Explicit labels are required for services that may cause public confusion or misunderstanding, such as intelligent dialogue, synthetic voice, facial generation, or specific text-to-image/video generations. For images, the text height should be at least 5% of the shortest side length, and explicit labels must be maintained when content is downloaded or copied.
Implicit Labels (Invisible/Metadata): These are marks embedded in the metadata of AI-generated content, remaining invisible to users but extractable through technical means. Implicit labels must be embedded in the metadata of all AI-generated synthetic content, irrespective of whether it causes public confusion. Implicit watermarks should include the service provider’s name and be detectable via an interface or other tools.
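To make these requirements concrete, below is a minimal sketch, using Pillow, that applies both label types to a PNG: a visible caption sized to at least 5% of the shortest side, and a machine-readable metadata record carrying the provider’s identity. The aigc_label key and its fields are illustrative assumptions; the Methods define their own metadata schema.

```python
# A minimal sketch of applying both label types to one image.
# The "aigc_label" key and its fields are illustrative assumptions;
# the national standard ("Methods") defines its own metadata schema.
import json
from PIL import Image, ImageDraw, ImageFont
from PIL.PngImagePlugin import PngInfo

def label_image(src: str, dst: str, provider: str, content_id: str) -> None:
    img = Image.open(src).convert("RGBA")

    # Explicit label: visible text, at least 5% of the shortest side in height.
    text_size = max(12, int(min(img.size) * 0.05))
    font = ImageFont.truetype("DejaVuSans.ttf", text_size)  # any available TTF works
    draw = ImageDraw.Draw(img)
    draw.text((10, img.height - text_size - 10), "AI-generated",
              font=font, fill=(255, 255, 255, 230))

    # Implicit label: machine-readable metadata, invisible to viewers but
    # extractable through technical means, including the provider's identity.
    meta = PngInfo()
    meta.add_text("aigc_label", json.dumps({
        "AIGC": True,
        "ServiceProvider": provider,
        "ContentID": content_id,
    }))
    img.save(dst, "PNG", pnginfo=meta)
```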
The Measures also place obligations on App distribution platforms (to verify that labeling functions are enabled before app listing) and users (to proactively declare AI-generated content when publishing).
California’s AI Regulations
California has enacted several laws addressing AI transparency and safety.
California SB 942 (AI Watermarking)
This law, effective starting January 2026, focuses on helping the public identify AI-generated content.
Applies to any person who creates, codes, or produces a generative AI system that is publicly accessible within California and has over 1,000,000 monthly visitors or users.
Covered providers must adhere to several disclosure obligations:
They must offer users the option to include a manifest disclosure (a visible label) in content created or altered by the system.
They must include a latent disclosure (an invisible label) in the content created by the generative AI system.
They must make available an AI detection tool that supports an API, allowing users to check the latent and/or manifest disclosures without visiting the provider's site.
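As a rough illustration, a covered provider’s detection service might look like the following sketch; the endpoint path, response fields, and detector stub are assumptions, not requirements spelled out in the statute.

```python
# A hypothetical sketch of the kind of public detection API SB 942 envisions.
# The endpoint path, response fields, and detector stub are assumptions.
from fastapi import FastAPI, UploadFile

app = FastAPI()

def find_latent_disclosure(data: bytes) -> dict | None:
    """Stub: a real provider would run its watermark/metadata detector here."""
    return None  # placeholder

@app.post("/v1/detect")
async def detect(file: UploadFile) -> dict:
    data = await file.read()
    disclosure = find_latent_disclosure(data)
    return {
        "generated_by_our_system": disclosure is not None,
        "latent_disclosure": disclosure,  # e.g., provider name, creation time
    }
```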
California SB 53 (Transparency in Frontier AI Act)
Signed in September 2025, SB 53 places new regulations on the industry's top players.
Focuses explicitly on the safety of cutting-edge and powerful AI models.
Requires leading AI companies to publish public documents detailing how they follow best practices for creating safe AI systems.
It also creates a mechanism for companies to report severe AI-related incidents to the state’s Office of Emergency Services.
It is actually in model developers’ own interest to be able to identify generated images, so that synthetic content does not poison future training runs. However, developers have little incentive to make this capability shareable or to expose it to end users, and that is where regulation and policy can ideally help.
Technical Implementation Techniques Supporting Both Regulations
Compliance with both Chinese and Californian laws necessitates mechanisms for content authentication: visible disclosures and invisible, machine-readable provenance data.
We cover two types of approaches: content authenticity techniques that embed verifiable provenance information directly into the asset, and watermarking and metadata standards.
There are many other techniques that are still being researched and improved, and there are limitations with the current approaches as well.
Cryptographic Provenance using Content Credentials (C2PA)
The Coalition for Content Provenance and Authenticity (C2PA) provides an open standard designed to address the challenges of trusting media in an era of easily manipulated digital content.
Content Credentials, also known as a C2PA Manifest, are cryptographically bound structures that record an asset's history (provenance). This system uses cryptographic hashes and digital signatures to ensure the integrity and authenticity of the provenance data, making it tamper-evident—any alteration to the asset or the provenance data would invalidate the hash.
C2PA is an essential technical implementation for satisfying latent/implicit disclosure requirements:
A Content Credential records statements (assertions) about the asset, including its origin, modifications, and most importantly, the use of AI (how it was authored).
When an action is performed by an AI/ML system, it is clearly identified in the Content Credential through its digitalSourceType field.
Implementation involves generating the C2PA Manifest, cryptographically signing it, and embedding it directly within the asset (hard binding) or linking it externally via invisible watermarks or fingerprint lookup (soft binding). This soft binding enhances durability, allowing the credential to be discovered even if removed from the asset.
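For illustration, the manifest definition below (expressed as a Python dict, in the JSON-like form accepted by C2PA tooling) records an AI-generation action; the claim_generator name is a hypothetical placeholder.

```python
# A sketch of the manifest data a provider might assemble before signing it
# with a C2PA SDK. The "c2pa.actions" assertion and the digitalSourceType
# value follow the C2PA specification and the IPTC vocabulary; the
# claim_generator name is a hypothetical placeholder.
manifest = {
    "claim_generator": "ExampleCorp-ImageGen/1.0",
    "assertions": [
        {
            "label": "c2pa.actions",
            "data": {
                "actions": [
                    {
                        "action": "c2pa.created",
                        "digitalSourceType": (
                            "http://cv.iptc.org/newscodes/digitalsourcetype/"
                            "trainedAlgorithmicMedia"
                        ),
                    }
                ]
            },
        }
    ],
}
# The signed manifest is then embedded in the asset (hard binding) or linked
# to it via an invisible watermark or fingerprint (soft binding).
```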
Google and other C2PA member companies have been working on various levels of implementation of this. Google also established SynthID and, since 2024, has been incorporating and testing C2PA in Google workflows. It should be noted, however, that this approach, along with the watermarking/metadata techniques below, cannot account for content that is screenshotted and then propagated.
Watermarking and Metadata Standards
Digital watermarking and structured metadata are integral to meeting both the implicit and explicit labeling requirements:
Implicit Watermarking/Latent Disclosures: Both China and California require invisible tagging. AI watermarking embeds a unique signal or identifier into the AI output that is invisible to humans but algorithmically detectable, tracing the content back to the AI model (see the toy sketch after this list). China specifies that these implicit watermarks for images, videos, and audio must include the service provider’s identity. SynthID, mentioned earlier, is an example of how this can work in cooperation with third-party regulatory and security groups.
IPTC Digital Source Type: C2PA works in conjunction with standard metadata formats like IPTC (International Press Telecommunications Council). IPTC provides controlled values that explicitly identify content created using generative AI (trainedAlgorithmicMedia) or edited with generative AI (compositeWithTrainedAlgorithmicMedia). Embedding this machine-readable data supports the requirements for invisible, machine-detectable labels demanded by both jurisdictions.
Advanced Cryptography (ZK-SNARKs): A further technical approach involves using Zero-Knowledge proofs, specifically ZK-SNARKs. These cryptographic proofs can be imperceptibly embedded into content to attest to the origin of an image without revealing proprietary information like the model’s code or parameters. This technology establishes provenance, but it is not yet fully developed.
However, again, these techniques do not survive the screenshot-and-share scenario (unless explicit watermarks are very large and cannot be erased), and they do not cover images and videos generated with open-source models.
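To make the implicit-watermarking idea tangible, here is a deliberately naive sketch that hides the provider’s identity in an image’s least significant bits. It is trivially removable and not robust; production systems such as SynthID rely on learned watermarks designed to survive edits.

```python
# A toy illustration of invisible watermarking: hide the provider's identity
# in the least significant bit of an image's red channel. This is trivially
# removable and NOT robust; it only illustrates the embed/detect idea.
import numpy as np
from PIL import Image

def embed(src: str, dst: str, message: str) -> None:
    px = np.array(Image.open(src).convert("RGB"))
    bits = np.unpackbits(np.frombuffer(message.encode(), dtype=np.uint8))
    flat = px[..., 0].flatten()
    if bits.size > flat.size:
        raise ValueError("image too small for message")
    flat[: bits.size] = (flat[: bits.size] & 0xFE) | bits  # overwrite LSBs
    px[..., 0] = flat.reshape(px[..., 0].shape)
    Image.fromarray(px).save(dst, "PNG")

def extract(path: str, length: int) -> str:
    px = np.array(Image.open(path).convert("RGB"))
    bits = px[..., 0].flatten()[: length * 8] & 1
    return np.packbits(bits).tobytes().decode()
```

Calling embed(src, dst, provider_name) hides the name and extract(dst, len(provider_name)) recovers it, but any re-encoding or screenshot destroys it, which is why robust watermarking remains an active research area.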
Supporting Explicit Labels
Companies can support the visible (explicit/manifest) labeling requirements by implementing clear, accessible labels near the image, such as short, neutral captions (e.g., "Image: AI-generated") or small on-image badges, ensuring they maintain required contrast and readability.
Unfortunately, such visible labels are often easy to crop, cover, or otherwise bypass.
Getting Ahead of Future Regulatory Requirements
The current regulatory environment for AI labeling is complex and rapidly evolving, and global standards are not yet aligned. Companies should focus their time and resources on building a unified data provenance framework and implementing responsible AI best practices, which could also include ISO/IEC 42001 certification.
Below are some key areas to consider:
Adopting and Standardizing Provenance Technologies
While regulations currently specify what must be labeled (e.g., AI-generated content), they often lack specific technical standards on how compliance is achieved.
Prioritize Open, Interoperable Standards: The C2PA specification is global and designed to be interoperable. Companies should prioritize embedding C2PA Content Credentials (Track B implementation) to provide verifiable, tamper-evident metadata. This provides a durable solution, as it combines hard binding (cryptographic hashing) with soft binding (watermarking/fingerprinting).
Standardize Data Documentation: Provenance solutions currently function in isolation. Future requirements will demand a unified data provenance framework that merges content authenticity techniques (like C2PA) with data documentation standards (like Datasheets for Datasets) and data provenance libraries. This structured, extensible framework will adapt to diverse jurisdictional requirements and allow automated tools to navigate the data.
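As a sketch of what such a unified record might look like, the hypothetical schema below merges content-credential pointers with dataset-style documentation; the field names are assumptions, not an existing standard.

```python
# A hypothetical unified provenance record merging content credentials with
# dataset-style documentation. The schema and field names are assumptions.
from dataclasses import dataclass, field

@dataclass
class ProvenanceRecord:
    asset_id: str                    # stable identifier for the asset
    digital_source_type: str         # IPTC term, e.g. "trainedAlgorithmicMedia"
    c2pa_manifest_ref: str | None    # pointer to the signed Content Credential
    model_name: str                  # generating model, if AI-produced
    training_data_sheet: str | None  # link to datasheet-style documentation
    jurisdictional_labels: dict[str, str] = field(default_factory=dict)
    # e.g. {"CN": "explicit+implicit", "CA": "latent+manifest"}
```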
Strengthening Robustness and Detection
Current watermarking techniques face limitations in robustness, as they can be manipulated, removed, or altered (e.g., through backdoor attacks). Future requirements will likely stress the need for systems that resist tampering.
Invest in Tamper-Proofing: Companies should allocate resources to interdisciplinary research to develop more robust watermarking and AI-content detection techniques. Staying on top of the latest research here will help companies remain compliant.
Adopt Post-Quantum Cryptography: To future-proof the integrity of their digital signatures and credentials, companies should prepare for the adoption of post-quantum cryptography, such as the ML-DSA algorithms planned for the C2PA standard.
Build MLOps Best Practices: Even before generative AI, MLOps (and now LLMOps) practices were promoted as a way to implement responsible AI. This remains true today.
Addressing Data Consent and Privacy
The use of massive, widely sourced, and often undocumented training data has led to crises in data privacy and copyright infringement. Future global regulations will likely focus heavily on creator rights.
Implement Consent Infrastructure: Companies should focus on developing and utilizing infrastructure that allows content creators to explicitly register how their work should be used (opt-in/opt-out tools). Provenance tracking aids creators by letting them know how their work is used, giving them opportunities to provide consent, seek credit, and potentially receive fair compensation.
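A minimal sketch of such consent infrastructure, with an entirely illustrative registry design, might look like this:

```python
# A minimal sketch of consent infrastructure: a registry creators write to
# and that data pipelines consult before ingesting a work. The registry
# design and field names are illustrative assumptions.
from enum import Enum

class Consent(Enum):
    ALLOW_TRAINING = "allow_training"
    ALLOW_WITH_CREDIT = "allow_with_credit"
    DENY = "deny"

registry: dict[str, Consent] = {}   # keyed by content hash or creator URI

def register(work_id: str, choice: Consent) -> None:
    registry[work_id] = choice      # the creator's explicit opt-in/opt-out

def may_ingest(work_id: str) -> bool:
    # Default-deny is the conservative choice when no declaration exists.
    return registry.get(work_id, Consent.DENY) is not Consent.DENY
```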
By proactively adopting standards like C2PA and building technical infrastructure that supports verifiable provenance, companies can ensure they are ready not only for the impending deadlines in China and California but also for the global regulatory harmonizations and advanced cryptographic requirements that will define the future of trustworthy AI.