Securing Generative AI: The Role of a Data Bill of Materials
Pranava Adduri : Jul 9, 2024
At Bedrock we use generative AI models to classify enterprise data. While such models are groundbreaking in generating nuanced, context-sensitive insights, they also present a unique security challenge: how do we secure and govern the output these models generate?
The Concerns of CISOs
In discussions with dozens of CISOs this past quarter, a primary concern emerged for those operationalizing GenAI co-pilots like Glean, Google Gemini, and Microsoft Copilot: the lack of assurance that these co-pilots won’t divulge sensitive information to unauthorized users. A Gartner peer poll supports this, showing 67% of respondents worry about unauthorized access to AI models and data. For instance, if a co-pilot is enabled on a Microsoft 365 tenant, could an employee access a peer’s salary details in response to the prompt “Tell me details about the last promotion cycle”? The concern stems from a lack of visibility into what data a model has access to; without that visibility, sensitive data can unintentionally surface in a model’s responses.
Sensitive data and entitlement sprawl are the biggest technical impediments to a successful co-pilot rollout. Co-pilots like Glean are permission-aware and will not return content to a user who doesn’t have access to it. Unfortunately, this relies on user permissions being correct across the organization, and most organizations grant their users overly wide entitlements. Deploying a co-pilot in that situation is akin to opening Pandora’s box: if an employee inadvertently had access to HR performance reviews, a co-pilot could reveal that information in response to a prompt about promotions.
The Data Bill of Materials (DBOM)
To address CISOs’ concerns about data security for GenAI, it is critical to introduce the concept of a Data Bill of Materials (DBOM).
A DBOM is a comprehensive inventory of:
1. All data going into a model (training, fine-tuning) and
2. All data a model has access to during model inference (RAGs, co-pilots).
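To make this concrete, here is a minimal sketch of what a DBOM might look like as a data structure. The schema, field names, and classification labels below are illustrative assumptions for this post, not a standard or a description of any product’s implementation:

```python
# A minimal, illustrative DBOM sketch. Field names and classification
# labels are assumptions for this example, not a standard schema.
from dataclasses import dataclass, field


@dataclass
class DbomEntry:
    """One data asset a model ingests or can read at inference time."""
    asset_id: str                 # e.g., a document or table identifier
    source: str                   # system of record (SharePoint, S3, ...)
    stage: str                    # "training", "fine-tuning", or "inference"
    classifications: list[str] = field(default_factory=list)  # e.g., ["PII", "PCI"]
    readable_by: list[str] = field(default_factory=list)      # groups/LOBs with access


@dataclass
class Dbom:
    """The full inventory for one model or co-pilot deployment."""
    model_name: str
    entries: list[DbomEntry] = field(default_factory=list)

    def classifications_exposed_to(self, group: str) -> set[str]:
        """Every classification reachable by a group through this model."""
        return {
            c
            for e in self.entries
            if group in e.readable_by
            for c in e.classifications
        }
```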
This post focuses on DBOMs for co-pilots, with subsequent posts covering DBOMs for RAGs, fine-tuning, and foundational model training. A DBOM can provide guarantees on what your co-pilot will and will not reveal and identify which groups or lines of business can safely use a co-pilot based on the accessible data.
Why a DBOM?
A DBOM provides a clear record of all data, and its classifications, that a co-pilot has permission to read and may use in generating a response to a user’s prompt. This transparency gives you a firm grasp of what information your model has access to and, therefore, what it could potentially reveal. It is akin to the SBOM (software bill of materials) that most enterprises use to better understand the interactions, dependencies, and vulnerabilities of their tech stack.
By implementing a DBOM, you can ensure your co-pilot deployment is not only effective but also secure and compliant with all relevant regulations (e.g., ensuring a co-pilot will not divulge PCI data). This proactive approach to data governance gives you the confidence to operationalize co-pilots for your lines of business without the worry of potential data leaks.
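Building on the sketch above, a guarantee like “the co-pilot will not divulge PCI data” could be expressed as a simple gate over the inventory; the helper name and the "PCI" label are hypothetical:

```python
# Hypothetical compliance gate on the Dbom sketch above: block the
# rollout if any enabled group can reach PCI-classified data.
def pci_safe(dbom: Dbom, enabled_groups: list[str]) -> bool:
    return all(
        "PCI" not in dbom.classifications_exposed_to(group)
        for group in enabled_groups
    )
```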
How DBOMs Help Security
A DBOM makes security teams aware of the sensitive data in an environment where a co-pilot is deployed and of which data may be revealed to users. It can answer questions like, “Which Lines of Business (LOBs) can I enable a co-pilot for without causing sensitive data leakage?” With a DBOM in place, security teams can identify which LOBs present the least risk for enabling a co-pilot and which LOBs need to prune their existing entitlements before onboarding one. Automating DBOM generation lets security teams quickly and confidently onboard an LOB onto a co-pilot, reducing the need for manual audits and increasing the accuracy of data security governance for GenAI.
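As a rough illustration, the same inventory could rank LOBs by how many sensitive classifications each can reach through the co-pilot, so the lowest-risk LOBs surface first (a sketch on the assumed schema above):

```python
# Illustrative risk ranking: count the classifications each LOB can
# reach through the co-pilot; fewer reachable classifications = safer.
def rank_lobs_by_risk(dbom: Dbom, lobs: list[str]) -> list[tuple[str, int]]:
    scores = [(lob, len(dbom.classifications_exposed_to(lob))) for lob in lobs]
    return sorted(scores, key=lambda pair: pair[1])  # safest LOBs first
```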
If you are deploying a co-pilot like Glean, for example, a DBOM helps answer what types of data Glean would be able to reason over and who would be able to see that data via a prompt. This can be used to plan a strategic rollout that starts with the safest LOBs, and to create a burndown of sensitive data and entitlements to decommission before a wider rollout.
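Continuing the sketch, a burndown for a given LOB might simply enumerate the classified assets whose entitlements need pruning before that LOB is onboarded (hypothetical helper on the assumed schema):

```python
# Illustrative burndown report: the classified assets a given LOB can
# currently read, i.e., entitlements to prune before onboarding it.
def burndown_for(dbom: Dbom, lob: str) -> list[DbomEntry]:
    return [
        e for e in dbom.entries
        if lob in e.readable_by and e.classifications
    ]
```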
As we continue to innovate with GenAI, it is crucial to evolve our approach to data security. A DBOM is an essential tool in this endeavor, providing the transparency and control needed to secure and govern generated content effectively.
To learn more about how Bedrock can help improve your GenAI data security, read about our GenAI security solutions, speak with our data security experts, or meet with us at Black Hat USA 2024.