Preventing Sensitive Data Exposure in LLMs

Building a Secure SQL Agent Solution with PII Anonymization

Yi Ai
Feb 23, 2025

Many organizations want to leverage LLMs to help users query their databases more effectively. However, there’s a critical challenge: enterprise databases often contain sensitive information, such as personally identifiable information (PII), that requires strict data governance. Passing database results directly to LLM APIs (such as Azure OpenAI or Gemini) risks exposing sensitive data, and it typically triggers extensive security reviews and whitelisting processes that delay deployment.

So I developed a structured data anonymization solution that creates a secure pathway between LLMs and databases. The solution employs a two-stage approach. In the first stage, SQL is generated in a sandbox environment that mirrors the production database’s structure (either by replicating the full DDL or by including only the required columns) and is populated with synthetic data. This setup lets the model learn the table structure without any risk to real data. In the second stage, the generated SQL queries are executed against the production database, and any sensitive data in the results is detected and anonymized before being sent to the LLM. The LLM then processes this anonymized data to generate an answer, which is de-anonymized back to the original values before being presented to the user.
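To make the first stage concrete, here is a minimal sketch assuming a SQLite sandbox and a hypothetical generate_sql wrapper around the text-to-SQL model. The table name, columns, and synthetic rows are illustrative, not the real schema:

```python
import sqlite3

# Sandbox DDL that mirrors the production table's structure (illustrative schema).
SANDBOX_DDL = """
CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    full_name TEXT,
    email TEXT,
    signup_date TEXT
);
"""

# Synthetic rows: realistic in shape, but no real PII ever enters the sandbox.
SYNTHETIC_ROWS = [
    (1, "Alex Smith", "alex.smith@example.com", "2024-01-15"),
    (2, "Sam Lee", "sam.lee@example.com", "2024-02-03"),
]

def generate_sql(prompt: str) -> str:
    # Stand-in for the real text-to-SQL LLM call; returns a canned query here.
    return "SELECT COUNT(*) FROM customers WHERE signup_date LIKE '2024-01%'"

sandbox = sqlite3.connect(":memory:")
sandbox.executescript(SANDBOX_DDL)
sandbox.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", SYNTHETIC_ROWS)

# The model only ever sees the DDL and synthetic data, never production rows.
prompt = f"Schema:\n{SANDBOX_DDL}\nQuestion: How many customers signed up in January 2024?"
sql = generate_sql(prompt)

# Dry-run against the sandbox to catch invalid SQL before it reaches production.
print(sandbox.execute(sql).fetchall())  # [(1,)]
```

Dry-running the generated query against the sandbox has a second benefit: malformed or hallucinated SQL fails here, before it ever touches the production database.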

Workflow Diagram
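For the second stage, the sketch below shows the anonymize → LLM → de-anonymize round trip. It uses Microsoft Presidio for PII detection as one possible choice of detection layer, and ask_llm is a hypothetical stand-in for the real LLM call:

```python
from presidio_analyzer import AnalyzerEngine

analyzer = AnalyzerEngine()  # uses a spaCy NLP model under the hood

def anonymize(text: str) -> tuple[str, dict[str, str]]:
    """Replace detected PII spans with placeholders and keep a reverse map."""
    results = analyzer.analyze(text=text, language="en")
    mapping: dict[str, str] = {}
    # Replace from the end of the string so earlier offsets stay valid;
    # overlapping detections are ignored for brevity.
    for i, res in enumerate(sorted(results, key=lambda r: r.start, reverse=True)):
        placeholder = f"<{res.entity_type}_{i}>"
        mapping[placeholder] = text[res.start:res.end]
        text = text[:res.start] + placeholder + text[res.end:]
    return text, mapping

def deanonymize(text: str, mapping: dict[str, str]) -> str:
    """Restore original values in the LLM's answer before showing the user."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text

def ask_llm(prompt: str) -> str:
    # Stand-in for the real LLM call (e.g., Azure OpenAI); echoes for the demo.
    return f"The record says: {prompt}"

# Round trip: the LLM reasons over placeholders only, never raw PII.
row_text = "Order 1042 belongs to Jane Doe, jane.doe@example.com"
safe_text, mapping = anonymize(row_text)
answer = ask_llm(safe_text)
print(deanonymize(answer, mapping))
```

Keeping a placeholder-to-value map per request is what makes the anonymization reversible; a production version would also need to handle overlapping detections and guard against placeholder collisions in the model’s output.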
