Data Engineering · Cross-Industry · 10 min read · 2024-07-12

Column-Level Security in Analytical Data Platforms

Column-level security is the enforcement mechanism that ensures analysts can run queries against a regulated dataset without accessing fields they are not authorized to see. Snowflake Dynamic Data Masking, BigQuery column-level security with policy tags, and Databricks Unity Catalog column masking each implement this capability with different semantics. The compliance requirement is the same across all three: a user who is not authorized to see a date of birth or account number must not be able to infer that value through aggregation or correlation queries. Implementing column-level security correctly requires understanding how each platform handles aggregation bypass attempts, the interaction between column masking and row access policies, and the audit log evidence that demonstrates the control is functioning.

Column-level security also makes broad analytical access to regulated datasets tractable. Without it, the choice is binary: grant full table access, which exposes all sensitive fields to all analysts, or create separate views for each access level, which produces an unmaintainable proliferation of view objects. Column-level security allows a single table to be queried by analysts at different permission levels, with sensitive fields automatically masked, nulled, or replaced with synthetic values for analysts who lack the necessary permissions. The same query returns full PHI to authorised clinical analysts and masked values to operational analysts, with no separate table, no separate view, and no application-layer filtering.

Snowflake Dynamic Data Masking

Snowflake's Dynamic Data Masking applies masking policies to specific columns, with policy logic that evaluates the querying user's role at query execution time. A masking policy on the date_of_birth column might return the full date for users with the CLINICAL_ANALYST role, return the birth year only for users with the OPERATIONAL_ANALYST role, and return NULL for all other roles. The policy is defined once, applied to the column, and evaluated automatically on every query. There is no application-layer code that implements the masking, and no possibility of the application forgetting to apply it.
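A minimal sketch of such a policy, assuming hypothetical role names, table, and column:

```sql
-- Hypothetical illustration: role names, table, and column are assumptions.
CREATE MASKING POLICY dob_mask AS (val DATE) RETURNS DATE ->
  CASE
    WHEN CURRENT_ROLE() = 'CLINICAL_ANALYST'    THEN val                               -- full date
    WHEN CURRENT_ROLE() = 'OPERATIONAL_ANALYST' THEN DATE_FROM_PARTS(YEAR(val), 1, 1)  -- year only
    ELSE NULL                                                                          -- all other roles
  END;

-- Attach the policy to the column; it is evaluated on every subsequent query.
ALTER TABLE patients MODIFY COLUMN date_of_birth SET MASKING POLICY dob_mask;
```

Because the CASE logic lives in the policy object rather than in views or application code, changing who sees what is a single ALTER on the policy, not a migration of every consumer.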

Tag-based masking in Snowflake extends this model to data catalog integration. A classification tag applied to a column triggers a masking policy automatically, without requiring explicit policy assignment column by column. When a new table is created with a date_of_birth column tagged PHI in the Snowflake data catalog, the PHI masking policy attaches automatically. This tag propagation model is critical for organisations where new tables are created frequently by many teams. Manual column-by-column masking policy assignment at scale is error-prone, and tag-based automation closes the gap between data classification and access control enforcement.
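A hedged sketch of tag-based attachment, assuming a masking policy named dob_mask already exists and using illustrative tag names:

```sql
-- Hypothetical illustration: tag name, allowed value, and policy name are assumptions.
CREATE TAG phi_class ALLOWED_VALUES 'PHI';

-- Any column carrying this tag inherits the masking policy automatically.
ALTER TAG phi_class SET MASKING POLICY dob_mask;

-- Classifying the column is now sufficient to enforce masking on it.
ALTER TABLE patients MODIFY COLUMN date_of_birth SET TAG phi_class = 'PHI';
```

With this wiring, the classification step and the enforcement step collapse into one operation, which is what closes the gap described above.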

BigQuery Column-Level Security with Policy Tags

Google BigQuery implements column-level security through a policy tag taxonomy in the Data Catalog. A policy tag hierarchy is created in Data Catalog, and columns in BigQuery tables are tagged with the appropriate node in the hierarchy. IAM policies are then applied to policy tags rather than to individual columns: a user with the Fine-Grained Reader IAM role on the Direct Identifier policy tag can read the tagged columns. A user without that role is denied access to those columns at query time, and queries that select them fail. If masked results are preferred to hard denial, BigQuery's dynamic data masking layers on top of the same policy tags: users holding the Masked Reader role receive masked values, such as NULL or a hash, according to the masking rule in the data policy.
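Tagging a column can be done in DDL; a hedged sketch, in which the taxonomy resource path is a placeholder, not a real identifier:

```sql
-- Hypothetical illustration: the policy tag resource path below is a placeholder.
ALTER TABLE dataset.patients
  ALTER COLUMN date_of_birth
  SET OPTIONS (
    policy_tags = ['projects/my-project/locations/us/taxonomies/TAXONOMY_ID/policyTags/POLICY_TAG_ID']
  );
```

Access is then governed entirely by who holds Fine-Grained Reader on that policy tag, not by any grant on the table itself.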

The policy tag approach has a significant compliance advantage: the access control decision is made in the Data Catalog, where data governance teams have ownership, rather than in the database, where platform engineers have ownership. The data governance team can grant and revoke access to sensitive column categories without requiring platform engineer involvement, and the policy tag assignments are audited in Cloud Audit Logs independently of the BigQuery query logs. For GDPR data protection by design requirements and HIPAA minimum necessary access, this separation of governance from access control implementation aligns the architecture with the intent of those compliance frameworks.

Databricks Unity Catalog Column Masking

Databricks Unity Catalog provides column masking through masking functions: SQL functions that are associated with specific columns in a table and applied automatically when those columns are queried. Unlike Snowflake's masking policies, Unity Catalog masking functions can embed complex logic including lookup of fine-grained entitlements from external systems. A masking function for patient clinical data might call an entitlement service to check whether the querying user has an active care relationship with the specific patient before returning unmasked values, providing per-patient and per-user authorisation that goes beyond role-based masking.
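A minimal sketch of the role-based case, assuming a hypothetical group name, table, and column (the external entitlement lookup described above would replace the group check with a call out to that service):

```sql
-- Hypothetical illustration: group name, table, and column are assumptions.
CREATE OR REPLACE FUNCTION dob_mask(dob DATE)
RETURNS DATE
RETURN CASE
  WHEN is_account_group_member('clinical_analysts') THEN dob  -- unmasked for the clinical group
  ELSE NULL                                                   -- masked for everyone else
END;

-- Associate the function with the column; it runs on every read of that column.
ALTER TABLE patients ALTER COLUMN date_of_birth SET MASK dob_mask;
```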

Unity Catalog's lineage integration means that column masking is recorded in the query lineage graph. An auditor asking which queries accessed unmasked PHI columns can query Unity Catalog lineage to find every query that received unmasked values, along with the user identity, timestamp, and compute cluster that executed the query. This lineage-integrated audit capability is more comprehensive than traditional database access logs, which record that a query ran but not which specific column values were returned in masked or unmasked form.

Aggregation Bypass and Inference Attacks

Column-level masking prevents direct access to sensitive field values but does not automatically prevent inference through aggregation. An analyst who cannot see individual date_of_birth values may be able to infer specific values through a GROUP BY query that returns a count revealing whether a specific individual is in the dataset. For datasets containing small populations or rare characteristics, aggregation queries can effectively de-anonymise masked data.
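A hedged illustration of the risk, with assumed column names: even when date_of_birth is masked to year granularity, grouping on quasi-identifiers can isolate individuals:

```sql
-- date_of_birth itself may be masked, but quasi-identifiers still partition the data.
SELECT zip_code, birth_year, COUNT(*) AS n
FROM patients
GROUP BY zip_code, birth_year;
-- Any group where n = 1 pins the aggregate to a single person,
-- letting the analyst attribute that row's other attributes to them.
```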

Preventing aggregation-based inference requires supplementary controls. Differential privacy mechanisms add calibrated noise to aggregation results and protect against exact reconstruction, but require accepting statistical inaccuracy in analytics outputs. Minimum group size restrictions block aggregation results from cells with fewer than a threshold number of records and protect against small-group inference without introducing noise, at the cost of suppressing some legitimate analytics queries. Regulated data environments where aggregation bypass is a compliance concern must implement these supplementary controls alongside column masking; masking alone is not a complete solution.
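A minimal sketch of a minimum group size restriction, with an illustrative threshold of 10 and assumed table and column names:

```sql
-- Expose aggregates only through a view that suppresses small cells.
CREATE VIEW patient_counts_safe AS
SELECT diagnosis_code, COUNT(*) AS n
FROM patients
GROUP BY diagnosis_code
HAVING COUNT(*) >= 10;  -- cells below the threshold are withheld entirely
```

Analysts query the view rather than the base table, so small-group results never leave the platform; the trade-off is that rare-category queries return nothing.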

Audit Logging for Column-Level Access Evidence

Column-level security audit evidence must answer the specific question that compliance investigations ask: did this user access the unmasked value of this sensitive field on this date? Standard database query logs record query text but not which column values were returned in masked versus unmasked form. Snowflake's ACCESS_HISTORY view provides column-level access tracking, recording which columns were read and whether masking was applied for each query. BigQuery's Data Access audit logs capture column-level access when policy tags are in use. Databricks Unity Catalog lineage captures column-level read provenance. Configuring these audit mechanisms, routing their output to a SIEM, and retaining the audit logs for the full regulatory retention period is the operational component of a column-level security implementation.
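As one example of producing this evidence, a hedged sketch against Snowflake's ACCESS_HISTORY view; the fully qualified object name is a placeholder, and the view's contents can lag live activity by a few hours:

```sql
-- Which queries read the DATE_OF_BIRTH column of the patients table, and by whom?
-- Object and column names below are placeholders.
SELECT
  ah.query_id,
  ah.user_name,
  ah.query_start_time
FROM snowflake.account_usage.access_history AS ah,
     LATERAL FLATTEN(input => ah.base_objects_accessed) AS obj,
     LATERAL FLATTEN(input => obj.value:columns) AS col
WHERE obj.value:objectName::STRING = 'MYDB.CLINICAL.PATIENTS'
  AND col.value:columnName::STRING = 'DATE_OF_BIRTH';
```

Routing the output of queries like this to the SIEM on a schedule is what turns the platform feature into retained, investigable evidence.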
