PII 🔐

Problem statement

Personally Identifiable Information (PII) is data, like email address, phone number, etc that can be associated with a specific person.

It's important to restrict access to this data to minimize opportunities for malicious behavior inside and outside a company.

Only data owners, authorized jobs and monitored support teams should have access to PII.

Solutions

Limit

Only store PII if it's required; only grant access to users who need PII; etc.

Delegate storage to other systems if possible, eg instead of storing an email address, store an identifier that can be exchanged for an email address via a user data service.

Document

To ensure PII is securely managed, establish a convention for identifying where it is. For example, all users running jobs with access to PII are in a group.

Encrypt

Encrypt data at rest, so people can't access PII by stealing a hard drive.

Use SSL, so people can't intercept data in transit.

Internal users

Restrict access based on users in a system.

Start from the bottom, where PII is stored, and work up.

Users running jobs need routine access. For example, foo user runs foo service, which needs access to foo PII to do its job.

Define an Access Control List (ACL) per environment (dev, staging, prod, etc) to restrict access to the users that can call these jobs.

Support teams may need special access. Define a way to request access in exchange for justification and an audit log entry. For example, if users are authenticated via a token, enable the token provisioning service to issue a token for another user if justification is provided. Automate analysis of the audit log.

External users

Ensure only data owners can access their data by requiring authentication (the identity of the caller) and authorization (the scope of accessible data). OAuth is a standard choice for this.

As with internal users, support teams may need special access, so define a way to request access in exchange for justification and an audit log entry.

Source control

Only allow binaries built from reviewed source code to run as authorized applications, to prevent a malicious user from circumventing ACLs by deploying a code change. For example, configure the hosting layer to only run jobs as certain users if those jobs were deployed by the CI user.

Development

Avoid dependencies on PII for development. Only data owners, authorized jobs and support teams should have access to PII.

Ensure applications can mock all dependencies (eg hermetic testing) for local development and tests.

Define a test data layer and use it for pre-prod environments, eg staging. Configure all services to use environments consistently to avoid quality issues that reduce confidence and motivate development with production data. For example, establish a convention for environment-specific addresses and provide a library that automatically selects an address based on the environment in which a job is running.

Define a pre-prod environment for end-to-end testing with PII prior to production.

Feedback

Thoughts? Suggestions? Hit me up @erikeldridge

License

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 International License, and code samples are licensed under the MIT license.