Detect and protect sensitive data with Amazon Lex and Amazon CloudWatch Logs
In today’s digital landscape, the protection of personally identifiable information (PII) is not just a regulatory requirement, but a cornerstone of consumer trust and business integrity. Organizations use advanced natural language detection services like Amazon Lex for building conversational interfaces and Amazon CloudWatch for monitoring and analyzing operational data.
One risk many organizations face is the inadvertent exposure of sensitive data through logs, voice chat transcripts, and metrics. This risk is exacerbated by the increasing sophistication of cyber threats and the stringent penalties associated with data protection violations. Dealing with massive datasets is not just about identifying and categorizing PII. The challenge also lies in implementing robust mechanisms to obfuscate and redact this sensitive data. At the same time, it’s crucial to make sure these security measures don’t undermine the functionality and analytics critical to business operations.
This post addresses this pressing pain point, offering prescriptive guidance on safeguarding PII through detection and masking techniques specifically tailored for environments using Amazon Lex and CloudWatch Logs.
Solution overview
To address this critical challenge, our solution uses the slot obfuscation feature in Amazon Lex and the data protection capabilities of CloudWatch Logs, tailored specifically for detecting and protecting PII in logs.
In Amazon Lex, slots are used to capture and store user input during a conversation. An intent represents an action the user wants to perform, and slots are the placeholders within that intent for the information needed to fulfill it. For example, in a flight booking bot, slots might include departure city, destination city, and travel dates. Slot obfuscation makes sure any information collected through Amazon Lex conversational interfaces, such as names, addresses, or any other PII entered by users, is obfuscated at the point of capture. This method reduces the risk of sensitive data exposure in chat logs and playbacks.
In CloudWatch Logs, data protection and custom identifiers add an additional layer of security by enabling the masking of PII within session attributes, input transcripts, and other sensitive log data that is specific to your organization.
This approach minimizes the footprint of sensitive information across these services and helps with compliance with data protection regulations.
In the following sections, we demonstrate how to identify and classify your data, locate your sensitive data, and finally monitor and protect it, both in transit and at rest, especially in areas where it may inadvertently appear. The following are the four ways to do this:
- Amazon Lex – Monitor and protect data with Amazon Lex using slot obfuscation and selective conversation log capture
- CloudWatch Logs – Monitor and protect data with CloudWatch Logs using playbacks and log group policies
- Amazon S3 – Monitor and protect data with Amazon Simple Storage Service (Amazon S3) using bucket security and encryption
- Service Control Policies – Monitor and protect with data governance controls and risk management policies using Service Control Policies (SCPs) to prevent changes to Amazon Lex chatbots and CloudWatch Logs groups, and restrict unmasked data viewing in CloudWatch Logs Insights
Identify and classify your data
The first step is to identify and classify the data flowing through your systems. This involves understanding the types of information processed and determining their sensitivity level.
To determine all the slots in an intent in Amazon Lex, complete the following steps:
- On the Amazon Lex console, choose Bots in the navigation pane.
- Choose your preferred bot.
- In the navigation pane, choose the locale under All Languages and choose Intents.
- Choose the required intent from the list.
- In the Slots section, make note of all the slots within the intent.
After you identify the slots within the intent, it’s important to classify them according to their sensitivity level and the potential impact of unauthorized access or disclosure. For example, you may have the following data types:
- Name
- Address
- Phone number
- Email address
- Account number
Email address and physical mailing address are often considered a medium classification level. Sensitive data, such as name, account number, and phone number, should be tagged with a high classification level, indicating the need for stringent security measures. These guidelines can help with systematically evaluating data.
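One way to record this classification is a small lookup table that later steps can consult. The following sketch is illustrative only: the slot names and the two levels mirror the example above, and the helper `slots_at_levels` is hypothetical rather than part of any AWS API.

```python
# Illustrative classification of slots, mirroring the example levels above.
SLOT_CLASSIFICATION = {
    "Name": "high",
    "Address": "medium",
    "PhoneNumber": "high",
    "EmailAddress": "medium",
    "AccountNumber": "high",
}


def slots_at_levels(classification, levels=("high", "medium")):
    """Return the sorted slot names whose classification is in `levels`."""
    return sorted(name for name, level in classification.items() if level in levels)


if __name__ == "__main__":
    # Slots that warrant the most stringent controls.
    print(slots_at_levels(SLOT_CLASSIFICATION, levels=("high",)))
```

Keeping the classification in one place makes it easier to check that every sensitive slot is covered by the protections described in the following sections.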
Locate your data stores
After you classify the data, the next step is to locate where this data resides or is processed in your systems and applications. For services involving Amazon Lex and CloudWatch, it’s crucial to identify all data stores and their roles in handling PII.
CloudWatch captures logs generated by Amazon Lex, including interaction logs that might contain PII. Regular audits and monitoring of these logs are essential to detect any unauthorized access or anomalies in data handling.
Amazon S3 is often used in conjunction with Amazon Lex for storing call recordings or transcripts, which may contain sensitive information. Making sure these storage buckets are properly configured with encryption, access controls, and lifecycle policies is vital to protect the stored data.
Organizations can create a robust framework for protection by identifying and classifying data, along with pinpointing the data stores (like CloudWatch and Amazon S3). This framework should include regular audits, access controls, and data encryption to prevent unauthorized access and comply with data protection laws.
Monitor and protect data with Amazon Lex
In this section, we demonstrate how to protect your data with Amazon Lex using slot obfuscation and selective conversation log capture.
Slot obfuscation in Amazon Lex
Sensitive information can appear in the input transcripts of conversation logs. It’s essential to implement mechanisms that detect and mask or redact PII in these transcripts before they are stored or logged.
In the development of conversational interfaces using Amazon Lex, safeguarding PII is crucial to maintain user privacy and comply with data protection regulations. Slot obfuscation provides a mechanism to automatically obscure PII within conversation logs, making sure sensitive information is not exposed. When configuring an intent within an Amazon Lex bot, developers can mark specific slots—placeholders for user-provided information—as obfuscated. This setting tells Amazon Lex to replace the actual user input for these slots with a placeholder in the logs. For instance, enabling obfuscation for slots designed to capture sensitive information like account numbers or phone numbers makes sure any matching input is masked in the conversation log. Slot obfuscation allows developers to significantly reduce the risk of inadvertently logging sensitive information, thereby enhancing the privacy and security of the conversational application. It’s a best practice to identify and mark all slots that could potentially capture PII during the bot design phase to provide comprehensive protection across the conversation flow.
To enable obfuscation for a slot from the Amazon Lex console, complete the following steps:
- On the Amazon Lex console, choose Bots in the navigation pane.
- Choose your preferred bot.
- In the navigation pane, choose the locale under All Languages and choose Intents.
- Choose your preferred intent from the list.
- In the Slots section, expand the slot details.
- Choose Advanced options to access additional settings.
- Select Enable slot obfuscation.
- Choose Update slot to save the changes.
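The console setting above can also be expressed programmatically. The following is a minimal sketch, assuming the Amazon Lex V2 model-building API, in which a slot carries an `obfuscationSetting` whose type is either `DefaultObfuscation` or `None`. The helper name and the example slot are hypothetical, and the resulting fragment would still need to be merged into a complete `UpdateSlot` request (bot ID, version, locale, intent ID, and elicitation settings) before it could be sent.

```python
def with_obfuscation(slot_definition, enabled=True):
    """Return a copy of a slot definition with obfuscation switched on or off."""
    updated = dict(slot_definition)
    updated["obfuscationSetting"] = {
        # "DefaultObfuscation" masks the slot value in conversation logs.
        "obfuscationSettingType": "DefaultObfuscation" if enabled else "None"
    }
    return updated


if __name__ == "__main__":
    # Hypothetical slot used only for illustration.
    phone_slot = {"slotName": "PhoneNumber"}
    print(with_obfuscation(phone_slot))
```

The helper copies the definition instead of mutating it, so the original slot configuration remains available for comparison.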
Selective conversation log capture
Amazon Lex lets you control how text and audio data from live conversations is captured in conversation logs by filtering certain types of information out of the logs. Through selective capture of necessary data, businesses can minimize the risk of exposing private or confidential information. Additionally, this feature can help organizations comply with data privacy regulations, because it gives more control over the data collected and stored. You can choose text logs, audio logs, or both.
When selective conversation log capture is enabled for text and audio logs, it disables logging for all intents and slots in the conversation. To generate text and audio logs for particular intents and slots, set the text and audio selective conversation log capture session attributes for those intents and slots to “true”. When selective conversation log capture is enabled, any slot values in SessionState, Interpretations, and Transcriptions for which logging is not enabled using session attributes will be obfuscated in the generated text log.
To enable selective conversation log capture, complete the following steps:
- On the Amazon Lex console, choose Bots in the navigation pane.
- Choose your preferred bot.
- Choose Aliases under Deployment and choose the bot’s alias.
- Choose Manage conversation logs.
- Select Selectively log utterances.
- For text logs, choose a CloudWatch log group.
- For audio logs, choose an S3 bucket to store the logs and assign an AWS Key Management Service (AWS KMS) key for added security.
- Save the changes.
Next, enable selective conversation log capture for a slot:
- Choose Intents in the navigation pane and choose your intent.
- Under Initial responses, choose Advanced options and expand Set values.
- For Session attributes, set the following attributes based on the intents and slots for which you want to enable selective conversation log capture. This will capture utterances that contain only a specific slot in the conversation.
x-amz-lex:enable-audio-logging:INTENT_NAME:SLOT_NAME = "true"
x-amz-lex:enable-text-logging:INTENT_NAME:SLOT_NAME = "true"
- Choose Update options and rebuild the bot.
Replace INTENT_NAME and SLOT_NAME with your intent and slot names.
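When many intents and slots are involved, generating the attribute names can reduce typing mistakes. The following is a minimal sketch, assuming the attribute keys consist of the `x-amz-lex:enable-audio-logging` or `x-amz-lex:enable-text-logging` prefix followed by the intent name and the slot name, separated by colons. The function name and the example intent and slot are hypothetical.

```python
def selective_logging_attributes(intent_name, slot_name, text=True, audio=True):
    """Build the session attributes that turn logging on for one intent and slot."""
    attributes = {}
    if audio:
        attributes[f"x-amz-lex:enable-audio-logging:{intent_name}:{slot_name}"] = "true"
    if text:
        attributes[f"x-amz-lex:enable-text-logging:{intent_name}:{slot_name}"] = "true"
    return attributes


if __name__ == "__main__":
    # Hypothetical intent and slot names used only for illustration.
    for key, value in selective_logging_attributes("BookFlight", "TravelDate").items():
        print(f'{key} = "{value}"')
```

Slots that don't appear in the generated attributes stay obfuscated in the text log, which is the default behavior once selective capture is turned on.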
Monitor and protect data with CloudWatch Logs
In this section, we demonstrate how to protect your data with CloudWatch using playbacks and log group policies.
Playbacks in CloudWatch Logs
When Amazon Lex engages in interactions, delivering prompts or messages from the bot to the customer, there’s a potential risk for PII to be inadvertently included in these communications. This risk extends to CloudWatch Logs, where these interactions are recorded for monitoring, debugging, and analysis purposes. The playback of prompts or messages designed to confirm or clarify user input can inadvertently expose sensitive information if not properly handled. To mitigate this risk and protect PII within these interactions, a strategic approach is necessary when designing and deploying Amazon Lex bots.
The solution lies in carefully structuring how slot values, which may contain PII, are referenced and used in the bot’s response messages. Adopting a prescribed format for passing slot values, specifically by encapsulating them within curly braces (for example, {slotName}), allows developers to control how this information is presented back to the user and logged in CloudWatch. This method makes sure that when the bot constructs a message, it refers to the slot by its name rather than its value, thereby preventing any sensitive information from being directly included in the message content. For example, instead of the bot saying, “Is your phone number 123-456-7890?” it would use a generic placeholder, “Is your phone number {PhoneNumber}?” with {PhoneNumber} being a reference to the slot that captured the user’s phone number. This approach allows the bot to confirm or clarify information without exposing the actual data.
When these interactions are logged in CloudWatch, the logs will only contain the slot name references, not the actual PII. This technique significantly reduces the risk of sensitive information being exposed in logs, enhancing privacy and compliance with data protection regulations. Organizations should make sure all personnel involved in bot design and deployment are trained on these practices to consistently safeguard user information across all interactions.
The following is sample AWS Lambda function code in Python that references the slot holding the phone number provided by the user. SSML tags format the slot value for slow, clear speech output, and the function returns a response that asks the user to confirm the captured phone number:
Replace INTENT_NAME and SLOT_NAME with your preferred intent and slot names, respectively.
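A minimal sketch of such a function might look like the following. It assumes the Amazon Lex V2 Lambda event and response format, and it follows the guidance above by placing the slot reference in curly braces in the message instead of the captured value. Whether Amazon Lex resolves that reference in a message supplied by Lambda is an assumption you should verify for your bot; the wording of the prompt and the choice of dialog actions are also illustrative.

```python
INTENT_NAME = "INTENT_NAME"  # placeholder: replace with your intent name
SLOT_NAME = "SLOT_NAME"      # placeholder: replace with your slot name


def lambda_handler(event, context):
    """Ask the user to confirm a phone number without echoing its value."""
    intent = event["sessionState"]["intent"]

    # Let Amazon Lex handle any intent this function isn't responsible for.
    if intent["name"] != INTENT_NAME:
        return {"sessionState": {"dialogAction": {"type": "Delegate"}, "intent": intent}}

    # Reference the slot by name so the value itself never appears in the message.
    slot_reference = "{" + SLOT_NAME + "}"
    message = (
        "<speak>Thank you for providing your phone number. Is "
        '<prosody rate="slow"><say-as interpret-as="telephone">'
        + slot_reference
        + "</say-as></prosody> correct?</speak>"
    )

    return {
        "sessionState": {
            "dialogAction": {"type": "ConfirmIntent"},
            "intent": intent,
        },
        "messages": [{"contentType": "SSML", "content": message}],
    }
```

The `prosody` and `say-as` SSML tags slow the speech rate and read the value as a telephone number, which helps the user verify it by ear.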
CloudWatch data protection log group policies for data identifiers
Sensitive data that’s ingested by CloudWatch Logs can be safeguarded by using log group data protection policies. These policies let you audit and mask sensitive data that appears in log events ingested by the log groups in your account.
CloudWatch Logs supports both managed and custom data identifiers.
Managed data identifiers offer preconfigured data types to protect financial data, personal health information (PHI), and PII. For some types of managed data identifiers, the detection depends on also finding certain keywords in proximity with the sensitive data.
Each managed data identifier is designed to detect a specific type of sensitive data, such as name, email address, account numbers, AWS secret access keys, or passport numbers for a particular country or region. When creating a data protection policy, you can configure it to use these identifiers to analyze logs ingested by the log group, and take actions when they are detected.
CloudWatch Logs data protection can detect the categories of sensitive data by using managed data identifiers.
To configure managed data identifiers on the CloudWatch console, complete the following steps:
- On the CloudWatch console, under Logs in the navigation pane, choose Log groups.
- Select your log group and on the Actions menu, choose Create data protection policy.
- Under Auditing and masking configuration, for Managed data identifiers, select all the identifiers for which data protection policy should be applied.
- Choose the data store to apply the policy to and save the changes.
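The console steps above produce a data protection policy document, which can also be written by hand. The following is a minimal sketch, assuming the CloudWatch Logs policy syntax with one audit statement and one de-identify (mask) statement over the same identifiers. The policy name and the three identifiers chosen here are illustrative; the function name is hypothetical.

```python
import json

# Prefix shared by managed data identifier ARNs.
MANAGED_PREFIX = "arn:aws:dataprotection::aws:data-identifier/"


def build_data_protection_policy(identifier_names, name="lex-log-data-protection"):
    """Build a policy that audits and masks the given managed data identifiers."""
    identifiers = [MANAGED_PREFIX + identifier for identifier in identifier_names]
    return {
        "Name": name,
        "Version": "2021-06-01",
        "Statement": [
            {
                "Sid": "audit",
                "DataIdentifier": identifiers,
                # An empty destination audits without delivering findings anywhere.
                "Operation": {"Audit": {"FindingsDestination": {}}},
            },
            {
                "Sid": "mask",
                "DataIdentifier": identifiers,
                "Operation": {"Deidentify": {"MaskConfig": {}}},
            },
        ],
    }


if __name__ == "__main__":
    policy = build_data_protection_policy(["EmailAddress", "Address", "Name"])
    print(json.dumps(policy, indent=2))
```

The printed JSON could then be supplied as the policy document when creating the log group policy, for example through the `PutDataProtectionPolicy` API.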
Custom data identifiers let you define your own custom regular expressions that can be used in your data protection policy. With custom data identifiers, you can target business-specific PII use cases that managed data identifiers don’t provide. For example, you can use custom data identifiers to look for a company-specific account number format.
To create a custom data identifier on the CloudWatch console, complete the following steps:
- On the CloudWatch console, under Logs in the navigation pane, choose Log groups.
- Select your log group and on the Actions menu, choose Create data protection policy.
- Under Custom Data Identifier configuration, choose Add custom data identifier.
- Create your own regex patterns to identify sensitive information that is unique to your organization or specific use case.
- After you add your data identifier, choose the data store to apply this policy to.
- Choose Activate data protection.
For details about the types of data that can be protected, refer to Types of data that you can protect.
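To illustrate a custom data identifier, the following sketch builds a policy around a made-up account number format and previews locally what masking would hide. The format `ACCT-` followed by eight digits, the identifier name, and both function names are hypothetical. The local preview uses Python's `re` module, so it only approximates how CloudWatch Logs evaluates the expression; check the pattern against the service's supported regex syntax before relying on it.

```python
import json
import re

# Hypothetical company-specific format: "ACCT-" followed by eight digits.
ACCOUNT_NUMBER_REGEX = r"ACCT-\d{8}"


def build_custom_identifier_policy(identifier_name, regex, policy_name="lex-custom-data-protection"):
    """Build a policy that audits and masks one custom data identifier."""
    operations = {
        "audit": {"Audit": {"FindingsDestination": {}}},
        "mask": {"Deidentify": {"MaskConfig": {}}},
    }
    return {
        "Name": policy_name,
        "Version": "2021-06-01",
        "Configuration": {
            "CustomDataIdentifier": [{"Name": identifier_name, "Regex": regex}]
        },
        # Custom identifiers are referenced by name rather than by ARN.
        "Statement": [
            {"Sid": sid, "DataIdentifier": [identifier_name], "Operation": operation}
            for sid, operation in operations.items()
        ],
    }


def preview_mask(text, regex):
    """Replace every match with asterisks of the same length (local preview only)."""
    return re.sub(regex, lambda match: "*" * len(match.group()), text)


if __name__ == "__main__":
    print(json.dumps(build_custom_identifier_policy("AccountNumber", ACCOUNT_NUMBER_REGEX), indent=2))
    print(preview_mask("Customer ACCT-12345678 called about a refund", ACCOUNT_NUMBER_REGEX))
```

Testing the pattern against sample log lines before activating the policy helps catch expressions that are too broad or too narrow.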
Monitor and protect data with Amazon S3
In this section, we demonstrate how to protect your data in S3 buckets.
Encrypt audio recordings in S3 buckets
PII can often be captured in audio recordings, especially in sectors like customer service, healthcare, and financial services, where sensitive information is frequently exchanged over voice interactions. To comply with domain-specific regulatory requirements, organizations must adopt stringent measures for managing PII in audio files.
One approach is to disable the recording feature entirely if it poses too high a risk of non-compliance or if the value of the recordings doesn’t justify the potential privacy implications. However, if audio recordings are essential, streaming the audio data in real time using Amazon Kinesis provides a scalable and secure method to capture, process, and analyze audio data. This data can then be exported to a secure and compliant storage solution, such as Amazon S3, which can be configured to meet specific compliance needs including encryption at rest. You can use AWS KMS or AWS CloudHSM to manage encryption keys, offering robust mechanisms to encrypt audio files at rest, thereby securing the sensitive information they might contain. Implementing these encryption measures makes sure that even if data breaches occur, the encrypted PII remains inaccessible to unauthorized parties.
Configuring these AWS services allows organizations to balance the need for audio data capture with the imperative to protect sensitive information and comply with regulatory standards.
S3 bucket security configurations
You can use an AWS CloudFormation template to configure various security settings for an S3 bucket that stores Amazon Lex data like audio recordings and logs. For more information, see Creating a stack on the AWS CloudFormation console. See the following example code:
The template defines the following properties:
- BucketName – Specifies your bucket. Replace YOUR_LEX_DATA_BUCKET with your preferred bucket name.
- AccessControl – Sets the bucket access control to Private, denying public access by default.
- PublicAccessBlockConfiguration – Explicitly blocks all public access to the bucket and its objects.
- BucketEncryption – Enables server-side encryption using the default KMS encryption key ID, alias/aws/s3, managed by AWS for Amazon S3. You can also create custom KMS keys. For instructions, refer to Creating symmetric encryption KMS keys.
- VersioningConfiguration – Enables versioning for the bucket, allowing you to maintain multiple versions of objects.
- ObjectLockConfiguration – Enables object lock with a governance mode retention period of 5 years, preventing objects from being deleted or overwritten during that period.
- LoggingConfiguration – Enables server access logging for the bucket, directing log files to a separate logging bucket for auditing and analysis purposes. Replace YOUR_SERVER_ACCESS_LOG_BUCKET with your preferred bucket name.
This is just an example; you may need to adjust the configurations based on your specific requirements and security best practices.
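The properties described above can be assembled into a template as in the following sketch, which builds the template as a Python dictionary and prints it as JSON (a format CloudFormation accepts). The logical resource ID `LexDataBucket`, the log file prefix, and the function name are assumptions made for illustration; the property values follow the list above.

```python
import json


def build_lex_bucket_template(bucket_name, log_bucket_name):
    """Build a CloudFormation template for a locked-down Amazon Lex data bucket."""
    bucket_properties = {
        "BucketName": bucket_name,
        "AccessControl": "Private",
        "PublicAccessBlockConfiguration": {
            "BlockPublicAcls": True,
            "BlockPublicPolicy": True,
            "IgnorePublicAcls": True,
            "RestrictPublicBuckets": True,
        },
        "BucketEncryption": {
            "ServerSideEncryptionConfiguration": [
                {
                    "ServerSideEncryptionByDefault": {
                        "SSEAlgorithm": "aws:kms",
                        "KMSMasterKeyID": "alias/aws/s3",
                    }
                }
            ]
        },
        "VersioningConfiguration": {"Status": "Enabled"},
        # Object Lock is switched on for the bucket and given a default rule.
        "ObjectLockEnabled": True,
        "ObjectLockConfiguration": {
            "ObjectLockEnabled": "Enabled",
            "Rule": {"DefaultRetention": {"Mode": "GOVERNANCE", "Years": 5}},
        },
        "LoggingConfiguration": {
            "DestinationBucketName": log_bucket_name,
            "LogFilePrefix": "lex-data-bucket-logs/",  # assumed prefix
        },
    }
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "LexDataBucket": {"Type": "AWS::S3::Bucket", "Properties": bucket_properties}
        },
    }


if __name__ == "__main__":
    template = build_lex_bucket_template("YOUR_LEX_DATA_BUCKET", "YOUR_SERVER_ACCESS_LOG_BUCKET")
    print(json.dumps(template, indent=2))
```

Generating the template from code makes it straightforward to reuse the same hardened settings for several buckets by changing only the two bucket names.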
Monitor and protect with data governance controls and risk management policies
In this section, we demonstrate how to protect your data by using a Service Control Policy (SCP). To create an SCP, see Creating an SCP.
Prevent changes to an Amazon Lex chatbot using an SCP
To prevent changes to an Amazon Lex chatbot using an SCP, create one that denies the specific actions related to modifying or deleting the chatbot. For example, you could use the following SCP:
The code defines the following:
- Effect – This is set to Deny, which means that the specified actions will be denied.
- Action – This contains a list of actions related to modifying or deleting Amazon Lex bots, bot aliases, intents, and slot types.
- Resource – This lists the Amazon Resource Names (ARNs) for your Amazon Lex bot, intents, and slot types. Replace YOUR_ACCOUNT_ID with your AWS account ID and YOUR_BOT_NAME with the name of your Amazon Lex bot.
- Condition – This exempts a specific IAM role from the policy, so the deny applies to actions performed by any principal other than that role. Replace YOUR_ACCOUNT_ID with your AWS account ID and YOUR_IAM_ROLE with the name of the AWS Identity and Access Management (IAM) provisioned role you want to exempt.
When this SCP is attached to an AWS Organizations organizational unit (OU) or an individual AWS account, it will allow only the specified provisioning role while preventing all other IAM entities (users, roles, or groups) within that OU or account from modifying or deleting the specified Amazon Lex bot, intents, and slot types.
This SCP only prevents changes to the Amazon Lex bot and its components. It doesn’t restrict other actions, such as invoking the bot or retrieving its configuration. If more actions need to be restricted, you can add them to the Action list in the SCP.
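A policy with this behavior could be sketched as follows. Several details are assumptions: the action names are taken from the Amazon Lex V2 API, the resource ARNs use the Lex V2 format keyed by bot ID (hence the hypothetical YOUR_BOT_ID placeholder instead of a bot name), and the condition uses `ArnNotLike` on `aws:PrincipalARN` so the deny applies to every principal except the provisioning role. Verify the actions and ARN formats against the service authorization reference for the Lex version you use.

```python
import json


def build_lex_deny_scp(account_id, bot_id, exempt_role):
    """Build an SCP that denies bot changes to everyone except one role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyLexBotChanges",
                "Effect": "Deny",
                "Action": [
                    "lex:UpdateBot",
                    "lex:DeleteBot",
                    "lex:UpdateBotAlias",
                    "lex:DeleteBotAlias",
                    "lex:UpdateIntent",
                    "lex:DeleteIntent",
                    "lex:UpdateSlotType",
                    "lex:DeleteSlotType",
                ],
                "Resource": [
                    f"arn:aws:lex:*:{account_id}:bot/{bot_id}",
                    f"arn:aws:lex:*:{account_id}:bot-alias/{bot_id}/*",
                ],
                # The deny is skipped only for the provisioning role.
                "Condition": {
                    "ArnNotLike": {
                        "aws:PrincipalARN": f"arn:aws:iam::{account_id}:role/{exempt_role}"
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_lex_deny_scp("YOUR_ACCOUNT_ID", "YOUR_BOT_ID", "YOUR_IAM_ROLE"), indent=2))
```

Because an SCP never grants permissions, the exempted role still needs its own IAM policy that allows these actions.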
Prevent changes to a CloudWatch Logs log group using an SCP
To prevent changes to a CloudWatch Logs log group using an SCP, create one that denies the specific actions related to modifying or deleting the log group. The following is an example SCP that you can use:
The code defines the following:
- Effect – This is set to Deny, which means that the specified actions will be denied.
- Action – This includes the logs:DeleteLogGroup and logs:PutRetentionPolicy actions, which prevent deleting the log group and modifying its retention policy, respectively.
- Resource – This lists the ARN for your CloudWatch Logs log group. Replace YOUR_ACCOUNT_ID with your AWS account ID and YOUR_LOG_GROUP_NAME with the name of your log group.
- Condition – This exempts a specific IAM role from the policy, so the deny applies to actions performed by any principal other than that role. Replace YOUR_ACCOUNT_ID with your AWS account ID and YOUR_IAM_ROLE with the name of the IAM provisioned role you want to exempt.
Similar to the preceding chatbot SCP, when this SCP is attached to an Organizations OU or an individual AWS account, it will allow only the specified provisioning role to delete the specified CloudWatch Logs log group or modify its retention policy, while preventing all other IAM entities (users, roles, or groups) within that OU or account from performing these actions.
This SCP only prevents changes to the log group itself and its retention policy. It doesn’t restrict other actions, such as creating or deleting log streams within the log group or modifying other log group configurations. To restrict additional actions, add them to the Action list in the SCP.
Also, this SCP will apply to all log groups that match the specified resource ARN pattern. To target a specific log group, modify the Resource value accordingly.
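The log group policy described above could be sketched as follows. The two actions come from the description; the `ArnNotLike` condition on `aws:PrincipalARN` (so that only the provisioning role is exempt) and the inclusion of the log group ARN both with and without the trailing `:*` are assumptions made so the statement matches either ARN form.

```python
import json


def build_log_group_deny_scp(account_id, log_group_name, exempt_role):
    """Build an SCP that protects one log group from deletion and retention changes."""
    log_group_arn = f"arn:aws:logs:*:{account_id}:log-group:{log_group_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyLogGroupChanges",
                "Effect": "Deny",
                "Action": ["logs:DeleteLogGroup", "logs:PutRetentionPolicy"],
                # Cover the ARN with and without the trailing wildcard segment.
                "Resource": [log_group_arn, log_group_arn + ":*"],
                "Condition": {
                    "ArnNotLike": {
                        "aws:PrincipalARN": f"arn:aws:iam::{account_id}:role/{exempt_role}"
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    scp = build_log_group_deny_scp("YOUR_ACCOUNT_ID", "YOUR_LOG_GROUP_NAME", "YOUR_IAM_ROLE")
    print(json.dumps(scp, indent=2))
```

Passing a name that ends in a wildcard, such as a shared prefix followed by `*`, would extend the same protection to every log group matching that pattern.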
Restrict viewing of unmasked sensitive data in CloudWatch Logs Insights using an SCP
When you create a data protection policy, by default, any sensitive data that matches the data identifiers you’ve selected is masked at all egress points, including CloudWatch Logs Insights, metric filters, and subscription filters. Only users who have the logs:Unmask IAM permission can view unmasked data. The following is an SCP you can use:
It defines the following:
- Effect – This is set to Deny, which means that the specified actions will be denied.
- Action – This includes logs:Unmask, which prevents viewing of sensitive data in unmasked form.
- Resource – This lists the ARN for your CloudWatch Logs log group. Replace YOUR_ACCOUNT_ID with your AWS account ID and YOUR_LOG_GROUP_NAME with the name of your log group.
- Condition – This exempts a specific IAM role from the policy, so the deny applies to actions performed by any principal other than that role. Replace YOUR_ACCOUNT_ID with your AWS account ID and YOUR_IAM_ROLE with the name of the IAM provisioned role you want to exempt.
Similar to the previous SCPs, when this SCP is attached to an Organizations OU or an individual AWS account, it will allow only the specified provisioning role while preventing all other IAM entities (users, roles, or groups) within that OU or account from unmasking sensitive data from the CloudWatch Logs log group.
This SCP only prevents unmasking of sensitive data in the log group. It doesn’t restrict other actions, such as creating or deleting log streams within the log group or modifying other log group configurations. To restrict additional actions, add them to the Action list in the SCP.
Also, this SCP will apply to all log groups that match the specified resource ARN pattern. To target a specific log group, modify the Resource value accordingly.
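An SCP with this effect could be sketched as follows. The `logs:Unmask` action comes from the description above; the `ArnNotLike` condition on `aws:PrincipalARN`, the statement ID, and the function name are assumptions, chosen so that only the specified role keeps the ability to view unmasked data.

```python
import json


def build_unmask_deny_scp(account_id, log_group_name, exempt_role):
    """Build an SCP that blocks unmasking for everyone except one role."""
    log_group_arn = f"arn:aws:logs:*:{account_id}:log-group:{log_group_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnmask",
                "Effect": "Deny",
                "Action": ["logs:Unmask"],
                "Resource": [log_group_arn, log_group_arn + ":*"],
                "Condition": {
                    "ArnNotLike": {
                        "aws:PrincipalARN": f"arn:aws:iam::{account_id}:role/{exempt_role}"
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    scp = build_unmask_deny_scp("YOUR_ACCOUNT_ID", "YOUR_LOG_GROUP_NAME", "YOUR_IAM_ROLE")
    print(json.dumps(scp, indent=2))
```

With this policy attached, queries run by other principals continue to return masked values, so routine troubleshooting isn't blocked.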
Clean up
To avoid incurring additional charges, clean up your resources:
- Delete the Amazon Lex bot:
  - On the Amazon Lex console, choose Bots in the navigation pane.
  - Select the bot to delete and on the Action menu, choose Delete.
- Delete the associated Lambda function:
  - On the Lambda console, choose Functions in the navigation pane.
  - Select the function associated with the bot and on the Action menu, choose Delete.
- Delete the account-level data protection policy. For instructions, see DeleteAccountPolicy.
- Delete the CloudWatch Logs log group data protection policy:
  - On the CloudWatch console, under Logs in the navigation pane, choose Log groups.
  - Choose your log group.
  - On the Data protection tab, under Log group policy, choose the Actions menu and choose Delete policy.
- Delete the S3 bucket that stores the Amazon Lex data:
  - On the Amazon S3 console, choose Buckets in the navigation pane.
  - Select the bucket you want to delete, then choose Delete.
  - To confirm that you want to delete the bucket, enter the bucket name and choose Delete bucket.
- Delete the CloudFormation stack. For instructions, see Deleting a stack on the AWS CloudFormation console.
- Delete the SCP. For instructions, see Deleting an SCP.
- Delete the KMS key. For instructions, see Deleting AWS KMS keys.
Conclusion
Securing PII within AWS services like Amazon Lex and CloudWatch requires a comprehensive and proactive approach. By following the steps in this post—identifying and classifying data, locating data stores, monitoring and protecting data in transit and at rest, and implementing SCPs for Amazon Lex and Amazon CloudWatch—organizations can create a robust security framework. This framework not only protects sensitive data, but also complies with regulatory standards and mitigates potential risks associated with data breaches and unauthorized access.
Regular audits, continuous monitoring, and security measures that are updated in response to emerging threats and technological advancements are crucial. Adopting these practices allows organizations to safeguard their digital assets, maintain customer trust, and build a reputation for strong data privacy and security in the digital landscape.
About the Authors
Rashmica Gopinath is a software development engineer with Amazon Lex. Rashmica is responsible for developing new features, improving the service’s performance and reliability, and ensuring a seamless experience for customers building conversational applications. Rashmica is dedicated to creating innovative solutions that enhance human-computer interaction. In her free time, she enjoys winding down with the works of Dostoevsky or Kafka.
Dipkumar Mehta is a Principal Consultant with the Amazon ProServe Natural Language AI team. He focuses on helping customers design, deploy, and scale end-to-end Conversational AI solutions in production on AWS. He is also passionate about improving customer experience and driving business outcomes by leveraging data. Additionally, Dipkumar has a deep interest in Generative AI, exploring its potential to revolutionize various industries and enhance AI-driven applications.
David Myers is a Sr. Technical Account Manager with AWS Enterprise Support. With over 20 years of technical experience, observability has been part of his career from the start. David loves improving customers’ observability experiences at Amazon Web Services.
Sam Patel is a Security Consultant specializing in safeguarding Generative AI (GenAI), Artificial Intelligence systems, and Large Language Models (LLM) for Fortune 500 companies. Serving as a trusted advisor, he invents and spearheads the development of cutting-edge best practices for secure AI deployment, empowering organizations to leverage transformative AI capabilities while maintaining stringent security and privacy standards.