Practical AWS Logging in 5 Steps

"Logging is one of the most crucial areas when a security or operational incident needs to be investigated" – that's what most technology practitioners would say.

But monitoring logs for malicious or unusual activity and carrying out threat hunting are proactive techniques that help organisations catch infiltrations or compromises early.

This is KEY to reducing the impact of a cyber attack or production issue and hence minimising the cost associated with its aftermath. 

From a security standpoint, there's a famous quote by John Chambers, former CEO of Cisco:

“There are only two types of organisations: those that have been hacked and those that don’t know it yet!”

This statement holds true for operational or production issues as well.

Now that we’ve put things in perspective, let’s carve out a log management strategy for a public cloud setup. We will deep dive into five AWS logging best practices that will help answer these questions – what to log, how to log, and how to scale.

Though we'll mostly be calling out AWS terminology, one can safely assume a parallel component exists in other public cloud setups as well.

Asset Classification

Before we start figuring out what to log and what not to, it's important to get our heads around the criticality or sensitivity of the various components within your public cloud deployment – compute, storage, data systems and so on.

Why is this important? 

Well, for the simple reason that too little logging on credit card systems can lead to non-compliance, while too much logging on, say, internal non-critical HTTP services can lead to added cost and noise from a cyber security perspective.

Use Tags

When you're working with public cloud deployments, infrastructure is extremely dynamic and it is nearly impossible to maintain static configurations the way you would for hosted deployments or network devices. For example, IP addresses change every time an application scales up or down.

This is where tagging your resources based on criticality helps – the tags become a crucial input to the logging configuration. Thankfully, most public cloud providers support tags!

Let’s take an example:

Name - prod-int-customerapi
Classification - sensitive
Environment - production

Just by looking at these tags, one can tell that the resource contains sensitive data, is a production system and is an internal application. AWS also publishes detailed tagging best practices that are worth reviewing.
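
As a quick illustration, here's a minimal boto3 sketch that applies such tags to an EC2 instance; the instance ID and tag values are purely illustrative:

import boto3

ec2 = boto3.client("ec2", region_name="ap-south-1")

# Apply classification tags to a hypothetical EC2 instance
ec2.create_tags(
    Resources=["i-0123456789abcdef0"],   # placeholder instance ID
    Tags=[
        {"Key": "Name", "Value": "prod-int-customerapi"},
        {"Key": "Classification", "Value": "sensitive"},
        {"Key": "Environment", "Value": "production"},
    ],
)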

This strategy can eventually stitch itself into your CI/CD or infra-as-code pipelines. Sounds great, doesn’t it? There’s a lot more!

Naming Convention

This helps you automate your classification approach. Let's look at the resource name below:

prod-int-payments

and here's what we can infer (a short parsing sketch follows the list):

prod -> production
int -> internal service
payments -> application name
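
Such a convention can be parsed with a few lines of Python; the prefixes and mappings below are assumptions based on the example above, not a standard:

# Hypothetical parser for an <env>-<exposure>-<app> naming convention
ENVIRONMENTS = {"prod": "production", "stg": "staging", "dev": "development"}
EXPOSURES = {"int": "internal", "ext": "public-facing"}

def classify(resource_name: str) -> dict:
    env, exposure, app = resource_name.split("-", 2)
    return {
        "environment": ENVIRONMENTS.get(env, "unknown"),
        "exposure": EXPOSURES.get(exposure, "unknown"),
        "application": app,
    }

print(classify("prod-int-payments"))
# {'environment': 'production', 'exposure': 'internal', 'application': 'payments'}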

Cy5’s cloud security products and their Contextual Intelligence capabilities can inspect tags and resource names to automate classification for you!

Data Discovery Tools

To go one step deeper into the classification journey, organisations can adopt automated data classification via tools such as Amazon Macie. Such tools analyse data within data stores such as S3 buckets and databases to automatically classify resources depending on the data they contain.
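
For instance, a one-time Macie classification job can be kicked off with a few lines of boto3; treat this as a rough sketch – the account ID, bucket and job name are placeholders:

import uuid
import boto3

macie = boto3.client("macie2", region_name="ap-south-1")

# One-time sensitive data discovery job on a hypothetical bucket
macie.create_classification_job(
    clientToken=str(uuid.uuid4()),
    jobType="ONE_TIME",
    name="classify-customer-data",
    s3JobDefinition={
        "bucketDefinitions": [
            {"accountId": "111122223333", "buckets": ["prod-int-customerdata"]}
        ]
    },
)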

What to log?

Once you’ve got your asset classification strategy sorted, establishing configurations for logging levels comes next. 

Let's use a simple 3×3 matrix to clearly map out how log levels can be defined on the basis of data classification and network reachability.

Take, for instance, a publicly available server that hosts a web application – you would want to log every kind of traffic that lands on it. Now take an internal server that holds credit card data – again, you would want to log everything you possibly can. However, for an internal server that hosts data that isn't really sensitive, you might want to log only writes or updates.
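
One way to make the matrix concrete is to encode it so logging levels can be derived automatically from tags. The classification and reachability labels below are just one illustrative choice, not a prescribed scheme:

# Illustrative 3x3 matrix: (data classification, network reachability) -> log level
LOG_LEVELS = {
    ("sensitive", "public"):   "full",          # log everything
    ("sensitive", "internal"): "full",
    ("sensitive", "isolated"): "full",
    ("internal",  "public"):   "full",
    ("internal",  "internal"): "writes-only",   # log writes / updates only
    ("internal",  "isolated"): "writes-only",
    ("public",    "public"):   "full",
    ("public",    "internal"): "writes-only",
    ("public",    "isolated"): "minimal",
}

def log_level(classification: str, reachability: str) -> str:
    # Fail safe: if a combination is unknown, log more rather than less
    return LOG_LEVELS.get((classification, reachability), "full")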

The devil is in the details. 

Let's break this down a little now. We would recommend you consider logging for the following services in your public cloud environment.

CloudTrail

Every AWS API interaction gets logged in CloudTrail as an event, which makes CloudTrail a MUST-have for any production AWS account for visibility, analysis and incident response. A CloudTrail log entry consists of the following key elements:

eventName
eventType
eventSource
requestParameters
responseElements
sourceIPAddress

CloudTrail lets you query events using a few of the above plus some additional fields, such as the event time and user name.

Users can use the CloudTrail Event History section to quickly investigate operational or security issues.
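
The same event history is also available via the API. Here's a small boto3 sketch that looks up recent ConsoleLogin events; the event name and time window are just examples:

from datetime import datetime, timedelta
import boto3

cloudtrail = boto3.client("cloudtrail", region_name="ap-south-1")

# Look up ConsoleLogin events from the last 24 hours
response = cloudtrail.lookup_events(
    LookupAttributes=[{"AttributeKey": "EventName", "AttributeValue": "ConsoleLogin"}],
    StartTime=datetime.utcnow() - timedelta(hours=24),
    EndTime=datetime.utcnow(),
    MaxResults=50,
)

for event in response["Events"]:
    print(event["EventTime"], event.get("Username"), event["EventName"])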

CloudWatch

CloudWatch is a logging and monitoring service by AWS that helps its customers collect, store and monitor logs from various sources. It is also the de facto logging service for AWS services. Apart from log storage, customers can create metrics around attributes and generate alerts when those attributes change or cross certain thresholds.

AWS customers can integrate CloudTrail with CloudWatch to analyse CloudTrail logs quickly and at scale. However, a word of caution here – CloudWatch can turn out pretty expensive when used with large amounts of data. For larger volumes, consider using an Athena table instead.
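
As an illustration, once CloudTrail is delivering to a CloudWatch log group, a metric filter can flag unauthorised API calls. The log group name and metric namespace below are assumptions:

import boto3

logs = boto3.client("logs", region_name="ap-south-1")

# Metric filter on a hypothetical CloudTrail log group for access-denied API calls
logs.put_metric_filter(
    logGroupName="CloudTrail/DefaultLogGroup",
    filterName="unauthorised-api-calls",
    filterPattern='{ ($.errorCode = "*UnauthorizedOperation") || ($.errorCode = "AccessDenied*") }',
    metricTransformations=[
        {
            "metricName": "UnauthorisedAPICalls",
            "metricNamespace": "CloudTrailMetrics",
            "metricValue": "1",
        }
    ],
)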

ALB / ELB Access Logs

Not all load balancers are public; in fact, most that you create will end up being internal facing. Consider enabling ALB or ELB access logging for your public facing load balancers, as it gives you meaningful insights into user (or attacker) access patterns and is crucial when investigating an attack on a public facing application.
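
Access logging on an ALB can be switched on through the API; here's a rough boto3 sketch – the load balancer ARN and bucket name are placeholders, and the bucket needs a policy that allows log delivery:

import boto3

elbv2 = boto3.client("elbv2", region_name="ap-south-1")

# Enable access logs on a hypothetical public ALB, delivered to an S3 bucket
elbv2.modify_load_balancer_attributes(
    LoadBalancerArn="arn:aws:elasticloadbalancing:ap-south-1:111122223333:loadbalancer/app/prod-ext-web/abc123",
    Attributes=[
        {"Key": "access_logs.s3.enabled", "Value": "true"},
        {"Key": "access_logs.s3.bucket", "Value": "prod-alb-access-logs"},
        {"Key": "access_logs.s3.prefix", "Value": "prod-ext-web"},
    ],
)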

CloudFront

For reasons similar to public facing ALB / ELB, CloudFront CDN deployments should log public requests. 

CloudFront offers real-time logs that contain detailed information about every request made to a distribution.

In addition, standard logging is an optional, free-of-cost feature of the CloudFront service, although you do pay for log retention at the destination (S3).
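
Standard logging can also be enabled on an existing distribution through the API; the sketch below assumes a hypothetical distribution ID and log bucket:

import boto3

cloudfront = boto3.client("cloudfront")

# Fetch the current distribution config (the ID is a placeholder)
resp = cloudfront.get_distribution_config(Id="E1A2B3C4D5E6F7")
config, etag = resp["DistributionConfig"], resp["ETag"]

# Turn on standard (access) logging to an S3 bucket
config["Logging"] = {
    "Enabled": True,
    "IncludeCookies": False,
    "Bucket": "prod-cloudfront-logs.s3.amazonaws.com",
    "Prefix": "cdn/",
}

cloudfront.update_distribution(Id="E1A2B3C4D5E6F7", DistributionConfig=config, IfMatch=etag)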

VPC Flow Logs

Your VPC is loaded with network activity all the time, but not all network traffic is equally critical for security purposes. For example, your internet facing components such as public EC2 instances, NAT gateways and the DMZ, or cardholder data environments (PCI-DSS CDE), should be monitored for network activity regardless. Internal subnets and EC2 instances might not be as critical, so depending on the organisation's risk appetite, one might choose to exclude them from VPC flow logging.

Enabling VPC flow logs takes only a few clicks in the console or a single API call.
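
Here's a minimal boto3 sketch that sends flow logs for a VPC straight to S3; the VPC ID and bucket ARN are placeholders:

import boto3

ec2 = boto3.client("ec2", region_name="ap-south-1")

# Enable flow logs for a hypothetical VPC, delivered to an S3 bucket
ec2.create_flow_logs(
    ResourceIds=["vpc-0123456789abcdef0"],
    ResourceType="VPC",
    TrafficType="ALL",
    LogDestinationType="s3",
    LogDestination="arn:aws:s3:::prod-vpc-flow-logs",
)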

S3 Access Logs

Going back to our data classification approach, not all S3 buckets might require logging, but it is crucial to enable access logging for public and sensitive buckets. Our article on S3 security best practices goes deeper into other S3 security aspects as well.
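
As a sketch, server access logging for a sensitive bucket can be enabled like this; the bucket names are placeholders, and the target bucket must permit S3 log delivery:

import boto3

s3 = boto3.client("s3")

# Enable server access logging on a hypothetical sensitive bucket
s3.put_bucket_logging(
    Bucket="prod-int-customerdata",
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "prod-s3-access-logs",
            "TargetPrefix": "prod-int-customerdata/",
        }
    },
)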

API Gateway Trace Logging

Where necessary, enable trace logging on your sensitive and public facing API Gateway stages (in case they aren't fronted by CloudFront, where logging might already be enabled).
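
For a REST API stage, execution logging and data tracing can be turned on with a stage update. A rough boto3 sketch follows – the API ID and stage name are placeholders, and the account must already have a CloudWatch Logs role configured for API Gateway:

import boto3

apigw = boto3.client("apigateway", region_name="ap-south-1")

# Turn on execution logging and data tracing for all methods of a hypothetical stage
apigw.update_stage(
    restApiId="a1b2c3d4e5",
    stageName="prod",
    patchOperations=[
        {"op": "replace", "path": "/*/*/logging/loglevel", "value": "INFO"},
        {"op": "replace", "path": "/*/*/logging/dataTrace", "value": "true"},
    ],
)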

Apart from these, consider enabling logs for your data systems such as Elasticsearch, RDS and DynamoDB, keeping the 3×3 matrix as a guide.

CloudTrail vs CloudWatch

As we've seen earlier, CloudTrail is a logging service that records AWS activity for investigation and analysis purposes. There are various destinations to which CloudTrail logs can be delivered, and CloudWatch is one of them.
 
AWS CloudWatch, on the other hand, is the standard AWS logging service, where an organisation can store AWS service events or custom events – ideally in JSON format so they remain easy to query and alert on.

Logging Infrastructure

Logs are great, but without resilient infrastructure backing the log delivery pipeline and data retention / retrieval, they're dead weight. It's a complex problem to solve, so let's look at some tried and tested implementation strategies.

Low Latency, Moderate to High Cost

If cost is not a concern, an organisation should go for options such as CloudWatch or OpenSearch. Both services provide highly scalable, low latency log storage; but they’re expensive when large amounts of data are stored. 

Moderate Latency, Low Cost

In case data retrieval latencies are not an issue, consider S3 based storage. There are various ways to store structured data in S3 – as JSON, Apache ORC or Apache Parquet. Data on S3 can be compressed using formats such as GZIP, which provide superior compression ratios.
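
As a small sketch, a batch of JSON log records could be converted to GZIP-compressed Parquet and pushed to S3 like this; the file names and bucket are placeholders, and pandas plus pyarrow are assumed to be installed:

import boto3
import pandas as pd

# Convert a hypothetical batch of newline-delimited JSON logs to compressed Parquet
df = pd.read_json("alb-access-logs.json", lines=True)
df.to_parquet("alb-access-logs.parquet", compression="gzip")

# Upload the Parquet file to a hypothetical data lake bucket
s3 = boto3.client("s3")
s3.upload_file("alb-access-logs.parquet", "prod-security-datalake", "alb/alb-access-logs.parquet")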

Security Data Lake

Once your logs reach S3 by either of the above approaches and are in a suitable format (JSON and Parquet both work great), you can use AWS Athena to query structured logs just as if you were querying a database – yes, using SQL queries! The best part is yet to come: you can use tools such as Amazon QuickSight or Redash to quickly build insights, visualisations and dashboards to monitor key metrics or identify patterns in your log data.
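
For example, assuming a hypothetical Athena database and a table over your VPC flow logs, a query can be fired from code like this (the database, table and results bucket are placeholders):

import boto3

athena = boto3.client("athena", region_name="ap-south-1")

# Hypothetical query: top rejected source IPs in a VPC flow log table
query = """
    SELECT srcaddr, count(*) AS rejected
    FROM security_lake.vpc_flow_logs
    WHERE action = 'REJECT'
    GROUP BY srcaddr
    ORDER BY rejected DESC
    LIMIT 10
"""

athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "security_lake"},
    ResultConfiguration={"OutputLocation": "s3://prod-athena-results/"},
)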
  
Performance considerations:
 
  • Parquet is a columnar format and works better than JSON in terms of both size and querying efficiency
  • Try and keep your logging infrastructure as cloud native as possible – by using S3, Athena and QuickSight
  • Hadoop or big data infrastructure works great, but one needs to manage it
 
A security data lake contains all the logs from all the sources you would otherwise integrate into, let's say, a SIEM solution. Combined with proper log processing and storage (with a retention period of your choosing), cyber threat intelligence and active threat hunting, it gives you full visibility and a central place to look for threat actors.

Conclusion

A logging project can get overwhelming, whether it is trying to figure out what to log, how to log, how to store or how to analyse. We hope this article on AWS Logging was helpful to your logging or threat detection and response charter.