Maximizing AWS Security: Transforming Cloud Strategies for Agile Success — Part 1

Part 1: Introduction.

Ammar Alim
ITNEXT

--

NOTE: this is Part 1 of a multi-part series. For Part 2, please click here.

"Security will forever be our top priority," said Werner Vogels, VP & CTO of Amazon.

Agility, cost, autonomy, and faster time to market are why most customers moved their on-prem workloads to the cloud. Let's figure out how we can shift security to the left and align it with those software development principles. Security shouldn't become the bottleneck; you can use the cloud to secure workloads on the cloud.

Target Audience

If you've been working with AWS Cloud for a few years, you can relate to the points I am trying to get across in this post. Please share your experiences with a broader audience. If you're about to start your journey, this is an excellent place to start, as I will summarize some of the practices I've learned over the years.

I assume you have been working with AWS for at least a year, so I won't explain the AWS services mentioned in this post in detail. In this part of the series, I will highlight strategies you should implement for better security architecture. Other parts of the series will shed some light on the implementations.

During the last two years, I spent a great deal of time working with my team and other cloud professionals via the many open source platforms in attempts to improve security practices on the AWS Cloud. While it sounds simple on paper, achieving it at production scale is a very challenging task: you need to collaborate across various teams and very diverse sets of technology stacks.

Don't worry; we'll cover all the nitty-gritty in this and upcoming posts. You will learn from our mistakes and leverage our wins. Most importantly, share your experiences with us.

With that out of the way, let's roll the DevOps way!

The Culture

Eliminate the cycle of blame

Before we discuss security implementations on AWS, let's start with us (security folks): our culture, our processes, and our mindset.

"Collectively, we have failed our customers," said Pat Gelsinger, chief executive officer of VMware.

Pat couldn't be more right about us, security teams. For years, we've focused on threats instead of building security capabilities. Instead of preventing threats in the first place, security teams spend the majority of their time reacting to them.

We should focus on implementations that shrink the attack surface and on prevention mechanisms. Complexity, slowness, and inter-team conflicts have destroyed lots of organizations. Only when you apply simplicity to your processes, a more collaborative spirit to your culture, and solid security principles backed by data to your decision-making can you see things move in the right direction. I also agree with the voices that say security teams are underfunded and always have to fight for the best bang for the buck; that constraint makes us wiser!

Our teams must be challenged to work together to achieve organizational security goals; in the end, security should be everyone's job. The less time we spend blaming each other, the easier it is for us to collaborate effectively.

The Shared Responsibility Model

I can tell you're wondering where to start. You're trying to catch up with application teams releasing new features or bug fixes daily, and you have much to worry about. I know it's tempting to jump to the cool stuff and start enforcing policies via automation or other means, but understanding your responsibilities as a customer is where you need to begin. Start with the shared responsibility model; it will save you from lots of rookie mistakes in the long run.

AWS Shared Responsibility Model

Depending on your choice of AWS compute service (EC2, ECS/EKS, Lambda, etc.) or public cloud model (IaaS, PaaS, or SaaS), your responsibility as a customer can change radically. Still, we won't cover that in detail in this post because it can be an article on its own.

At a high level, though, you should be vigilant about securing your data, as you can't offload this responsibility to AWS; it's your obligation no matter which AWS services you use or which public cloud model you adopt.

The more you move toward serverless, the less you worry about patch management and OS-level security hardening nightmares.

The same goes for your public cloud model; the more you move toward SaaS, the less you have to worry about infrastructure security. Please note that I am not advocating for SaaS; I am just stating facts.

Public Cloud Models

The Multi Account Model

After understanding your security responsibilities as a customer, it's time to think about implementing governance at scale. One of the significant lessons early adopters learned is that deploying your applications on a single AWS account is a dangerous practice as it complicates your cloud management processes from a security and cost management standpoint.

AWS Accounts Architecture

I will let you decide how to architect your accounts, but the diagram above illustrates a familiar pattern.

You need a sandbox account for your developers to try new cool things and formal Dev, pre-Prod, and Prod accounts for each product. Have a logging account where you aggregate all your CloudTrail logs, Config rules, S3 access logs, and other security-related data that auditors may need access to in case of a breach.

You also need a management account to enforce policies on your other accounts. Try to delegate billing to a billing-admin account instead of using your master account. Use the master account to apply organization service control policies on your linked accounts.
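As a taste of what a service control policy from the master account can look like, here is a minimal sketch that denies activity outside an approved set of regions. The region list and statement ID are my own placeholders; the exempted services are global ones that must stay reachable:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyOutsideApprovedRegions",
      "Effect": "Deny",
      "NotAction": ["iam:*", "organizations:*", "sts:*", "support:*"],
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:RequestedRegion": ["us-east-1", "us-west-2"]
        }
      }
    }
  ]
}
```

Attach it to an organizational unit, and every linked account under that OU inherits the restriction, regardless of what its local IAM policies allow.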

Resource tagging is a huge challenge, and cost control is one of the most significant problems enterprises face today. Again, the multi-account model comes to the rescue: since you have deployed each product in a separate AWS account, you know the product team owns the entire bill, and you shouldn't worry too much about their tagging shortcomings, even though I believe tagging is essential for many other reasons.

Many AWS customers who use a single account for all their applications experience the pain of increasing service limits. You have many engineers competing for limited resources (S3 buckets, compute resources, and others), and AWS has some hard limits that it can't raise for you.

My all-time favorite benefit of the multi-account model is its limited blast radius. I can ensure Dev Team A does not have access to Dev Team B's accounts so they keep their stuff intact. I can also sleep at night knowing that if my cat's image processing product on AWS account 120 were to get breached, my other products on the other AWS accounts wouldn't be impacted. I think you now see the benefits of using this model :) And it does not cost you more money! AWS accounts are free until you start using resources.

Cloud Enablement

Make sure you do whatever it takes not to block your developers for no good reason. It takes work to balance allowing developers to move fast and ensuring your environments are secure. Start by making sure your developers know that you are not there to be the blocker but rather the enabler.

Establish an agreed-upon security baseline that needs to be on each account for your business to comply with industry-specific regulations.
Autonomy and velocity are as essential to your business as security, so make sure you align your security practices with your business so both work together in harmony. This is easier said than done, as it takes lots of evangelism and a deep understanding of your products and infrastructure, in addition to all your compliance requirements.

Automation & Policy as Code (PaC)

Cloud Custodian

I don't know about you, but the idea of architecting security policies in code and having them deployed to production within minutes is very cool. You can version-control your policies, know who made a lousy merge, and roll back to the last known good version.

Remember when developers had to wait five weeks for you to approve that firewall request? Well, those days are gone. Developers can now create security group rules in a few seconds and open port 22 to the world in less than that.

Developers never cared about your tedious documentation filled with outdated security policies; they never read it and never will, even if you spend your valuable time updating it.

So what should you do? If you're considering automating your security policies, you should pat yourself on the shoulder. To keep up with the speed the cloud gives your business, security policies need to be as elastic as your infrastructure. They should move as fast, or as slow, as your application teams.

There are many ways to define policies on AWS: AWS Organizations service control policies, IAM policies, and service-specific policies such as KMS key policies and S3 bucket policies.

Automation enables that flexibility, and codifying your policy is incredible since you can treat your security policies as your infrastructure and application code. Policies can be checked in git, revised, improved, refactored, and looked at by multiple people, and everything that applies to the application code can be applied to your policy code.

You can use services such as Lambda and CloudWatch events or leverage open-source tools such as Cloud Custodian to help you achieve compliance via automation.
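To give a flavor of policy as code, here is a small Cloud Custodian policy sketch. The policy name is my own placeholder; it finds security groups with SSH open to the world and strips the matching ingress rules:

```yaml
policies:
  - name: sg-remove-world-open-ssh   # placeholder name
    resource: aws.security-group
    filters:
      - type: ingress
        Ports: [22]
        Cidr:
          value: "0.0.0.0/0"
    actions:
      - type: remove-permissions
        ingress: matched
```

Run it from a pipeline or on a schedule, check it into git like any other code, and your remediation logic gets the same review and rollback story as your applications.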

The point is: automate your security controls or be prepared to lose this battle. Real-time detection, notification, and remediation are very cloud- and human-friendly.

If one of your developers opens an S3 bucket to the world during a Prod deployment, detect it and notify them immediately that you have found and fixed the issue for them. If you discover security findings two weeks after the product went to production, convincing the builder to fix them will be challenging. Real-time feedback is well received because it can be acted upon immediately.
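The detection half of that loop can be surprisingly small. Here is a hypothetical helper that inspects an S3 ACL document (the shape returned by boto3's get_bucket_acl) and flags world-readable grants; the function name is mine, and in a real setup you would call it from a Lambda triggered by a PutBucketAcl event, then notify and reset the ACL:

```python
# ACL grantee URIs that expose a bucket to everyone or to any
# authenticated AWS user (these URIs are part of the S3 ACL schema).
PUBLIC_GRANTEES = {
    "http://acs.amazonaws.com/groups/global/AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
}


def public_grants(acl: dict) -> list:
    """Return the grants in an S3 ACL that make the bucket public."""
    return [
        grant
        for grant in acl.get("Grants", [])
        if grant.get("Grantee", {}).get("URI") in PUBLIC_GRANTEES
    ]
```

A Lambda handler would fetch the ACL with `s3.get_bucket_acl(Bucket=...)`, run this check, post a message to the owning team, and call `s3.put_bucket_acl(Bucket=..., ACL="private")` to remediate.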

Identity and Access Management

AWS IAM

AWS IAM is the heart of AWS security. If you talk to IAM ninjas, they will tell you that getting AWS IAM right is a game changer, as it is one of AWS's most powerful services.

AWS IAM enables you to define fine-grained policies that help you quickly set boundaries and implement the principle of least privilege. IAM integrates with all AWS services, even services that allow you to establish policies directly on them, such as S3 and KMS.

The key with IAM is to use roles and assume those roles instead of using API keys everywhere. Also, automate the creation of policies, groups, and roles as much as possible, and have a well-defined naming convention for them to avoid configuration drift. Understanding conditions and IAM actions can be tricky, but it is indispensable if you want a firm handle on your access policies.
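What makes role assumption work is the role's trust policy. As a sketch (the account ID and role names below are placeholders), this trust policy lets principals from a dev-team role assume the role with short-lived credentials, and only when MFA is present:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111122223333:role/dev-team-a"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "Bool": { "aws:MultiFactorAuthPresent": "true" }
      }
    }
  ]
}
```

The temporary credentials returned by `sts:AssumeRole` expire on their own, so there are no long-lived API keys to leak or rotate.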

My advice: avoid the single AWS account model so you can avoid endlessly slicing and dicing policies.

Data Security

Securing data means encrypting it at rest and in transit and making sure only those who need access to it get access. Even though that seems straightforward, many organizations struggle to get it right.

I often start my day by Googling S3 bucket data leaks and sharing my findings with people in my circle to scare everyone. I don't recall going more than a couple of days without running into that kind of news.

The lesson I learned is to classify your data early on and make sure all your data stores have owners. You should also encrypt everything and make sure all data stores are protected.

KMS is incredible, as it does all the heavy lifting for you: creating master keys, generating data keys, and rotating keys are all managed by AWS. If you want more control, you can bring and manage your own keys.
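Access to a KMS key is itself governed by a policy. As a minimal sketch (account ID and role name are placeholders, and a complete key policy also needs a statement granting the account's administrators access), this statement lets a single application role use the key for encryption and data-key generation, and nothing else:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowAppRoleToUseTheKey",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111122223333:role/app-prod"
      },
      "Action": ["kms:Encrypt", "kms:Decrypt", "kms:GenerateDataKey*"],
      "Resource": "*"
    }
  ]
}
```

Scoping key usage to one role per workload pairs nicely with the multi-account model: even inside an account, a leaked credential can't decrypt data it was never granted.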

Infrastructure Security

As AWS handles physical infrastructure security for you, you can focus on securing your VPC traffic and the resources within the VPC. Start by creating private and public subnets, and use a VPN or bastion hosts to access your servers. Using NACLs and security groups to restrict traffic is a must-implement best practice.

Avoid protocols such as SSH by leveraging immutable infrastructure, which provides greater security.

Don't patch instances; rebuild a new AMI. Don't update in place; destroy and rebuild from a base image.

Use git-secrets (from AWS Labs) to detect keys that are accidentally checked into git repositories. Use EC2 roles instead of API keys; remember, if your EC2 key pairs leak, rest assured your servers will be mining Bitcoin in no time.

Suppose you are using S3 for static web hosting. In that case, I recommend fronting your bucket with a CloudFront distribution and leveraging an AWS origin access identity to restrict your bucket to your distribution. This pattern hides your objects from bad actors and gives you access to security features such as AWS WAF and Shield, which can't work with S3 directly.
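The restriction itself is just a bucket policy. As a sketch (the bucket name and OAI ID below are placeholders), this grants object reads only to the CloudFront origin access identity, so direct S3 URLs stop working:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity EXAMPLEOAIID"
      },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my-static-site/*"
    }
  ]
}
```

With this in place, all traffic is forced through the distribution, where WAF rules, Shield protections, and TLS termination apply.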

Resilience

Netflix Chaos Monkey

High availability, continuity of operations, robustness, resilience, and disaster recovery are often the reasons for cloud deployments with AWS. Multi-AZ and multi-Region deployments are patterns we need to adopt according to AWS's Well-Architected Framework.

Tools such as the AWS Well-Architected Tool can help you deploy more resilient workloads on AWS. AWS makes it easier for you to architect for failure; most services offer high availability and durability, and others can be architected to be more resilient. AWS recommends that we architect with the Well-Architected pillars in mind: operational excellence, security, reliability, performance efficiency, and cost optimization.

After you finish the design and implementation phase, you can test your infrastructure by introducing Chaos Engineering to grow confidence in your infrastructure and define processes that can help you recover quicker in the event of a failure. You can start your tests with individual nodes and AZs or take down an entire region and see how your application copes with that.

Logging & Monitoring

CloudTrail, Config, and VPC Flow Logs must be enabled and aggregated to a logging account. You should also ensure those services are enabled in every region and get notified via automation if anyone tries to deactivate them.
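One way to catch tampering is a CloudWatch Events/EventBridge rule that matches the CloudTrail management API calls an attacker would use to cover their tracks. A sketch of the event pattern (wire the rule's target to an SNS topic or a remediation Lambda of your choosing):

```json
{
  "source": ["aws.cloudtrail"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": {
    "eventSource": ["cloudtrail.amazonaws.com"],
    "eventName": ["StopLogging", "DeleteTrail", "UpdateTrail"]
  }
}
```

Pair the notification with an automated `StartLogging` call, and your audit trail heals itself faster than anyone can act in the gap.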

AWS GuardDuty and Security Hub should also be enabled; both offer outstanding monitoring capabilities. Passive monitoring isn't enough; we should aim to react to security events with real-time automation.

Incident Response

Speaking of real-time automation, we should possess a playbook that helps guide security professionals and stakeholders within our business to respond to security events more effectively.

Not knowing who owns the data, what to do about a compromised instance, or, even worse, a hijacked AWS account is terrible. We need to automate as much as possible, but we also need defined responsibilities and documented processes we can reference. This subject deserves a deeper dive, so I will stop here and work on a dedicated blog post soon!

DevSecOps

This is a buzzword that gets thrown around a lot, like DevOps. DevSecOps (or SecDevOps) means implementing security in your CI/CD workflow: code gets scanned and checked for security vulnerabilities as it moves to production through all the lower environments.

We don't wait for the security team to bless our code at the last minute before it goes to production. We start with security as early as having IDE plugins that look for security bugs as we code. Using tools like Docker Bench and other easily integrated tools with Jenkins is an excellent place to start.

DevSecOps is a way of approaching IT security with an "everyone is responsible for security" mindset. It involves injecting security practices into an organization's DevOps pipeline, with the goal of incorporating security into all stages of the software development workflow. That contrasts with its predecessor development models: DevSecOps means you're not saving security for the final stages of the SDLC.

Compliance Validation

Do you hold customers' credit card information? Health care information? Or any personally identifiable information (PII)? Then you should see how you're doing by checking your infrastructure against industry-specific best practices.

You can only do business if you comply with industry regulations and follow the rules. We can use AWS Config rules to audit our infrastructure continuously. At a minimum, we should start with the CIS Benchmarks if we don't have to comply with other frameworks such as PCI DSS or HIPAA.

That's it for the introduction. Thank you for spending your valuable time! If you'd like to see more of this, please 👏 so others can see it.

In the coming articles, we will explore the next level of complexity!

--


Engineering Manager @Adobe - I partner with startups, helping them secure their cloud infrastructure!