Running your infrastructure in a secure configuration is a daunting task even for security professionals. This guide provides practical advice to help engineers build up infrastructure following security best practices so that they can confidently deploy their services to the public Internet and lower their chances of being compromised. This guide specifically targets Linux based systems; however, the best practices apply to all computer systems.
Part of confidently running infrastructure is understanding what and whom you are protecting your infrastructure against. This guide will eventually have three versions, Basic, Intermediate, and Advanced, with each version focused on defending your infrastructure against a different class of attacker. You are reading the Basic version, which aims to protect against automated attacks and script kiddies that understand exploitation tools rather than exploitation techniques. This class of attacker is opportunistic rather than targeted and quickly moves on to easier targets. If you are running a side project or starting a company, this is the best place to start and will help you lay a solid foundation to build upon.
While reading this guide, consider the type of attacker and types of attacks you want to defend against. The best practices that you follow and do not follow depend on what you are trying to defend and whom you are trying to defend against.
Approach to Security
This guide follows these guiding principles in its discussion of software security:
- Defend, detect, and react. This means apply good security practices to defend your infrastructure, but log all suspicious behavior, and when compromised, restore to a safe state.
- All software can be exploited. All non-trivial software has flaws that allow an attacker with enough motivation to exploit it.
- Simplicity is security. Overly complex systems become harder for the developer to reason about and easier for an attacker to exploit. Simpler systems that can be reasoned about are often more secure. Don't roll out a security solution that you do not understand.
- Obscurity is not security. Rely on the security of the protocols you use to defend your infrastructure, not obscure ports and other tricks to try and hide insecure protocols.
- Consider all user input hostile. Consider all input accepted from users as hostile, and strictly verify what you accept.
- Principle of least privilege. Provide the minimal privilege needed for some operation to occur. If a process or system is exploited, you don't want to allow an attacker to gain any more access than is minimally required.
Aggressively applying security updates for software you didn't write might seem like a poor way to protect your infrastructure and perhaps even pointless. However, it's one of the best time investments you can make, from a security perspective. Following are two examples of recent security issues that unsophisticated attackers using automated tools can exploit if you have not updated your servers with the latest security patches:
Heartbleed: Allows an attacker to steal your private certificates and decrypt encrypted traffic
Shellshock: Allows an attacker to remotely execute arbitrary code on your servers
These two issues alone would give an attacker complete control of your entire infrastructure. Luckily mitigating these bugs is not difficult.
Consistently apply security updates provided by your operating system vendor. Most vendors have an automated method. For example, for Debian based systems, you can use Unattended Upgrades, and for Red Hat based systems, you can use AutoUpdates.
Automated patching is great; however, it does have a potential downside (to your business) if you don't test your software before you apply patches to production servers: things can unexpectedly break. As much as package maintainers try to ensure that security updates don't contain breaking changes, they cannot test every combination that may be running somewhere before release. That's why it's important to either have a staging Continuous Integration/Continuous Deployment (CI/CD) system or manually test security updates before rolling them out to production servers.
Just applying these security updates is not enough, however. If the issue is in a shared library, you will be using the old version of the library and still be vulnerable to exploitation until you restart the process that is linked to it. To check whether you have any binaries that need to be restarted, you can use checkrestart for Debian based systems and needs-restarting for Red Hat based systems.
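As a rough illustration of what tools like checkrestart look for, the following sketch scans the memory maps of running processes for shared libraries whose file on disk has been deleted or replaced. It is a simplified approximation, not how either tool is actually implemented; the function name and its optional argument are hypothetical, and you typically need root to inspect processes owned by other users.

```shell
#!/bin/sh
# Print the PID of every process that still maps a shared library whose
# file on disk was deleted or replaced (for example, by a package upgrade).
# The optional first argument overrides the proc root (a hypothetical knob
# that makes the function easy to test); it defaults to /proc.
stale_lib_pids() {
    proc_root="${1:-/proc}"
    for maps in "$proc_root"/[0-9]*/maps; do
        # Skip entries that vanished or that are not marked readable.
        [ -r "$maps" ] || continue
        # Read errors (for example, permission denied) are silently ignored.
        if grep -q '\.so.* (deleted)$' "$maps" 2>/dev/null; then
            pid="${maps%/maps}"
            echo "${pid##*/}"
        fi
    done
}

stale_lib_pids
```

Any PID printed belongs to a process that is a candidate for a restart after patching.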
- DO: Patch your servers against the latest security vulnerabilities.
- DO: Use automatic updates from your OS vendor whenever possible.
- DO: Restart any services that rely on shared libraries that have been updated.
- DON'T: Roll out updates to a server without running tests.
Hardening your application by using OS-level facilities is an effective approach for limiting the scope of damage that attackers can do after they exploit a vulnerability in your application. This section focuses on using traditional Unix Access Control facilities that most users are familiar with to restrict your application to the minimal set of access it needs to operate. The facilities are permissions on files, user identifier (UID), and root access.
The goal of this section is not to harden your application to the extent that an attacker cannot compromise your application. That is an almost impossible goal. The goal is to limit what an attacker can do once your application has been compromised. After an attacker has exploited your application, the attacker will be able to perform actions as your application, and possibly even elevate their privileges to root, which gives them full and complete access to your operating system. The goal is instead to restrict the actions that your application can perform to the limited set that it needs to operate, which in turn restricts the attacker.
You want to restrict your application such that even if an attacker has exploited your process and can execute code as that user account, the user has limited access rights on the file system. The same concept applies to the process under which the account is executing: restrict the CPU time, memory, and file descriptor count to mitigate DoS-style attacks where the attacker exhausts your resources. The goal is to force the attacker to use a privilege escalation attack (exploit another part of your operating system to elevate their privileges higher than the running application) to do anything meaningful on your system.
To restrict the account on which your application runs, use the following guidelines:
- Never run your application as root or as a user that has sudo capabilities. If your application is exploited, this effectively means the attacker can gain root privileges.
- If you have multiple applications, and each accesses different sensitive data, consider running each under its own account and then using file system permissions to isolate each application's sensitive data from the others. This means that sensitive application data should never have its "other" permissions set in a way that gives everyone read and write access. For example, never set permissions to a value such as 0777; instead, use a value that grants no access to other users.
- Ensure that both the application user and group have limited privileges. This means creating a new limited user and group for the account and not giving the user a shell. Suppose you have an app called foo. Create a user called fooapp and make its home directory /var/appdata/fooapp:
sudo useradd -r -s /bin/false --home /var/appdata/fooapp fooapp
sudo mkdir -p /var/appdata/fooapp
sudo chown fooapp:fooapp /var/appdata/fooapp
- Daemonize your application so it will automatically start as a particular user. There are two general approaches to solve this problem. The first is to use operating system facilities, like System V init scripts (Red Hat/Debian) or systemd (Red Hat/Debian), to start and stop your application and then use a process monitoring tool (like monit) to restart your application if it crashes. The other approach is to use a process control system (like supervisord, skarnet s6, or daemontools), which launches your application as its child and also restarts your application if it crashes. Both approaches are completely fine, and which one you use depends on which fits your workflow better.
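The file permission guideline above can be spot-checked with a short script. The following is a minimal sketch, assuming the hypothetical data directory from the fooapp example; the function name and the APP_DIR variable are illustrative only. It lists every file or directory that grants any permission to "other" users.

```shell
#!/bin/sh
# List entries under a directory that grant any permission (read, write,
# or execute) to "other" users. Symbolic links are skipped because their
# own mode bits are always wide open and carry no meaning.
world_accessible() {
    find "$1" ! -type l \( -perm -004 -o -perm -002 -o -perm -001 \) -print
}

# Hypothetical data directory, reused from the fooapp example.
APP_DIR="${APP_DIR:-/var/appdata/fooapp}"
if [ -d "$APP_DIR" ]; then
    world_accessible "$APP_DIR"
fi
```

An empty result means nothing under the directory is exposed to unrelated accounts through the "other" permission bits.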
To restrict the process that runs your application, use the following guidelines:
- Assign per-process limits by using the /etc/security/limits.conf file. For example, if you want to limit the number of open file descriptors to 10 and limit memory to 1 GB, add the following lines to the file:
fooapp hard nofile 10 # 10 open file descriptor limit
fooapp hard as 1000000 # 1 GB limit (value is in KB)
- Don't bind your application to a low port. Typically you must run your application with administrative privileges to do this. Instead, bind to a high port number and use a reverse proxy to forward your requests to your application. Then use Linux capabilities to allow your reverse proxy to bind to a low port without any other privileges. For example, if you have a reverse proxy in /opt/rproxy, you can set its capabilities as follows:
setcap 'cap_net_bind_service=+ep' /opt/rproxy
- Lastly, consider using chroot, but be aware that some maintenance overhead is required. chroot allows you to limit the scope of what a process can see on the file system; specifically, it changes your root directory to a directory of your choosing. For example, if you define /var/chroot as your new root directory, processes see that directory as /. Although this is more secure, it means that any shared libraries that your process might use must be copied over and reside within /var/chroot, which in turn means that whenever you apply security updates, you also need to re-copy any updated shared libraries. You can avoid this maintenance with hard links, but then you are offering a path outside the chroot that an attacker can potentially exploit. Other approaches (cgroup-based approaches) that you can take to gain similar benefits will be discussed in the intermediate version of this guide.
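Before committing per-process limits to limits.conf, you can try the same values interactively with the shell's ulimit builtin. The following is a minimal sketch that assumes bash on Linux; run_limited is a hypothetical helper name, and the values mirror the fooapp example (10 file descriptors, roughly 1 GB of address space).

```shell
#!/bin/bash
# Run a command with limits similar to the fooapp limits.conf example.
# The limits are set inside a subshell so the calling shell is unaffected.
run_limited() {
    (
        ulimit -n 10        # maximum number of open file descriptors
        ulimit -v 1000000   # maximum virtual memory in KB (like "as" in limits.conf)
        "$@"
    )
}

# Show the file descriptor limit that the child process actually sees.
run_limited bash -c 'ulimit -n'
```

Running your application under run_limited in a staging environment is a quick way to find out whether a limit is too tight before it is enforced in production.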
- DO: Create a restricted account to run your application, which means no shell and limited file system access.
- DO: Bind your application to a high port allowing you to run as a non-privileged user.
- DO: Use capabilities instead of root whenever you can.
- DON'T: Use chroot unless you are ready to take on the maintenance overhead.
Firewalls and Networking
Strong firewall rules enable you to define what inbound and outbound communication is allowed from your servers. Starting with a default deny policy and allowing only specific traffic in and out forces you to think about the minimal set of services that you want to expose, which in turn can lower your risk of attack. An errant process cannot expose your entire infrastructure to the general public unless you specifically allow it to.
This section focuses on inbound firewall rules and TCP/IP stack settings. Although outbound firewall rules are very effective in limiting how far an attacker can go after they have gotten inside your infrastructure, the next version of this guide will focus on them.
First, firewall rules. When building a script for firewall rules, use the following guiding principles.
Delete existing firewall rules. When developing firewall rules, you want a coherent idea of what you are blocking and allowing. Dropping all existing rules and starting from scratch accomplishes that.
Set the default rule for inbound traffic to be DROP. This follows the principle of least privilege. After you define the default policy to DROP, then you can slowly open up your network, piece by piece.
Allow free access to the loopback interface. Unlike external interfaces, binding your process to localhost is usually good for security, and therefore restricting access to the loopback interface causes more harm than benefit. This does leave you open to an attack from a local user, but that's a risk you have to balance for yourself.
Don't terminate any established connections. You want to prevent terminating your own SSH connection to a server, and ensure that any ongoing request can finish before being terminated.
Don't restrict all Internet Control Message Protocol (ICMP) traffic. Allowing ICMP is critical for the Internet to work; routers and hosts use it to communicate critical information like service availability, packet sizes, and host existence. Types 3 and 4, Destination Unreachable and Source Quench, are critical, and restricting them will cause more harm than good. If you are concerned about allowing an attacker to map out your network, a sensible middle ground is to first rate limit all ICMP traffic and then allow only a limited subset of ICMP traffic at your edge hosts while allowing unfettered access for internal host-to-host communication.
Apply basic security checks. Some inbound traffic serves no legitimate purpose; restrict that traffic. If you are frequently attacked by a particular type of traffic and find yourself repeatedly adding rules to this section, it might be helpful to move those rules into their own chain.
- Unless you are actually using IPv6 and plan to build firewall rules for IPv6 traffic, restrict all inbound IPv6 traffic.
Following is a commented script that accomplishes all of these goals:
#!/bin/bash
IPT="/sbin/iptables"

# flush old rules, old custom tables
$IPT --flush
$IPT --delete-chain
$IPT -t nat --flush
$IPT -t nat --delete-chain
$IPT -t mangle --flush
$IPT -t mangle --delete-chain

# set default policies for all three default chains
$IPT -P INPUT DROP
$IPT -P FORWARD DROP
$IPT -P OUTPUT ACCEPT

# enable free use of loopback interfaces
$IPT -A INPUT -i lo -j ACCEPT
$IPT -A OUTPUT -o lo -j ACCEPT

# leave established connections open (like current ssh)
$IPT -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT

# security checks: force SYN checks, drop all fragments, drop XMAS packets, drop null packets
# see: http://security.blogoverflow.com/2011/08/base-rulesets-in-iptables/
$IPT -A INPUT -p tcp ! --syn -m state --state NEW -j DROP
$IPT -A INPUT -f -j DROP
$IPT -A INPUT -p tcp --tcp-flags ALL ALL -j DROP
$IPT -A INPUT -p tcp --tcp-flags ALL NONE -j DROP

# allow icmp
$IPT -A INPUT -p icmp -m icmp --icmp-type echo-request -m limit --limit 1/second -j ACCEPT
$IPT -A INPUT -p icmp -m icmp --icmp-type fragmentation-needed -m limit --limit 1/second -j ACCEPT
$IPT -A INPUT -p icmp -m icmp --icmp-type source-quench -m limit --limit 1/second -j ACCEPT

# allow ssh
$IPT -A INPUT -p tcp --dport 22 -m state --state NEW -s 0.0.0.0/0 -j ACCEPT
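When you later need to open another service, it helps to follow the same pattern as the SSH rule rather than hand-editing rules. The following is a minimal sketch of a hypothetical helper, allow_tcp_port, that validates a port number and emits a matching rule. By default it only prints the command (a dry run); setting IPT to /sbin/iptables would apply it. Omitting the -s option matches any source address, which is equivalent to 0.0.0.0/0.

```shell
#!/bin/bash
# Print (or apply) an "allow new inbound TCP connections" rule for a port.
# IPT defaults to a dry run that echoes the command instead of running it.
IPT="${IPT:-echo /sbin/iptables}"

allow_tcp_port() {
    port="$1"
    # Reject anything that is empty or contains non-digit characters.
    case "$port" in
        ''|*[!0-9]*) echo "invalid port: $port" >&2; return 1 ;;
    esac
    # Reject numbers outside the valid TCP port range.
    if [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
        echo "invalid port: $port" >&2
        return 1
    fi
    # IPT is intentionally unquoted so "echo /sbin/iptables" splits in two.
    $IPT -A INPUT -p tcp --dport "$port" -m state --state NEW -j ACCEPT
}

allow_tcp_port 443
```

Reviewing the dry-run output before applying it is a cheap way to avoid locking yourself out with a mistyped rule.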
Following is a small script for IPv6 traffic:
#!/bin/bash
IPT="/sbin/ip6tables"

# flush old rules and custom tables
$IPT --flush
$IPT --delete-chain

# drop all traffic
$IPT -P INPUT DROP
$IPT -P FORWARD DROP
$IPT -P OUTPUT DROP
These rules are now running in memory, and you need to ensure that they are loaded the next time your operating system restarts. For Debian based systems, that means either adding a script that loads your firewall rules to /etc/network/if-pre-up.d/ or adding a pre-up command to /etc/network/interfaces. For Red Hat systems, this is typically done by using the /sbin/service iptables save command.
In addition, the following TCP/IP stack hardening/tuning is recommended:
- If you are using stateful firewall rules, like the preceding example, be sure to increase the maximum number of connections that you can track. Otherwise, an attacker can fill the connection tracking table with a distributed denial-of-service (DDoS) attack, which causes new legitimate connections to be dropped.
- Use SYN Cookies to prevent SYN flood DoS attacks. Thomas Pornin provides a great explanation of what SYN flood attacks are and how SYN cookies mitigate this type of attack.
- Log all martian packets because any packet coming from an unroutable source or destination address is most likely going to be malicious.
You can try out all the above settings with the following script:
#!/bin/bash
# to view any of these settings use
# sysctl -n <name>

# increase connection tracking
sysctl -w net.netfilter.nf_conntrack_max=16777216

# enable syn cookies
sysctl -w net.ipv4.tcp_syncookies=1
sysctl -w net.ipv4.tcp_synack_retries=5

# log martian packets
sysctl -w net.ipv4.conf.all.log_martians=1
sysctl -w net.ipv4.conf.default.log_martians=1
To persist these settings across a reboot, update your sysctl configuration file with the following lines:
net.netfilter.nf_conntrack_max = 16777216
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_synack_retries = 5
net.ipv4.conf.all.log_martians = 1
net.ipv4.conf.default.log_martians = 1
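After a reboot, it is worth confirming that the persisted values actually took effect. The following is a minimal sketch of a hypothetical check_sysctl helper that reads a setting directly from the proc filesystem, where the dots in a sysctl name correspond to directory separators. The optional third argument is an assumption added for testability.

```shell
#!/bin/sh
# Compare a kernel setting against the value you expect, for example
# net.ipv4.tcp_syncookies -> /proc/sys/net/ipv4/tcp_syncookies.
# The optional third argument overrides the root directory.
check_sysctl() {
    key="$1"
    expected="$2"
    root="${3:-/proc/sys}"
    path="$root/$(printf '%s' "$key" | tr '.' '/')"
    if [ ! -r "$path" ]; then
        echo "MISSING $key"
        return 1
    fi
    actual="$(cat "$path")"
    if [ "$actual" = "$expected" ]; then
        echo "OK $key = $actual"
    else
        echo "MISMATCH $key = $actual (expected $expected)"
        return 1
    fi
}

# Check a few of the settings from the script above and remember failures.
status=0
for pair in net.ipv4.tcp_syncookies=1 \
            net.ipv4.conf.all.log_martians=1 \
            net.ipv4.conf.default.log_martians=1; do
    check_sysctl "${pair%%=*}" "${pair#*=}" || status=1
done
```

Note that the nf_conntrack_max entry only exists once the connection tracking module is loaded, so a MISSING result for that key is not necessarily an error.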
- DO: Deny traffic by default. Explicitly allow only traffic that you know must traverse your network.
- DON'T: Unilaterally restrict ICMP.
- DO: Allow free access to the loopback interface.
- DO: Force some basic security checks.
- DO: Ensure that your rules are loaded on restart.
- DO: Tune your TCP/IP stack to increase the number of connections tracked and protected against SYN floods.
For remote login, you want to ensure not only that communication with your servers is encrypted but also that only authorized users have access to your servers. Following are typical goals when you are securing remote login:
- Give limited access to users so a compromise of one user account does not compromise your entire infrastructure.
- Strong cryptography ensures that an eavesdropper can’t read your communication.
- Attackers can’t use brute force techniques to log in to your servers.
- Even if your key is compromised, an attacker can’t gain access to your infrastructure.
- An attacker who does use brute force techniques can’t exhaust server resources.
- Only authorized users have access to your servers.
- No login for general purpose administrative accounts exists. All administrative actions are taken via some form of privilege escalation (sudo) to log the actions performed.
Failure to realize any one of these goals can be a security risk. Weak (or no) cryptography can allow an attacker to view your communication. Weak authentication can allow unauthorized users access to your systems.
Luckily, Secure Shell (SSH) mitigates most of these risks, and with a few minor tweaks to your systems, all of them can be mitigated.
To start with, generate your SSH key correctly by ensuring that you are using a key size that is large enough and that your key is passphrase protected. You can do that by using the following command, replacing the placeholder comment with your own email address:
ssh-keygen -t rsa -b 4096 -C "you@example.com"
Then when prompted, enter a passphrase! A passphrase ensures that even if someone steals your key they cannot use it without also knowing your passphrase.
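If you are unsure whether an existing key has a passphrase, you can test it without changing the key. The following is a minimal sketch: it asks ssh-keygen to derive the public key (-y) while supplying an empty passphrase (-P ""), which only succeeds when the key is unprotected. The function name and the default key path are assumptions for illustration.

```shell
#!/bin/sh
# Return 0 if the private key can be loaded with an empty passphrase,
# which means it is NOT passphrase protected.
key_is_unprotected() {
    ssh-keygen -y -P "" -f "$1" >/dev/null 2>&1 </dev/null
}

# Hypothetical key location; adjust to wherever your key lives.
KEY="${KEY:-$HOME/.ssh/id_rsa}"
if [ -f "$KEY" ]; then
    if key_is_unprotected "$KEY"; then
        echo "WARNING: $KEY has no passphrase"
    else
        echo "$KEY is passphrase protected (or could not be read)"
    fi
fi
```

A passphrase can be added to an existing key with ssh-keygen -p -f followed by the key path.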
OpenSSH has a reasonable default configuration that is quite secure. However, some distributions might weaken these defaults to make OpenSSH interoperable with legacy servers. The following configuration simply ensures that those reasonable defaults are applied by your version of OpenSSH. For more detailed information about OpenSSH configuration, see Mozilla's Configuration guide for OpenSSH and the Securing SSH page for CentOS. Both are excellent resources and we'll build on those configurations in future versions of this guide.
On the server, ensure that you have the following lines in the OpenSSH server configuration file:
Protocol 2
PasswordAuthentication no
PubkeyAuthentication yes
PermitRootLogin no
LogLevel VERBOSE
This configuration achieves the following goals:
- Protocol 2 ensures that you are using a secure version of the SSH protocol. Version 1 of the protocol has quite a few issues with it and is considered broken.
- PasswordAuthentication no and PubkeyAuthentication yes force you to use public key cryptography, and not passwords, to authenticate to your servers. Although you can have a strong password (for example, a randomly generated 2048-bit password encoded in ASCII), most passwords are bad, and the password lengths that are commonly used have a much smaller search space than a large key.
- PermitRootLogin no disables the ability to remotely log in as the root user. Although this is not a directly exploitable issue, disabling this remote login helps you keep good audit logs to understand what is happening on your servers. The root account acts as a shared administrative account, which limits your ability to audit which user is performing which privileged action. If you force all users to go through their own accounts, you will have an auditable trail of which user performed what action. Details about how to set up audit logging are provided in a later section.
- LogLevel VERBOSE logs the user and key fingerprint that made an attempt to authenticate. Again, this setting does not directly mitigate an exploit, but is good for auditing.
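Because distributions ship different defaults, it can be useful to check a server's configuration file against the settings above. The following is a simplified sketch: the sshd_option_is helper is hypothetical, the path /etc/ssh/sshd_config is the usual location but an assumption here, and Match blocks and Include directives are ignored. It relies on two documented behaviors of sshd: keywords are case-insensitive, and the first value found for a keyword is the one used.

```shell
#!/bin/sh
# Check that an sshd_config-style file sets a keyword to an expected value.
# Only the first occurrence of the keyword is considered, as sshd does.
sshd_option_is() {
    file="$1"; key="$2"; want="$3"
    got="$(awk -v k="$key" '
        tolower($1) == tolower(k) { print $2; exit }
    ' "$file")"
    [ "$got" = "$want" ]
}

# Assumed default path; adjust if your distribution differs.
CONF="${CONF:-/etc/ssh/sshd_config}"
if [ -r "$CONF" ]; then
    for pair in PasswordAuthentication=no PubkeyAuthentication=yes \
                PermitRootLogin=no LogLevel=VERBOSE; do
        if sshd_option_is "$CONF" "${pair%%=*}" "${pair#*=}"; then
            echo "ok:  ${pair%%=*}"
        else
            echo "FIX: ${pair%%=*} should be ${pair#*=}"
        fi
    done
fi
```

A keyword that is absent from the file is reported as needing a fix, which matches the advice to state these settings explicitly rather than rely on defaults. For an authoritative view of the effective configuration, running sshd -T as root prints the settings the daemon would actually use.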
On the client, ensure that you have the following lines in the SSH client configuration file:
Host *
    Protocol 2
    HashKnownHosts yes
    StrictHostKeyChecking ask
This configuration achieves the following goals:
- Protocol 2 ensures that you are using a secure version of the SSH protocol. Version 1 of the protocol has quite a few issues with it and is considered broken.
- HashKnownHosts yes hashes host names and addresses in your ~/.ssh/known_hosts file. Even if an attacker steals your known hosts file, they can’t simply enumerate the hosts you connect to with your key.
- StrictHostKeyChecking ask checks the key presented to you against the one in your ~/.ssh/known_hosts file and, if it has changed (or it is the first time you are visiting that host), asks you if you will accept that key. This helps mitigate man-in-the-middle attacks.
Lastly, give users limited access to your infrastructure. For example, not all users need access to your backup servers; only give access to the users that actually know how to restore from backups. This ensures that even if a non-backup capable user account is compromised, the integrity of your backups is not in question.
To accomplish this, two of the common approaches are:
- Local user accounts. In this approach, you create local Unix accounts for your users and only create the accounts on the servers that they need access to. You can use the useradd and userdel commands to accomplish this and automate/orchestrate this using configuration management tools like Chef or Ansible.
- Centralized authentication service like LDAP. With this approach, the servers that each user has access to are defined and stored within the LDAP server configuration.
- DO: Passphrase protect your key.
- DON'T: Assume that your distribution has acceptable defaults. Strictly define what is important for your infrastructure.
- DO: Use public-key-based cryptography for authentication instead of password-based authentication.
- DON'T: Give all users unfettered access. Scope each user's access to the parts of your infrastructure that they need.
Trust boundaries are a common place where security vulnerabilities occur. The boundary between the outside world and your internal infrastructure is sacred, and you should do as much as possible to defend it and ensure that only authorized users can traverse it.
You should be concerned about two main trust boundaries. The first is the boundary between the public Internet and your API endpoint; this is the boundary that your customers will cross every day when using your service. The second is an access point for your developers and system administrators that will be used to deploy and service your application.
For the API trust boundary that your customers will traverse, you want to consider all user input as hostile and assume that every request that is made is actually an attempt to exploit your infrastructure. When you think about user requests in that manner, it becomes clear that you need to minimize the attack surface that you provide your users and isolate the damage that can occur when a user does eventually exploit your service.
For the trust boundary that you cross to service your applications, you want to isolate all your services so that they are not exposed to the public Internet and then force users who access them from the Internet to traverse a well-guarded access point that you can defend (let's call this a bastion host). You can concentrate all your resources on that one access point and be less concerned about how your services communicate with each other when they are within that trust boundary.
To mitigate these trust boundary issues, you need to envision the distinct boxes into which you can place the public Internet and your infrastructure (see the following diagram). After you have that, you can start to think about how you can defend your infrastructure.
The first large box is the public Internet. You should never trust anyone or anything on the public Internet. In fact, you should consider all actors on the public Internet as hostile even when they are your own employees SSH'ing into your servers.
The second large box is your internal infrastructure. These are your trusted hosts. Services that run on these servers should listen only on private network interfaces if possible and not be directly exposed to the public Internet.
The two boxes that span both are your jump host and your API hosts. These hosts should have access to both the public Internet and your internal infrastructure. Because they are directly exposed to the public Internet, they should be hardened and run the minimal set of services that are required to execute their tasks.
Strengthening the API Endpoint
Although we can't expect any service to be bug free, we can limit how much an attacker can exploit your infrastructure if they do exploit your service. That's why we recommend isolating the services that accept inbound requests and running them on their own dedicated servers.
You can accomplish this by splitting up incoming requests into two parts: load balancing and Transport Layer Security (TLS) termination of incoming requests, and the handling of the request by your service itself. Both should be, at the very minimum, their own processes, if not run on different servers, with the load balancing and termination of TLS on the Untrusted/Trusted boundary. This section will focus on load balancing and termination (application hardening was covered in a previous section).
When you separate load balancing and TLS termination from your application, you are limiting the possibility of a bug in either your load balancer or TLS software from escalating into exploitation of your entire application, which will typically have sensitive information loaded in memory. It also gives you a single point of maintenance (and failure) to patch when a vulnerability is found and you need to upgrade your TLS library, which is becoming an increasingly common task.
For example, let's say an attacker has a Remote Code Execution (RCE) bug like GHOST or an arbitrary memory read bug (like Heartbleed) in your HTTP server or TLS software. If your HTTP server, TLS termination, and application logic are all within the same process, a bug in any one of them gives the attacker access to sensitive information within the other parts. For example, a bug in OpenSSL can give an attacker access to sensitive keys that your application has loaded in memory. Conversely, a bug in your application can potentially give an attacker access to your SSL certificates. However, if you separate these parts out, if one is exploited, the other is not, and you lose only some sensitive data.
For load balancing, common choices are NGINX, HAProxy, and Apache. TLS termination is typically done with OpenSSL; however, alternatives like LibreSSL and Mozilla NSS exist. Another alternative is to use something like vulcand, which acts as a load balancer and uses the Go TLS library for TLS termination.
Strengthening the Service Endpoint (and Everything Else)
You can strengthen your service endpoint by restricting access to your servers from the public Internet and forcing all authentication to go through a jump host. This restriction is typically achieved by not directly exposing your infrastructure to the public internet, and instead building some kind of internal network that can be accessed only through the bastion host.
There are many ways to build an internal network, and your approach will largely depend on how your infrastructure is configured by your service provider and your preferences.
For example, say you host your servers on Amazon Web Services (AWS). Then you can start with a Virtual Private Cloud (VPC) with a single public subnet. Your servers will be isolated from other servers on AWS and will reside within their own 10.0.0.0/16 CIDR block. However, they will still have unfettered access to the Internet. To restrict access from the Internet, create Security Groups that restrict both the ports that are open and the servers that can access those ports. For example, you would configure your worker hosts to accept connections on ports 22 and 80, but only from your jump host and load balancer respectively. Your jump server, however, would accept connections on port 22 from any server on the public Internet.
If your service provider does not provide these tools, you can accomplish the same things as long as they support some kind of private network, either shared or dedicated, that allows you to isolate public and private traffic. This feature is typically offered by most vendors: as mentioned before, Amazon calls it VPC, Rackspace calls it ServiceNet, and Digital Ocean calls it Private Networking. All offer essentially the same ability: when you build your virtual server, you can bind it to the public interface, the private interface, or both. If your service provider does not have this ability at build time, you can enable and disable these interfaces yourself in the /etc/network/interfaces file on a Debian based system and in the /etc/sysconfig/network-scripts/ifcfg* files on a Red Hat based system.
Once you have servers that have public and private interfaces, use iptables to restrict inbound traffic on publicly accessible interfaces to the servers that either act as jump hosts or run the publicly accessible API. On the servers that handle internal services, like your application and database servers, disallow any inbound traffic on public interfaces, and restrict inbound traffic on the private interfaces to the trusted set of servers.
Once you have accomplished this, the only way for an attacker to exploit your infrastructure from the public Internet is to enter from your hardened bastion host or exploit your API in some manner.
Finally, to access these servers, don't use ssh-agent; instead, use ProxyCommand. Although ssh-agent has its purposes, it is not good for this particular use case. If you used it, anyone who has a local privilege escalation exploit for your bastion server could access any server on your infrastructure by impersonating anyone whose keys are currently loaded into memory by ssh-agent. By contrast, with ProxyCommand, your keys will not be left in memory for someone to steal, and your private key will live exclusively on your local workstation; only your public key will be copied over to each server you need access to.
To use ProxyCommand, copy your public key to ~/.ssh/authorized_keys on all the servers you need access to. Then on your workstation, update your ~/.ssh/config file with the following information:
# public
Host jump.example.com
    HostName 22.214.171.124

# private
Host lb.example.com
    HostName 10.10.10.4
    ProxyCommand ssh -W %h:%p jump.example.com

# private
Host server1.example.com
    HostName 10.10.10.5
    ProxyCommand ssh -W %h:%p jump.example.com

# private
Host server2.example.com
    HostName 10.10.10.6
    ProxyCommand ssh -W %h:%p jump.example.com
This configuration allows you to access lb.example.com, server1.example.com, and server2.example.com from your workstation via SSH on their private interfaces by "jumping" through jump.example.com, which has access to both public and private interfaces. All you have to do to connect is to type ssh server1.example.com or ssh server2.example.com.
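As the number of internal hosts grows, writing these stanzas by hand becomes error prone. The following is a minimal sketch of a hypothetical proxy_stanza helper that prints a stanza in the same shape as the configuration above; the host names and addresses are the example values from this section.

```shell
#!/bin/sh
# Print an ssh_config stanza for a host that is reachable only on its
# private address, tunnelling through a jump host with ProxyCommand.
# Arguments: <host alias> <private IP> <jump host alias>
proxy_stanza() {
    printf 'Host %s\n' "$1"
    printf '    HostName %s\n' "$2"
    # %%h and %%p print as the literal %h and %p tokens that ssh expands.
    printf '    ProxyCommand ssh -W %%h:%%p %s\n' "$3"
}

# Example: the two worker hosts from the configuration above.
proxy_stanza server1.example.com 10.10.10.5 jump.example.com
proxy_stanza server2.example.com 10.10.10.6 jump.example.com
```

You could append the output to ~/.ssh/config after reviewing it. On OpenSSH 7.3 and later, the ProxyJump option is a shorter way to express the same hop.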
- DO: Use a load balancer to separate your application HTTP server from your public-facing HTTP server.
- DO: Terminate TLS at the load balancer.
- DO: Use a hardened jump host to control access to your infrastructure.
- DON'T: Use SSH agent, if possible. It leaves all your keys possibly exposed if someone has a local privilege escalation exploit.
- DO: Use the public and private interfaces that your service provider provides to isolate your internal services from the public Internet.
Monitoring and Logging
Every security measure can and will be circumvented at some point in time. Because no practical security measures can provide ironclad security guarantees, it's important to have strong monitoring and logging facilities to help you understand where and how your systems were compromised. The better you understand how and what is running on your systems, the better you will be at detecting anomalous behavior. The same way a bank installs security cameras even though it secures its vaults, having good monitoring tools is critical to catch those clever hackers that have defeated your security measures.
Monitoring and logging take two forms. The first is live monitoring, which enables you to see what is happening on your system at any given moment. This encompasses everything from the network sockets that are open to the processes that are currently running. The second is the log data of actions that have already been taken. This covers everything from application logic logging to system logs.
In this section we will cover looking at system logs on individual servers themselves. In the intermediate version of this guide, we will talk about log aggregation and alerting.
The following tools come bundled with most UNIX-based operating systems. They are useful when you suspect that a security incident is occurring, but it is just as important to run them beforehand so that you know what their normal output looks like.
Following are some common commands and their expected output under normal operating conditions. These examples illustrate what the output should look like on your bastion server.
who - Shows you who is logged in at the moment.
$ who
foouser  pts/0  2015-07-07 13:54 (10.10.10.10)
last -a - Shows the most recent logins, including each username, login time, and the host or IP address the user logged in from.
$ last -a | head -n 10
foouser  pts/20  Tue Jul 7 17:47   still logged in    10.10.10.10
baruser  pts/19  Tue Jul 7 16:58 - 17:53  (00:54)     secure1.example.com
bazuser  pts/18  Tue Jul 7 15:55   still logged in    secure2.example.com
netstat -plntu - Shows process names and the ports they are listening for connections on.
$ sudo netstat -plntu
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address  Foreign Address  State   PID/Program name
tcp        0      0 0.0.0.0:22     0.0.0.0:*        LISTEN  1234/sshd
tcp6       0      0 :::22          :::*             LISTEN  1234/sshd
netstat -ap - Shows a snapshot of all sockets, both listening and connected, including established outbound connections, along with the process that owns each one.
$ sudo netstat -ap
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address     Foreign Address          State     PID/Program name
tcp        0      0 0.0.0.0:22        0.0.0.0:*                LISTEN    1234/sshd
tcp6       0      0 :::22             :::*                     LISTEN    1234/sshd
tcp        0      0 10.10.10.11:http  10-11-example.com:13012  SYN_RECV  -
tcp        0      0 10.10.10.12:http  10-12-example.com:35076  SYN_RECV  -
find / -mtime -1 -ls | head -n 20 - Lists the first 20 files that find encounters that were modified within the last 24 hours.
$ sudo find / -mtime -1 -ls | head -n 20
3 0 crw--w---- 1 foobar tty Jul 9 12:58 /dev/pts/0
7 0 crw--w---- 1 root tty Jul 9 12:58 /dev/pts/4
6319 0 crw-rw-rw- 1 root tty Jul 9 12:58 /dev/ptmx
5311 0 crw-rw-rw- 1 root tty Jul 9 12:58 /dev/tty
1187826 4 drwxr-sr-x 38 man root 4096 Jul 9 06:39 /var/cache/man
[...]
faillog -a - Shows a summary of login failures. The faillog tool can also be used to limit the maximum number of failed logins that a user is allowed.
$ faillog -a
Login    Failures  Maximum  Latest                    On
root     0         0        12/31/69 19:00:00 -0500
daemon   0         0        12/31/69 19:00:00 -0500
bin      0         0        12/31/69 19:00:00 -0500
sys      0         0        12/31/69 19:00:00 -0500
[...]
foobar   0         0        12/31/69 19:00:00 -0500
tcpdump -i eth1 -s 0 -A tcp port http - Dumps all HTTP traffic on interface eth1. This is useful if you have found something suspicious using netstat and want to dig in deeper. This guide cannot give you all the ins-and-outs of tcpdump, but there are a variety of resources on the Internet to help you understand it.
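Knowing the normal output of these commands is what makes anomalies stand out. As a rough illustration, the sketch below parses the text printed by netstat -plntu and reports any TCP listener that is not in a baseline you recorded earlier. The function names and the baseline format are assumptions made for this example, and the parser only understands the column layout shown above.

```python
def listening_ports(netstat_output):
    """Extract (protocol, port, program) for each TCP LISTEN row of `netstat -plntu` output."""
    found = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows look like: tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1234/sshd
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != "LISTEN":
            continue
        port = fields[3].rsplit(":", 1)[1]      # "0.0.0.0:22" or ":::22" -> "22"
        program = fields[6].split("/", 1)[-1]   # "1234/sshd" -> "sshd"
        found.add((fields[0], port, program))
    return found


def unexpected_listeners(netstat_output, baseline):
    """Return listeners present in the output but missing from the recorded baseline."""
    return listening_ports(netstat_output) - baseline
```

On a bastion host, a baseline such as {("tcp", "22", "sshd"), ("tcp6", "22", "sshd")} should normally produce an empty result; anything else deserves a closer look.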
Following are a few general rules for how to handle application logging and pointers to important system logs.
General Application Logs
Aggregate your application logs in a central location, be that a single log file or a directory. The common approach is to use syslog. Using syslog makes shipping the logs to a central logging server easier in the future.
Keep your logs as long as disk space allows. Keeping logs for up to 90 days on disk is not unreasonable if you have the space for it.
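As one possible way to send application logs to syslog, here is a minimal Python sketch using the standard library's SysLogHandler. The helper name build_syslog_logger is hypothetical, and the default address /dev/log is an assumption that holds on most Linux systems but may differ on yours.

```python
import logging
import logging.handlers


def build_syslog_logger(name, address="/dev/log"):
    """Return a logger that forwards records to syslog.

    address is either a local socket path (assumed /dev/log here) or a
    (host, port) tuple for a syslog daemon listening on UDP.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(address=address)
    # Prefix each message with the application name so it is easy to filter.
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

An application would call build_syslog_logger("myapp") once at startup and then log security-relevant events with calls such as logger.warning("login failed for user %s", username).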
As with live monitoring, it's a good idea to watch the following system log files on a regular basis to develop a good baseline of expected output. Having a baseline makes spotting suspicious behavior that much easier in the future. Following is a partial list of interesting system logs to watch:
/var/log/auth.log - System authentication logs.
/var/log/syslog - If you are not sending logs to a particular syslog facility, they will be located here.
/var/log/messages - General system log messages.
~/.bash_history - List of Bash commands that were executed by the user. This log can be easily manipulated or wiped by sophisticated attackers.
/var/log/wtmp - This log records the history of all logins, including sessions that are still open. Use last -f <file> to view it.
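To build a baseline from the authentication log, it can help to summarize it rather than read it line by line. The sketch below counts failed SSH password attempts per source address. The regular expression assumes the message wording commonly written by OpenSSH ("Failed password for ... from ... port ..."), so check it against the lines in your own /var/log/auth.log before relying on it. The function name is hypothetical.

```python
import re
from collections import Counter

# Assumed sshd message format; verify against your own auth.log.
FAILED_LOGIN = re.compile(
    r"Failed password for (?:invalid user )?(\S+) from (\S+) port \d+"
)


def failed_logins_by_source(lines):
    """Count failed SSH password attempts per source address."""
    counts = Counter()
    for line in lines:
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group(2)] += 1  # group 2 is the source address
    return counts
```

You could feed it the file directly, for example with open("/var/log/auth.log") as log: print(failed_logins_by_source(log).most_common(10)), which typically requires root privileges to read.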
- Learn a few basic commands that can help you figure out what is currently happening on your servers. These include the commands covered above, such as who, last, netstat, find, faillog, and tcpdump.
- Figure out what you need to log. Logs are not useful if they don't capture security critical events. At the very minimum, monitor the system log files listed above.
- Start centralizing your logs as early as possible. Using syslog now instead of some custom logging framework will make shipping the logs to a central logging server easier in the future.
Cryptography is a complex topic that should be covered in its own right. Even slight oversights or mistakes can lead to the complete compromise of the security of a product. This is why the "don't roll your own cryptography" mantra is so often repeated. Two good sources to read before you start working with cryptography are Crypto101, written by Laurens Van Houtven (lvh), and the Matasano crypto challenges.
That being said, keeping your infrastructure safe requires some use of cryptography, and there are common patterns that can be safely used. This section covers one of those patterns: how to store sensitive data in source code (or on disk).
When you store credentials, either in source control or on disk, don't store them unencrypted. You might think that your passwords are secure if you use a GitHub private repository, but you don’t want to rely on GitHub to keep your entire infrastructure safe from attackers. If you encrypt your credentials, you can maintain your security even if GitHub is compromised.
When looking for a tool or library to encrypt small amounts of data, consider the following recommendations:
- Use a modern symmetric cipher. Two commonly suggested candidates are AES and Salsa20 (NaCl).
- If your symmetric cipher supports different modes, carefully select the mode. For example, CBC is a good mode to use with AES, but ECB is not.
- Use a Message Authentication Code (MAC) to ensure that the encrypted data has not been tampered with. HMAC-SHA-512 or Poly1305 are good candidates.
- Use a high-quality source of randomness, which typically means using /dev/urandom to obtain random numbers used in keys, salts, and nonces.
- If the library or tool works with passphrases, ensure that it uses a KDF to transform the passphrase into a key.
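To make three of these recommendations concrete (randomness, a KDF, and a MAC), here is a small sketch that uses only the Python standard library. It does not encrypt anything, because the standard library ships no symmetric cipher, so treat it as an illustration of the building blocks rather than a usable tool. The function names, the 16-byte salt, and the PBKDF2 iteration count are assumptions chosen for this example.

```python
import hashlib
import hmac
import os


def new_salt():
    """Return 16 random bytes from the operating system's random source."""
    return os.urandom(16)


def derive_key(passphrase, salt, iterations=600_000):
    """Stretch a passphrase into a 32-byte key with PBKDF2-HMAC-SHA256 (a KDF)."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )


def tag(key, data):
    """Compute an HMAC-SHA-512 authentication tag over data."""
    return hmac.new(key, data, hashlib.sha512).digest()


def verify(key, data, expected_tag):
    """Check a tag in constant time; False means the data was modified."""
    return hmac.compare_digest(tag(key, data), expected_tag)
```

In a real design the MAC would be computed over the ciphertext with its own key, and the salt would be stored alongside the encrypted data so the key can be derived again later.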
You can build an encryption tool yourself, but as mentioned, this is tricky and not recommended. If you insist on building it, use a library like NaCl or cryptography.io that will at least get the cryptography right for you. It's even better to use a "recipe" that someone has already built, like lemma or Fernet, both of which expose a simple API you can use to encrypt and decrypt data in a safe manner.
- DON'T: Randomly pick the mode you are using your symmetric cipher in.
- DO: Use an authenticated symmetric cipher with a MAC.
- DO: Use /dev/urandom to generate random material.
- DO: Use a "recipe" like lemma or Fernet if you can.
Although backups may not seem to be in the same category as the other topics discussed in this guide, they matter for infrastructure security just as much. Backups serve two primary purposes: restoring after a non-malicious hardware failure and restoring after an attacker has compromised your infrastructure. Remember, in the case of a compromise, it's better to wipe your server and create a fresh one than to try to remove malware, which can be difficult or even impossible for a novice. This is why backups are critical for bringing your infrastructure back up in a trusted state after a compromise.
The following approaches are a good general strategy to follow when working with backups:
- Don't under-secure your backups because they don't seem to be mission critical for your service. Attackers frequently target backup infrastructure for just this reason.
- Back up as frequently as is appropriate for your business needs. Once a day is reasonable.
- Your backup servers should have limited access, and the accounts that do exist should use different authentication and authorization mechanisms than those used for the rest of your infrastructure. For example, your backup servers should use a different SSH key for login. If you do these things and your main environment is compromised, you will still have trustworthy backups you can restore from.
- If you don't want to run your own backup infrastructure, back up to a third-party data store like Amazon S3. Do note that if you are going to use any third-party service, encrypt your backups before you send your data to them. Work with the assumption that your data store is a public data store and use encryption to protect your data. If you work with this mindset, even if your host is compromised, your data is safe.
- If you are using Amazon S3 or Rackspace CloudFiles, use an authenticated cipher recipe like lemma or Fernet. If you don't want to think about the cryptography, use a service like Tarsnap that encrypts your data on the client and only sends encrypted blobs to its service.
- Back up your source code repositories, any third-party software that your application uses, and your database. The recent example of FoundationDB illustrates the importance of backing up any software that you use. Software downloads can be revoked by the developer at any time for any reason.
- Although distributed version control systems (DVCSs) like Git provide some redundancy, they don’t replace actual backups. You don't want to rely on a particular branch being on a co-worker's workstation to ensure business continuity.
- Back up your database using the method that the database prescribes.
- Restore from backups as often as you run backups themselves. Backups are of no use if they are not usable. Ideally you can run some auxiliary services that don't require the most recent data against your restored backups. That way if something goes wrong, you'll know immediately.
- Ensure that multiple people on your team are capable of restoring from backups. You might be able to figure out how to restore your backups, or maybe not. It might take an hour or ten hours. It's better to spend a few hours every quarter reviewing your backup infrastructure with a co-worker.
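One simple way to check that a restore actually worked is to compare checksums of the restored files against the originals. The sketch below is one possible approach, assuming both trees are available as local directories; the function names are hypothetical, and it reads each file fully into memory, so large files would need chunked hashing instead.

```python
import hashlib
from pathlib import Path


def checksum_tree(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    sums = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            sums[str(path.relative_to(root))] = digest
    return sums


def restore_mismatches(original_root, restored_root):
    """Return files that are missing, extra, or different after a restore."""
    original = checksum_tree(original_root)
    restored = checksum_tree(restored_root)
    every_name = original.keys() | restored.keys()
    return sorted(
        name for name in every_name if original.get(name) != restored.get(name)
    )
```

An empty list from restore_mismatches means every file matched. For live data such as a database, compare against a checksum manifest captured at backup time instead, since the original will have changed since then.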
- DON'T: Use the same accounts for backups as your primary environment.
- DO: Back up source repositories, third-party software, and databases.
- DO: Attempt to restore from backups as often as you run the backup procedure.
- DO: Run auxiliary services off of restored backups if you can.
- DON'T: Have a single point of failure. Make sure multiple people on your team are capable of restoring from backups.
- DO: Encrypt your backups with an authenticated cipher. Use a tool like lemma or Fernet, or a hosted solution like Tarsnap.