TLS 1.3 ECH – How to Preserve Critical Traffic Visibility for Enterprise and Network Security while Safeguarding Privacy
It is estimated that 95% of Web traffic is now encrypted (1) with the objective of safeguarding data privacy. Overall, this is a very important and positive evolution. However, it makes managing and securing networks extremely challenging: without visibility into the traffic, it’s impossible to prioritize and route flows correctly or to identify malicious elements that could pose a threat to resources and users. The latter is a serious concern, as more than 85% of cyber threats are now delivered via encrypted channels (1).
It is no surprise, therefore, that encryption is one of the biggest challenges for solution vendors whose products require accurate, real-time application awareness to function correctly. Zero Trust Network Access (ZTNA), Software-Defined Wide-Area Networking (SD-WAN), Secure Access Service Edge (SASE), Network and Extended Detection and Response (NDR/XDR) and firewalls are just a few of the solutions vital to network operations and security that rely on detailed traffic visibility.
However, with the right methods and technology, it has still been possible to identify and classify flows – and even to detect suspicious traffic – despite the rising use of encryption and evolving encryption techniques. Until recently, that is. The arrival of TLS 1.3 ECH challenges all known approaches to visibility into encrypted enterprise and network traffic, and many industry professionals are asking how they will be able to carry out essential network operations when TLS 1.3 ECH becomes the encryption standard.
I am happy to tell you that I am optimistic about this. But more on that later. First, let’s take a look at the different encryption uses and technologies, and current techniques for identifying and classifying traffic.
The Three Most Important Uses of Encryption Over the Internet Today
There are three main areas where encryption is used over the Internet today:
- VPNs (Virtual Private Networks): A VPN is a centrally managed software service that provides data privacy, and is increasingly used for anonymity over the Internet as well. VPNs encrypt all data coming from a device and route it to a destination network.
- Tor (The Onion Router): Tor is the de facto anonymity solution on the Internet. It is a decentralized overlay network that runs on top of traditional Internet infrastructure and uses multiple layers of encryption. It anonymizes web traffic, hiding all activities, and also hosts private services that can only be reached through the Tor network (the dark web).
- TLS (Transport Layer Security): The most widely used encryption technology on the Internet. It is used by virtually all websites and browsers today, including Chrome, Firefox and Edge, to keep data in transit over the Internet private, but provides limited anonymity depending on the version in use.
Encrypted Traffic Doesn’t Mean Safe Traffic
Given the deep integration of the Internet with contemporary enterprise networks, it’s important to understand that encrypted traffic doesn’t mean safe traffic; it just means that the traffic is private. Opaque encrypted traffic can therefore transport malware or exfiltrate critical information while security solutions are blind to it. Furthermore, not being able to identify the applications or types of traffic travelling across a network can cripple network management, load balancing and traffic policy/shaping activities.
So, encryption is a huge challenge because visibility is key to properly securing and managing networks.
Don’t Worry! It’s Still Possible to Identify Encrypted Traffic
You will be pleased to learn that there are several methods that safeguard essential visibility into network traffic while preserving privacy. These include technologies such as:
- Deep Packet Inspection (DPI): As the name suggests, this technique is an advanced method for inspecting data packets right up to OSI layer 7 (the application layer), meaning that it can identify not only the type of traffic but also the data payload, providing both detail and context. Of course, encryption means that full packet inspection is no longer possible without decryption. However, with the most widely used encryption protocols, part of the traffic remains in the clear and can be inspected. In TLS, for example, the handshake opens with “hello” messages exchanged between the client and the server. The ClientHello carries meaningful information in cleartext, such as the Server Name Indication (SNI), and in TLS versions before 1.3 the server certificate, with its Common Name, is also visible. DPI technology can use these fields to identify the applications.
- IP Addresses: Hostname resolution maps an assigned hostname to its IP address so that networked hosts can communicate with each other. These resolutions can be linked back to known content providers such as Amazon, Apple or Google, delivering meaningful information on the network traffic.
- Statistical and Machine Learning-Based Profiling: Statistical and behavioral analysis using machine learning (ML) can identify profiles and behaviors that distinguish different traffic categories such as audio, video, and file transfer, and can also fingerprint applications such as YouTube, WhatsApp or Netflix.
- Decryption: It might seem obvious, but decryption is also a technique for identifying network traffic. The problem is that it requires a specific framework such as a TLS proxy to reveal the network packet payload in cleartext. While decryption is essential for certain cybersecurity functions in private networks, its complexity is hard to justify when the goal is simply to identify flows for tasks such as traffic routing or policy control. The difference, therefore, between decrypting and not decrypting traffic is the need to inspect packet payloads (i.e., content). Using one of the above techniques, we can still identify applications such as Facebook, Instagram or Gmail, for instance, which is useful for prioritizing and routing. But it won’t be possible to know what’s inside the content of the flow, such as an email attachment in Gmail, should there be a suspicion of malicious activity.
- A final option is to use ‘pre-encryption’ techniques that capture data at the endpoint or infrastructure/host device before the data to be transmitted is encrypted. While these techniques can provide full application visibility regardless of the encryption standard used, device access cannot be universally secured for all device types and network environments. Therefore, network-based strategies like DPI will always be needed to meet visibility requirements.
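To make the DPI technique described above more concrete, here is a minimal Python sketch of how a cleartext SNI can be read from a raw TLS ClientHello record. It is an illustration only, not how any particular DPI engine works: the function names `extract_sni` and `build_client_hello` are hypothetical, the sample record is synthetic, and a production parser would also need to handle TCP reassembly, fragmented records and malformed input far more defensively.

```python
import struct
from typing import Optional


def extract_sni(record: bytes) -> Optional[str]:
    """Return the SNI host name from a raw TLS ClientHello record, or None."""
    try:
        if record[0] != 0x16:              # record type 0x16 = handshake
            return None
        pos = 5                            # skip record header: type, version, length
        if record[pos] != 0x01:            # handshake type 0x01 = ClientHello
            return None
        pos += 4                           # skip handshake type + 3-byte length
        pos += 2 + 32                      # legacy_version + random
        pos += 1 + record[pos]             # session_id (length-prefixed)
        (suites_len,) = struct.unpack_from("!H", record, pos)
        pos += 2 + suites_len              # cipher_suites
        pos += 1 + record[pos]             # compression_methods
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = min(pos + ext_total, len(record))
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0x0000:         # server_name extension
                # Layout: list length (2), name type (1), name length (2), name
                name_type = record[pos + 2]
                (name_len,) = struct.unpack_from("!H", record, pos + 3)
                name = record[pos + 5 : pos + 5 + name_len]
                if name_type != 0 or len(name) != name_len:
                    return None
                return name.decode("ascii")
            pos += ext_len
        return None
    except (IndexError, struct.error, UnicodeDecodeError):
        return None                        # truncated or malformed input


def build_client_hello(hostname: str) -> bytes:
    """Build a tiny synthetic ClientHello carrying only an SNI (demo data)."""
    name = hostname.encode("ascii")
    entry = b"\x00" + struct.pack("!H", len(name)) + name
    ext_data = struct.pack("!H", len(entry)) + entry
    extensions = struct.pack("!HH", 0x0000, len(ext_data)) + ext_data
    body = (
        b"\x03\x03" + bytes(32)            # legacy_version, random
        + b"\x00"                          # empty session_id
        + b"\x00\x02\x13\x01"              # one suite: TLS_AES_128_GCM_SHA256
        + b"\x01\x00"                      # one compression method: null
        + struct.pack("!H", len(extensions)) + extensions
    )
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake


print(extract_sni(build_client_hello("www.example.com")))  # www.example.com
```

The point of the sketch is that nothing is decrypted: the host name is read directly from bytes that the client sends in the clear at the start of the handshake.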
But What About Network-Based Traffic Visibility in TLS 1.3 ECH?
TLS 1.3 ECH (where ECH stands for Encrypted Client Hello) will, for the most part, blind current network-based traffic inspection techniques, unless decryption is used. It is thought that it will become the encryption standard sometime in 2024 – we’ll see! However, the impact is clear. Until now, the majority of traffic identification techniques have been able to recognize standard TLS 1.3 (RFC 8446) encrypted traffic by accessing information that remains clear in the ClientHello. With TLS 1.3 ECH, the sensitive contents of the ClientHello, including the real server name, will be encrypted, meaning that these techniques can no longer identify the web applications or websites generating the traffic.
The only remaining network-based methods to identify TLS 1.3 ECH encrypted traffic would be to use machine learning techniques that take the packet heuristics and match them to different application-specific traffic profiles, or to decrypt.
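As a rough illustration of what matching packet heuristics to traffic profiles can look like, the Python sketch below derives three simple features from packet metadata only (sizes, timing, direction) and assigns a flow to the nearest class profile. Everything here is an assumption made for the example: the feature set, the 1500-byte scaling, the nearest-centroid classifier, the category names and the synthetic flows. Real systems typically use many more features and more capable models.

```python
from math import dist
from statistics import mean
from typing import Dict, List, Sequence, Tuple

# One packet: (timestamp in seconds, size in bytes, True if client-to-server)
Packet = Tuple[float, int, bool]


def flow_features(packets: Sequence[Packet]) -> List[float]:
    """Summarize a flow using metadata only; no payload is read."""
    if len(packets) < 2:
        raise ValueError("need at least two packets to compute timing")
    sizes = [size for _, size, _ in packets]
    times = [ts for ts, _, _ in packets]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    downstream = sum(size for _, size, is_up in packets if not is_up)
    total = sum(sizes) or 1
    return [
        mean(sizes) / 1500.0,      # mean packet size, scaled by a typical MTU
        min(mean(gaps), 1.0),      # mean inter-arrival time, capped at 1 s
        downstream / total,        # share of bytes sent server-to-client
    ]


def train_centroids(labelled: Dict[str, List[List[float]]]) -> Dict[str, List[float]]:
    """Average the feature vectors of each class into one profile."""
    return {
        label: [mean(column) for column in zip(*vectors)]
        for label, vectors in labelled.items()
    }


def classify(features: List[float], centroids: Dict[str, List[float]]) -> str:
    """Return the class whose profile is closest to the observed features."""
    return min(centroids, key=lambda label: dist(features, centroids[label]))


def synthetic_flow(size: int, gap: float, up_every: int, n: int = 40) -> List[Packet]:
    """Generate demo packets; every `up_every`-th packet goes client-to-server."""
    return [(i * gap, size, i % up_every == 0) for i in range(n)]


centroids = train_centroids({
    "video_streaming": [
        flow_features(synthetic_flow(1400, 0.005, 10)),
        flow_features(synthetic_flow(1300, 0.006, 8)),
    ],
    "audio_call": [
        flow_features(synthetic_flow(160, 0.020, 2)),
        flow_features(synthetic_flow(200, 0.020, 2)),
    ],
})

unknown = synthetic_flow(1350, 0.004, 9)
print(classify(flow_features(unknown), centroids))  # video_streaming
```

Because the features come from packet sizes, timing and direction alone, this style of profiling keeps working when the handshake itself reveals nothing.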
How Difficult is it to Decrypt TLS 1.3 ECH?
It is, of course, possible to decrypt TLS 1.3 ECH but it’s a significant challenge for two main reasons.
First, key handling has changed. Prior to TLS 1.3, key exchange could rely on static key pairs, often tied to endpoints, so a decryption framework could inspect a significant number of flows with the same key pair, which helped the performance and efficacy of such systems. With TLS 1.3 ECH, the key pairs used to encrypt the traffic are ephemeral, i.e. they are session-based and therefore temporary; they are discarded once the session is terminated, and new ones are created for the next session. This requires frameworks to scale up to inspect millions of flows, each with its own keys. Second, TLS 1.3 ECH uses different key exchange techniques, such as ECDHE (Elliptic Curve Diffie-Hellman Ephemeral), which, until recently, TLS proxies did not support.
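The effect of ephemeral keys can be sketched in a few lines of Python. This is a toy model and not TLS: it uses classic finite-field Diffie-Hellman with a small illustrative prime instead of the elliptic-curve groups and key derivation that TLS 1.3 actually uses, and the names `ephemeral_keypair`, `session_key` and `handshake` are hypothetical. It only shows why a key recovered for one session is of no use for the next.

```python
import hashlib
import secrets
from typing import Tuple

# Toy parameters, far too small for real use (assumption for illustration).
P = 2**127 - 1   # a Mersenne prime used as the modulus
G = 3            # public base


def ephemeral_keypair() -> Tuple[int, int]:
    """Create a fresh private/public pair intended for a single session."""
    private = secrets.randbelow(P - 3) + 2      # random value in [2, P - 2]
    return private, pow(G, private, P)


def session_key(own_private: int, peer_public: int) -> bytes:
    """Derive a symmetric key from the Diffie-Hellman shared secret."""
    shared = pow(peer_public, own_private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


def handshake() -> bytes:
    """Run one toy key exchange and return the key both sides agree on."""
    client_private, client_public = ephemeral_keypair()
    server_private, server_public = ephemeral_keypair()
    client_key = session_key(client_private, server_public)
    server_key = session_key(server_private, client_public)
    if client_key != server_key:
        raise RuntimeError("key agreement failed")
    # The private values go out of scope here, mirroring keys being discarded.
    return client_key


first = handshake()
second = handshake()
print(first != second)  # each session ends up with its own key
```

A decryption framework therefore has to take part in, or obtain the secrets of, every individual session, which is what drives the scalability problem described above.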
So new decryption technologies are required with better scalability and performance. While we are seeing an increasing number of vendors proposing decryption solutions tailored for TLS 1.3, complexity remains a challenge for decryption of TLS 1.3 ECH, which, in my opinion, is achievable for in-band decryption, but is much more complicated for out-of-band decryption.
Is Machine Learning the Answer?
For the last couple of years, at Enea we have been working to anticipate the rise of DPI-blinding encryption technologies. We initially worked on TLS 1.3 eSNI (encrypted Server Name Indication), as it was called back then. In the end, TLS 1.3 was standardized without eSNI, keeping a clear part of the traffic for hello exchanges, similar to its predecessors. This was a result of external pressure from industry and government officials seeking to preserve minimal clear data for specific public safety, enterprise security, and network management needs.
Nevertheless, we worked on solutions designed for fully DPI-blind encryption. Specifically, we developed an ML-based approach to identifying traffic where dozens of packet features were combined and analyzed to build profiles for different traffic types. The results were used to create a machine learning-based library called ML ETC (Machine Learning based Encrypted Traffic Categorization) which can be used to identify global categories of traffic such as video streaming, audio calls or chat.
That was 2 years ago. Today, our objective is to capture the details that will allow us to identify applications and services with a high confidence rate. To achieve this, we are combining three different information collection methods. First, our DPI engine provides the ground truth. Second, a static IP database containing millions of continuously verified IP addresses identifies the content providers. Finally, ML algorithms specifically tailored for network traffic combine several dozen packet features to create a model trained to match a designated application’s traffic. The initial PoCs have produced results exceeding our expectations, making us more than confident that we will be able to handle the rise of TLS 1.3 ECH. We expect to continue providing advanced network-based traffic identification and classification that, together with endpoint- and host-based techniques, preserves visibility within diverse environments for essential networking and security needs.
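One possible way to picture how the three sources could fit together is sketched below in Python. It is a hypothetical reading of the approach and not the actual implementation: labels from a DPI engine serve as training ground truth, an IP prefix table narrows a flow down to a content provider, and a small per-provider model picks the application. The prefix table (built from documentation-only address ranges), the provider and application names, the two-value feature vectors and the nearest-centroid model are all assumptions made for the example.

```python
import ipaddress
from collections import defaultdict
from math import dist
from statistics import mean
from typing import Dict, List, Optional, Tuple

# Hypothetical prefix-to-provider table (documentation-only address ranges).
PROVIDER_PREFIXES = {
    ipaddress.ip_network("192.0.2.0/24"): "provider-a",
    ipaddress.ip_network("198.51.100.0/24"): "provider-b",
}

Centroids = Dict[str, List[float]]          # application name -> feature profile


def provider_for(server_ip: str) -> Optional[str]:
    """Look up which content provider owns the server address, if known."""
    address = ipaddress.ip_address(server_ip)
    for network, provider in PROVIDER_PREFIXES.items():
        if address in network:
            return provider
    return None


def train(samples: List[Tuple[str, str, List[float]]]) -> Dict[str, Centroids]:
    """Build one model per provider from (server_ip, dpi_label, features).

    The dpi_label plays the role of ground truth: it comes from flows that a
    DPI engine could still identify directly.
    """
    grouped: Dict[str, Dict[str, List[List[float]]]] = defaultdict(lambda: defaultdict(list))
    for server_ip, dpi_label, features in samples:
        provider = provider_for(server_ip)
        if provider is not None:
            grouped[provider][dpi_label].append(features)
    return {
        provider: {app: [mean(col) for col in zip(*vectors)] for app, vectors in apps.items()}
        for provider, apps in grouped.items()
    }


def identify(server_ip: str, features: List[float],
             models: Dict[str, Centroids]) -> Optional[str]:
    """Identify an opaque flow: the IP picks the provider, the model picks the app."""
    provider = provider_for(server_ip)
    if provider is None or provider not in models:
        return None
    centroids = models[provider]
    return min(centroids, key=lambda app: dist(features, centroids[app]))


models = train([
    ("192.0.2.10", "provider-a-video", [0.90, 0.90]),
    ("192.0.2.11", "provider-a-video", [0.80, 0.95]),
    ("192.0.2.20", "provider-a-mail", [0.30, 0.50]),
    ("192.0.2.21", "provider-a-mail", [0.20, 0.60]),
    ("198.51.100.5", "provider-b-chat", [0.10, 0.50]),
])

print(identify("192.0.2.99", [0.85, 0.90], models))   # provider-a-video
print(identify("203.0.113.1", [0.85, 0.90], models))  # None
```

Restricting the model to the applications of a single provider keeps the classification problem small, which is one reason an IP database and ML profiling complement each other in a design like this.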