Thursday, November 17, 2016

CDN benchmarking

Today, when we want to compare the performance of different CDN providers in a specific region, the first reflex is to check public Real User Monitoring (RUM) data, Cedexis being one of the best-known RUM providers. RUM data is very useful, and many CDN providers buy it in order to benchmark against competitors and continuously improve performance.

In the following, I will highlight what exactly community RUM measures, so that you don't jump to the wrong conclusions. Let's focus on the latency KPI and list the different components that contribute to it:
  • Last-mile latency to the CDN edge, which reflects how close the edge is to the user from a network perspective.
  • Cache-width latency, incurred when the CDN edge does not have the content locally and must fetch it from somewhere (a peer edge, a parent cache or simply the origin).
  • Connectivity latency from the CDN to the origin when a cache fill is needed.

In general, community RUM measurements are based on the round-trip delay (RTD) it takes to serve users a predefined object from the CDN edges. Since the object is the same and never changes, it is always cached on the edges. Consequently, community RUM solely measures last-mile network latency, which adequately reflects the latency of very popular objects that are always in cache.
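A community RUM probe essentially boils down to timing the download of a small, always-cached object. A minimal server-side sketch of that principle (a real RUM probe runs in the end user's browser, and the probe URL here is hypothetical):

```python
import time
import urllib.request

def measure_rtd(url, samples=5):
    """Time the download of a small test object, as a RUM probe would.

    Returns the median of several samples (in ms) to smooth out jitter.
    Note: real community RUM runs in the browser (e.g. via the Resource
    Timing API); this sketch only illustrates the measurement idea.
    """
    timings = []
    for _ in range(samples):
        start = time.monotonic()
        with urllib.request.urlopen(url) as resp:
            resp.read()  # drain the body so the full transfer is timed
        timings.append((time.monotonic() - start) * 1000)  # ms
    timings.sort()
    return timings[len(timings) // 2]  # median

# Example (hypothetical probe object):
# rtd_ms = measure_rtd("https://edge.example-cdn.net/rum/50kb.bin")
```

Because the probed object never leaves the cache, the measured RTD contains no cache-fill or origin component.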

Nevertheless, that's only part of the picture. In real life, CDNs have different capabilities and strategies for storing content beyond the edges and filling it from the origin:
  • Depending on content popularity, the CDN's cache eviction policy, the disk space available on the edge (cache width) and the parent-caching architecture, a request will be a cache hit or a miss, with a direct impact on performance. VoD providers with large video libraries know this topic very well.
  • Depending on the CDN's upstream connectivity, the number of hops needed to fill from the origin impacts connectivity latency. CDNs that built their own backbone benefit from good upstream connectivity. Dynamic content is very sensitive to this aspect.
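The edge latency that RUM captures is only one term in the latency a real content library experiences. A back-of-the-envelope model (all figures hypothetical) shows how two CDNs with identical RUM scores can differ once cache misses are accounted for:

```python
def effective_latency_ms(hit_ratio, edge_ms, fill_ms):
    """Expected latency combining cache hits and misses.

    A hit pays only the edge latency (what community RUM measures);
    a miss also pays the cache-fill latency (parent cache or origin).
    """
    return hit_ratio * edge_ms + (1 - hit_ratio) * (edge_ms + fill_ms)

# Two hypothetical CDNs with the same edge latency but different
# hit ratios on a long-tail VoD library:
cdn_a = effective_latency_ms(hit_ratio=0.98, edge_ms=30, fill_ms=200)  # 34.0 ms
cdn_b = effective_latency_ms(hit_ratio=0.80, edge_ms=30, fill_ms=200)  # 70.0 ms
```

Both CDNs would score 30 ms in a community RUM benchmark, yet the second one is more than twice as slow on average for this library.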
As a final word, we should also be aware that CDNs tend to optimize the configuration used by RUM measurements for this specific use case.

Monday, November 14, 2016

Sales Engineer, modus operandi

Recently, a friend asked me: "What qualities would you look for if you had to recruit someone for the same position as yours, i.e. Sales Engineer?" and, to make it easier, he also asked me how to evaluate those qualities. In this post I will try to answer the question, which is a very interesting one, because it pushed me to step back a bit and think about my role with some detachment.

The Sales Engineer (SE) role can differ quite a bit from one company to another, under various titles: Solutions Engineer, Presales Engineer, Solutions Architect, Consultant... In fact, an SE can be more or less specialized, more or less involved in delivery, in pricing, in bid management... In a nutshell, the SE, as part of the sales team, is a professional who provides technical advice and support in order to meet customer business needs. Nowadays, companies seem to have difficulty finding such a profile, one that combines business acumen with extensive technical knowledge.

The first word in SE is "Sales", but let me start with "Engineer", because technical knowledge is the solid foundation on which trust is built with customers. Indeed, an SE must be ready to dive into a technical subject as deep as the business requires, understand customer problems and solve them. Nonetheless, static knowledge is not sufficient in a fast-paced, ever-changing technological landscape: this is where an SE's curiosity and passion for learning are vital to his "survival".
Let's take the case of an SE specialized in CDN. He knows very well how the internet works: the TCP/IP stack, the DNS system and the HTTP protocol. He can explain at a high level how caching works, but can also dig into HTTP RFC 2616 if needed to answer a specific question about caching. He follows the latest trends in his industry, such as HTTP/2, TLS, SDN and security, and keeps a close eye on what the competition is doing.

So, back to the "Sales" in SE. First, the company counts on the SE's alignment with sales targets, as well as on his strategic thinking, in order to create a competitive advantage for its products. Second, the SE regularly gives presentations and demos to customers, so he needs good communication skills, must excel at storytelling and must be attentive to his audience in order to adapt in real time. Finally, I would say that an SE must handle the stress and pressure that come with sales dynamics. A typical assessment of this skill set is to ask the candidate to give a presentation and to challenge him during it.

Now, the best of the SE role lies in the synergy between "Sales" and "Engineer". Able to deliver a technical pitch at variable depth, the SE builds relationships and trust at different levels of the customer organization. For example, our CDN SE brings value to the customer's CTO by explaining how the product can increase revenue by improving the online buying experience, or absorb the Christmas load on his infrastructure, while at the same time advising the customer's website admin on caching best practices for an optimal CDN setup. By talking with the customer and asking the right questions, he is able to translate business requirements into a technical solution.

In addition, some other skills are very nice to have in this role, such as coaching partners, training salespeople and managing projects...

I am lucky to have held an SE position for more than 5 years now, a position that sits in a special place within the organization, at the crossroads of sales, engineering, business development and product management... This role has changed me a lot, and I can already feel the opportunities opening up for me.

Wednesday, November 2, 2016

How to evaluate a DDoS mitigation solution?

Let me start with a funny story. Marie, a 16-year-old school student, was our guest in the office for a week to discover the professional world. We told her about our business, networks, the internet... but when we started talking about IT security threats, we were hilariously surprised: she confessed to having already launched a DDoS attack on the school website, so that her parents could not access her grades on the day they were published online!

With almost no barriers to entry for launching DDoS attacks nowadays, the industry is witnessing considerable growth in both the number and the size of attacks. Unprotected connected devices have even made this growth exponential, with infected IoT botnets being massively used as an attack vector. In the last 30 days, KrebsOnSecurity was hit by a 600 Gbps attack and OVH by an attack of over 1 Tbps. The latest attack targeted the DNS provider Dyn, whose failure impacted major internet services such as Netflix. The Mirai malware was used to launch it, scanning and infecting more than 500,000 connected cameras, DVRs...

To protect themselves, companies are dedicating a larger share of their budgets to security, creating a very attractive emerging business for providers from different horizons: vendors providing services based on their own technologies (Arbor, Radware), network operators (Level 3, Tata), CDN providers (Limelight, Akamai), security providers (Incapsula, Cloudflare) and cloud providers (Azure, Rackspace).

Each positioning and implementation has its strengths and weaknesses. In the following, I'll share some key technical elements to take into consideration when evaluating a DDoS mitigation solution.

Let me start with a quick description of the DDoS attack layers. Attacks target resources at different layers, each of which is critical for service continuity. Volumetric attacks either try to flood internet bandwidth, mostly through reflection mechanisms (DNS, NTP, CHARGEN...), or overwhelm frontal network equipment, for example by exhausting router CPUs with packet fragmentation. In the upper layers, an attacker can target the middleware, such as the HTTP server, with brute-force GET requests and slow-session techniques, or target the application layer directly with well-crafted HTTP requests that exhaust the application logic or its database.
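What makes reflection so dangerous is amplification: a reflector answers a small spoofed request with a much larger response aimed at the victim. A quick sketch with published order-of-magnitude amplification factors (real factors vary per server):

```python
# Approximate bandwidth amplification factors for common reflection
# vectors (illustrative orders of magnitude, e.g. NTP "monlist").
AMPLIFICATION = {
    "DNS": 28,
    "NTP": 556,
    "CHARGEN": 358,
}

def reflected_bandwidth_gbps(botnet_bw_gbps, vector):
    """Attack bandwidth hitting the victim, given the botnet's own
    upstream bandwidth and the reflection vector used."""
    return botnet_bw_gbps * AMPLIFICATION[vector]

# A botnet with only 1 Gbps of its own capacity, reflecting off NTP
# servers, delivers hundreds of Gbps to the target:
print(reflected_bandwidth_gbps(1, "NTP"))  # 556 Gbps
```

This is why a modest botnet can produce the multi-hundred-Gbps floods mentioned above, and why absorption capacity matters so much on the defense side.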

Providers should be able to protect from DDoS attacks on different layers:
  • Protection from volumetric attacks requires considerable infrastructure capacity in terms of network and scrubbing centers.
    Indeed, the number and geographical distribution of scrubbing centers are critical to absorption capacity and robustness, as they allow an attack to be mitigated as close as possible to its source, before it snowballs into something much riskier to handle. Beware of a provider lacking presence in a region such as APAC, or having only one scrubbing center in a given region. For the same reasons, scrubbing centers should be connected to the internet through extensive peering and ample network capacity.
    On this ground, tier 1 operators are best positioned to deal with the largest attacks (e.g. 1 Tbps) thanks to their scale. For example, Level 3 has implemented BGP Flowspec on its backbone, leveraging its edge capacity (42 Tbps) to block some volumetric attacks before they even reach the scrubbing centers.
  • Protection from upper-layer DDoS attacks is more about the intelligence inside the scrubbing centers. Some providers use proprietary technologies from Radware, Arbor or Fortinet, some mix them for better security, and some simply do not use any, to avoid licensing fees and be more competitive price-wise. What is the underlying technology capable of? Signature-based detection only, or behavioral analysis as well? Does it handle SSL traffic? What is its false-positive ratio? Is it compatible with hybrid (cloud + on-premise) deployments?
  • In all cases, mitigation should be powered by threat intelligence. For example, a botnet can be identified before any attack by its communication profile with C&C servers, and the associated infected IPs fed to the mitigation technology.

One last thing I want to mention is performance. It's not enough for a provider to stop a DDoS attack; it should also guarantee that legitimate traffic won't suffer performance issues. Let me illustrate with a few examples:
  • A large percentage of your traffic is very local to a region (say, the Middle East) where the provider has no scrubbing center. Your traffic will then travel to Europe to be scrubbed and back to the Middle East, adding considerable latency.
  • Your provider has only one scrubbing center in Europe, and it gets critically impacted by attacks on several of its banking customers. In this case, your traffic will be rerouted to the next nearest scrubbing center, for example in North America, again adding considerable latency.
  • Routed mitigation solutions use BGP to divert your traffic, at the /24 level, from your AS to your provider's AS, where it is cleaned and sent back to you. The first thing to consider is BGP convergence time, because it impacts the overall time to mitigate. Convergence is faster when your provider is very well connected to the internet, and can even be near-instantaneous if you use the same provider for internet connectivity. The second thing to consider is the impact of rerouting an entire /24 subnet when only one host is targeted: does your provider let you reroute only the attacked IP?
  • Your provider uses a scrubbing technology that requires intensive tweaking and human intervention for each mitigation. In this case, you can expect a longer time to mitigate.
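The detour penalty in the first two scenarios is easy to quantify: traffic hairpinned through a remote scrubbing center pays two legs instead of one. A sketch with purely hypothetical RTT figures:

```python
def detour_penalty_ms(direct_rtt_ms, rtt_to_scrubber_ms, rtt_scrubber_to_dest_ms):
    """Extra round-trip latency added when traffic is hairpinned through
    a remote scrubbing center instead of taking the direct path."""
    return (rtt_to_scrubber_ms + rtt_scrubber_to_dest_ms) - direct_rtt_ms

# Middle East user -> Middle East service, scrubbed in Europe
# (hypothetical RTTs: 40 ms direct, 90 ms for each Europe leg):
print(detour_penalty_ms(direct_rtt_ms=40,
                        rtt_to_scrubber_ms=90,
                        rtt_scrubber_to_dest_ms=90))  # 140 ms added
```

An extra 140 ms on every round trip during mitigation is the kind of hidden cost worth asking a provider about before an attack happens, not during one.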