Sound Familiar?

It’s not a sentiment you would expect from most IT decision-makers. However, it’s something we hear from an increasing number of organizations.

The benefits of a well-thought-out cloud transformation roadmap are not lost on them.

  • They know that, in an ideal world, they ought to start with an in-depth assessment of their application portfolio, in line with the best practice – “migrate your capabilities, not apps or VMs”.
  • They also realize the need to develop a robust cloud governance model upfront.
  • And ultimately, they understand the need to undertake an iterative migration process that takes into account “organizational change management” best practices.

At the same time, these decision-makers face real challenges with their existing IT infrastructure that simply cannot wait months or years for a successful cloud transformation to take shape. They can’t get out of their on-premises data centers soon enough. This notion isn’t limited to organizations with fast-approaching Data Center (DC) lease renewal deadlines or end-of-support products, either.

So, how do we balance the two competing objectives:

  • Immediate need to move out of the DC
  • Carefully crafted long-term cloud transformation

A Two-Step Approach to Your Cloud Transformation Journey

From our experience with a broad range of current situations, goals, and challenges, we recommend a two-step cloud transformation approach that addresses both your immediate challenges and the organization’s long-term vision for cloud transformation.

  1. Tactical “Lift-n-Shift” to the Cloud – As the name suggests, move the current DC footprint as is (VMs, databases, storage, network, etc.) to Azure
  2. Strategic Cloud Transformation – Once operational in the cloud, incrementally and opportunistically move parts of your application portfolio to higher-order Azure PaaS/cloud-native services

Tactical “Lift-n-Shift” to the Cloud


On the surface, step #1 above may appear wasteful. After all, we are duplicating your current footprint in Azure. But keep in mind that step #1 is designed for completion in days or weeks, not months or years. As a result, the period of duplication is kept to a minimum. At the same time, step #1 immediately puts you in a position to leverage Azure capabilities, giving you tangible benefits with minimal to no changes to your existing footprint.

Here are a few examples of benefits:

  • Improve the security posture – Once you are in Azure, you tap into security capabilities such as intrusion detection and denial-of-service (DDoS) protection simply by being in Azure. Notice that I deliberately did not cite Security Information and Event Management (SIEM) tools like Azure Sentinel, since technically you can take advantage of Azure Sentinel for on-premises workloads as well.
  • Replace aging hardware – Your hardware may be getting old but isn’t old enough for a Capex-powered refresh. Moving your VMs to Azure decouples you from the underlying hardware. “But won’t that be expensive, since you are now paying by usage per minute?” you ask. Not necessarily, and certainly not in the long run. Consider options like Reserved Instance (RI) pricing, which can offer discounts of up to 80% based on a one- or three-year commitment.

Furthermore, you can combine RI with Azure Hybrid Benefit (AHUB), which provides discounts for licenses you already own. Finally, don’t forget to take into account the savings from decreased needs for power, networking, and real estate, as well as the cost of the resources needed to manage all the on-premises assets. Even if you can’t get out of the DC lease completely, you may be able to negotiate a modular reduction of your DC footprint. Please refer to Gartner research that suggests that, over time, the cloud can become cost-effective.


Source – https://blogs.gartner.com/marco-meinardi/2018/11/30/public-cloud-cheaper-than-running-your-data-center/

  • Disaster Recovery (DR) – Few organizations have a DR setup that is conducive to ongoing DR tests. Having an effective DR plan is one of the most critical responsibilities of IT. Once again, since geo-replication is innate to Azure, your disks are replicated to an Azure region that is at least 400 miles away, by default. Given this, DR is almost out-of-the-box.
  • Extended lease of life on out-of-support software – If you are running out-of-support software such as Windows Server 2008 or SQL Server 2008, moving to Azure extends the security updates for up to three years from the “end of support” date.
  • Getting out of the business of “baby-sitting” database servers – Azure SQL Managed Instance offers you the ability to take your existing on-premises SQL Server databases and move them to Azure with minimal downtime. Once your database is running as an Azure SQL Managed Instance, you don’t have to worry about patching and backups, thereby significantly reducing the cost of ownership.
  • Take baby steps towards automation and self-service – Self-service is one of the key focus areas for most IT organizations. Since every aspect of Azure is API-driven, organizations can take baby steps towards automated provisioning (see the short sketch after this list).
  • Get closer to a data lake – I am sure you have heard the quote “AI is the new electricity”. We also know that Artificial Intelligence (AI) needs lots and lots of data to train the Machine Learning (ML) algorithms. By moving to Azure, it is that much easier to capture the “data exhaust” coming out of the applications in a service like Azure Data Lake. In turn, Azure Data Lake can help turn this data into intelligence.
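
To make the automation point concrete, here is a minimal Azure CLI sketch of self-service provisioning. The resource names, region, image, and size are illustrative placeholders, not a recommended configuration.

# Hypothetical self-service provisioning script (names and sizes are placeholders)
az group create --name rg-selfservice-demo --location eastus

az vm create \
  --resource-group rg-selfservice-demo \
  --name vm-demo-01 \
  --image UbuntuLTS \
  --size Standard_D2s_v3 \
  --admin-username azureuser \
  --generate-ssh-keys

Because every step is just an API call, the same script can be wrapped in a pipeline or a service catalog item and handed to teams as a self-service action.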

Strategic Cloud Transformation


Once you have completed step #1 by moving your on-premises assets to the cloud, you are now in a position to undertake continuous modernization efforts aligned to your business priorities.

Common approaches include:

  • Revise – Capture application and application tiers “as-is” in containers and run on a managed orchestrator like Azure Kubernetes Service. This approach requires minimal changes to the existing codebase. For more details of this approach, including a demo, read Migrate and Modernize with Kubernetes on Azure Government.
  • Refactor – Modernize by re-architecting to target Platform as a Service (PaaS) and “serverless” technologies. This approach requires more significant recoding to target PaaS services but allows you to take advantage of cloud provider managed services. For more information, check out our “Full PaaS” Approach to Modernizing Legacy Apps.
  • Rebuild – Complete rewrite of the applications using cloud-native technologies like Kubernetes, Envoy, and Istio. Read our blog, What Are Cloud-Native Technologies & How Are They Different from Traditional PaaS Offerings, for more information.
  • Replace – Substitute an existing application, in its entirety, with Software as a Service (SaaS) or an equivalent application developed using a no-code/low-code platform.

The following table summarizes the various approaches for modernization in terms of factors such as code changes, operational costs, and DevOps maturity.

Compare App Modernization Approaches

Azure Migration Program (AMP)

Microsoft squarely aligns with this two-step approach. At the recent Microsoft partner conference #MSInspire, Julia White announced AMP (Azure Migration Program).

AMP brings together the following:

Wrapping Up

A two-step migration offers a programmatic approach to unlock the potential of the cloud quickly. You’ll experience immediate gains from a tactical move to the cloud and long-term benefits from a strategic cloud transformation that follows. Microsoft programs like AMP, combined with more than 200 Azure services, make this approach viable. If you’re interested in learning more about how you can get started with AMP, and which migration approach makes the most sense for your business goals, reach out to AIS today.

I recently read Patrick Lencioni’s latest book, Getting Naked: A Business Fable About Shedding The Three Fears That Sabotage Client Loyalty. I found the book useful, well written, and insightful.

Here is a quick summary of the key ideas:

Always Consult instead of Sell

Rather than harping on past accolades and what you can do if the organization hires you, transform every sales situation into an opportunity to demonstrate value. Start adding value from the very first meeting… without waiting to be hired. Don’t worry about a potential client taking advantage of your generosity. (One potential client in 10 may – but you don’t want that one potential client as a customer anyway!)

Tell the kind truth

Be ready to confront a client with a difficult message, even if the client might not like hearing it. Don’t sugarcoat or be obsequious. Rather, relay the message in a manner that recognizes the dignity and humanity of the client. Be prepared to deal with “the elephant in the room” that everyone else (including your competitors) is afraid to address.

Don’t be afraid to ask questions or suggest ideas, even if they seem obvious

There is a good chance that a seemingly obvious question/suggestion is actually of benefit to many in the audience.

Of course, there is also the chance of posing a “dumb” question. But no one expects perfection from their consultants; they do expect transparency and honesty. There is no better way to demonstrate both than by acknowledging mistakes.

Make everything about the client

Be prepared to humble yourself by sacrificially taking the burden off of a client in a difficult situation. Be ready to take on whatever the client needs you to do within the context of the services you offer.

The focus of all of your attention needs to be on understanding, supporting, and honoring the business of the client. Do not try to shift attention to yourself, your skills, or your knowledge. Honor the client’s work by taking an active interest in their business and appreciating the importance of their work.

Admit your weaknesses and limitations

Be true to your strengths and be open to admitting your weaknesses. Not doing so will wear you out and prevent you from doing your best in your areas of strength.

Like many of you, we at AIS try to embody many of the ideas from Lencioni’s book. For example, rather than “sell” a project, we often co-invest (along with the customer) in a micro-POC* or a one-day architecture and design session that helps lay out the vision for the solution, identify the main risks, and crystallize the requirements for a project. With the insights derived from a micro-POC, clients can make an informed decision on whether to move forward or not. Here are a couple of examples of micro-POCs: Jupyter Notebooks as a Custom Calculation Engine, and Just-in-Time Permission Control with Azure RBAC.

* Micro-POC (micro proof of concept) – a time-boxed (typically ~40 hours) “working” realization of an idea/concept with the aim of better understanding the potential, risks, and effort involved. This is *not* an early version of a production application. Also, see this Wikipedia definition of Proof of concept.

As we think about services that Azure can offer, we often think about apps (e.g., App Services, AKS, and Service Fabric) and data (e.g., Azure Storage, Databricks, and Azure Data Lake). It turns out that you can also leverage Azure purely as a Network-as-a-Service (NaaS). The network is a basic building block for all the app and data services in Azure, but by NaaS I am explicitly talking about leveraging Azure networking in a *standalone* manner. Allow me to explain what I mean: in the picture below, you will see a representation of the Azure global footprint. It is so vast that it includes 100K+ miles of fiber and subsea cables, and 130 edge locations connecting over 50 regions worldwide. Think of NaaS as a way to tap into Azure’s global infrastructure to improve the network performance and resilience of your applications, regardless of whether the apps are hosted in Azure or not.

Azure's Global Footprint

Let’s discuss two specific Azure Services that offer NaaS capabilities. Note, there are other services like Azure Firewall – think firewall as a service – that can fall in the NaaS category. However, I am limiting my discussion to two services – Azure Front Door and Azure Virtual WAN. In my opinion, these services closely align with a focus on leveraging Azure network infrastructure and services in a standalone manner.


Azure Front Door Service

Azure Front Door service allows you to define global routing for your applications that optimizes performance and resilience. Front Door is a layer 7 (HTTP/HTTPS) service. Please refer to the diagram below for a high-level view of how Front Door works – you advertise your application’s URL using the anycast protocol. This way, traffic directed towards your application gets picked up by the “closest” Azure Front Door and routed to your application, whether it is hosted in Azure or on-premises – for applications hosted outside of Azure, the traffic will traverse the Azure network to the point of exit closest to the location of the app.

The primary benefit of using Azure Front Door is to improve the network performance by routing over the Azure backbone (instead of the long-haul public internet). It turns out there are several secondary benefits to highlight: You can increase the reliability of your application by having Front Door provide instant failover to a backup location. Azure Front Door uses smart health probes to check for the health of your application. Additionally, Front Door offers SSL termination and certificate management, application security via Web Application Firewall, and URL based routing.

Routing Azure Front Door
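
As a rough sketch of getting started, the Azure CLI (via its front-door extension) can create a basic Front Door profile pointing at an existing backend. The resource names and backend address below are placeholders, and the exact arguments may differ across CLI versions.

az extension add --name front-door

az network front-door create \
  --resource-group rg-frontdoor-demo \
  --name fd-demo \
  --backend-address myapp.azurewebsites.net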

Azure Virtual WAN


Azure Virtual WAN offers branch connectivity to, and through, Azure. In essence, think of Azure regions as hubs, that along with the Azure network backbone can help you establish branch-to-VNet and branch-to-branch connectivity.

You are probably wondering how Virtual WAN relates to existing cloud connectivity options like point-to-site VPN, site-to-site VPN, and ExpressRoute. Azure Virtual WAN brings these connectivity options together into a single operational interface.

Azure Virtual Wan Branch Connectivity

The following diagram illustrates a client’s virtual network overlay over the Azure backbone. The Azure Virtual WAN hub is located in the West Europe region. The virtual hub is a managed virtual network that, in turn, enables connectivity to VNets in West Europe (VNetA and VNetB) and to an on-premises branch office (testsite1) connected via a site-to-site VPN tunnel over IPsec. An important thing to note is that the site-to-site connection is hooked to the virtual hub and *not* directly to the VNet (as would be the case with a virtual network gateway).

Virtual Network Overlay Azure Backbone

Finally, you can work with one of many Azure Virtual WAN partners to automate the site-to-site connection, including setting up the branch device (VPN or SD-WAN – software-defined wide area network) that automates the connectivity setup with Azure.
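
For reference, here is a hedged Azure CLI sketch of standing up a Virtual WAN and a virtual hub like the one in the diagram above. The names and address prefix are placeholders; connecting branch sites typically happens through a partner device or additional VPN site and gateway commands.

az network vwan create \
  --resource-group rg-wan-demo \
  --name vwan-demo \
  --location westeurope

az network vhub create \
  --resource-group rg-wan-demo \
  --name hub-westeurope \
  --vwan vwan-demo \
  --address-prefix 10.0.0.0/24 \
  --location westeurope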

I just returned from Microsoft BUILD 2019 where I presented a session on Azure Kubernetes Services (AKS) and Cosmos. Thanks to everyone who attended. We had excellent attendance – the room was full! I like to think that the audience was there for the speaker 😊 but I’m sure the audience interest is a clear reflection of how popular AKS and Cosmos DB are becoming.

For those looking for a 2-minute overview, here it is:

In a nutshell, the focus was to discuss combining a cloud-native service (like AKS) with a managed database (like Cosmos DB).


We started with a discussion of cloud-native apps, along with a quick introduction to AKS and Cosmos DB. We quickly transitioned into stateful app considerations and talked about new stateful capabilities in Kubernetes, including Persistent Volumes (PV), Persistent Volume Claims (PVC), StatefulSets, the Container Storage Interface (CSI), and Operators. While these capabilities represent significant progress, they don’t match up to external services like Cosmos DB.


One option is to use the Open Service Broker – it allows Kubernetes-hosted services to talk to external services using cloud-native tooling like svcat (Service Catalog).
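
For illustration, here is a hedged sketch of provisioning and binding an external service through the Service Catalog API. The class and plan names are placeholders that depend on the broker you have installed (for example, Open Service Broker for Azure); the same can be done imperatively with svcat provision and svcat bind.

apiVersion: servicecatalog.k8s.io/v1beta1
kind: ServiceInstance
metadata:
  name: cosmosdb-instance
spec:
  clusterServiceClassExternalName: azure-cosmosdb-sql   # placeholder class name
  clusterServicePlanExternalName: standard              # placeholder plan name
---
apiVersion: servicecatalog.k8s.io/v1beta1
kind: ServiceBinding
metadata:
  name: cosmosdb-binding
spec:
  instanceRef:
    name: cosmosdb-instance
  secretName: cosmosdb-secret   # connection details are written to this secret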


External services like Cosmos DB can go beyond cluster-level SRE and offer “turn-key” SRE, in essence – specifically, geo-replication, API-based scaling, and even multi-master writes (eliminating the need to fail over).


Since the Open Service Broker is an open specification, your app remains mostly portable even when you move from one cloud provider to another. The Open Service Broker does not deal with syntactic differences, say, a connection-string prefix difference between cloud providers. One way to handle these differences is to use Helm.

Learn more about my BUILD session:

Here you can find the complete recording of the session and slide deck: https://mybuild.techcommunity.microsoft.com/sessions/77138?source=sessions#top-anchor

Additionally, you can find the code for the sample I used here: https://github.com/vlele/build2019 


Let’s say you are trying to move an Oracle database to Azure, but don’t want to go down the route of creating an Oracle database in an Azure VM for obvious reasons: you don’t want to be responsible for maintaining the VM’s availability, hotfixes, patching, etc. At the same time, let’s say you do want to take advantage of a fully managed persistence service that offers local and geo-redundancy and the ability to create snapshots to protect against accidental deletes or data corruption.

It turns out that the latest advancements in Azure Container Instances (ACI), combined with the ability to deploy them in a VNET, can get you close.


Let’s start by reviewing the architecture.

Architecture diagram

We can host an Oracle DB container image inside Azure Container Instances (ACI). ACI is a container-as-a-service offering that removes the need to manage the underlying virtual machines and eliminates the need to set up an orchestrator. Additionally, ACI-hosted containers (Linux only for now) can be placed in a delegated subnet, which makes the container instance available from inside a VNET without the need to open a public endpoint.

Finally, the persistent data files of the database reside in Azure Files, which removes the need to manage our own durable storage since Azure Files takes care of local and geo-redundancy. Additionally, Azure Files can take snapshots, giving us point-in-time restore ability.

(Azure Files also supports Virtual Network service endpoints that allow for locking down access to the resources within the VNET.)

ACI also offers fast start times, plus policy-based automatic restarting of the container upon failure.

Here are the three steps to get this setup working:

Step One

Create the ACI hosting Oracle Database Server 12.2.0.1, mounting an Azure Files share and connected to a delegated subnet.

az container create -g <> --name <> \
  --image registry-1.docker.io/store/oracle/database-enterprise:12.2.0.1 \
  --registry-username <> --registry-password <> \
  --ports 1521 5500 --memory 8 --cpu 2 \
  --azure-file-volume-account-name <> --azure-file-volume-account-key <> \
  --azure-file-volume-share-name <> --azure-file-volume-mount-path /ORCL \
  --vnet <> --vnet-address-prefix <> --subnet <> --subnet-address-prefix <>

Step Two

This step is a workaround that could be eliminated if we had access to the Dockerfile used to create this image. We are essentially copying the /oradata files containing the control files, data files, etc. to the Azure Files share.

mkdir -p /u02/app/oracle/oradata/ORCL; cp -r /u02/app/oracle/oradata/ORCLCDB/. /u02/app/oracle/oradata/ORCL/

Step Three

Connect to the Oracle database from the VNET.

Since the Oracle DB container is created in a VNET, a private IP address is assigned to the container.  We can use this IP to connect to it from inside the VNET.
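
For example, you can look up the private IP with the Azure CLI and then connect with standard Oracle tooling from a VM in the same VNET. The service name and credentials below are placeholders that depend on how the image is configured.

# Get the private IP assigned to the container group
az container show -g <resource-group> --name <container-name> --query ipAddress.ip -o tsv

# Connect from inside the VNET (placeholder credentials and service name)
sqlplus system/<password>@//<private-ip>:1521/ORCLCDB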

That’s it! We now have an Oracle database without the need to maintain the underlying VM or data volume.

Let’s Talk Pricing

Azure Container Instances bill per second at the container group level. Container group resources such as vCPU and memory are shared across the multiple containers that make up the group.

The current pricing per second is listed below:

Container group duration:
Memory: $0.0000015 per GB-s
vCPU: $0.0000135 per vCPU-s

The setup we defined above (8GB memory and two vCPUs) will cost ~$100/month based on the following pricing calculation:

Memory duration:

Number of container groups * memory duration (seconds) * GB * price per GB-s * number of days

1 container group * 86,400 seconds * 8 GB * $0.0000015 per GB-s * 30 days = $31.1

vCPU duration:

Number of container groups * vCPU duration (seconds) * vCPU(s) * price per vCPU-s * number of days

1 container group * 86,400 seconds * 2 vCPU * $0.0000135 per vCPU-s * 30 days = $69.98

Total billing:

Memory cost + vCPU cost = total cost

$31.1 + $69.98 = $101 per month

Almost! But not completely there!

As the title suggests, the approach mentioned above gets us close to our objective of “Oracle DB as a service on Azure” but we are not all the way there. I would be remiss not to mention some of the challenges that remain.

Our setup is resilient to failure (e.g., policy-based restart) but this setup does not offer us high availability. For that, you will have to rely on setting up something like the Oracle Data Guard on Azure.

ACI supports horizontal scaling, but vertical scaling options are limited to the current ACI limits (16 GB of memory and four vCPUs).

ACI VNET integration has some networking limits around outbound NSGs and public peering that you need to be aware of.

I’d like to thank Manish Agarwal and his team for help with this setup.


Today, let’s talk about network isolation and traffic policy within the context of Kubernetes.

Network Policy Specification

Kubernetes’ first-class notion of network policy allows a customer to determine which pods are allowed to talk to which other pods. While these policies are part of the Kubernetes specification, tools like Calico and Cilium implement them.

Here is a simple example of a network policy:

...
  ingress:
  - from:
    - podSelector:
        matchLabels:
          zone: trusted
  ...

In the above example, only pods with the label zone: trusted are allowed to make incoming connections to the pod.
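
For completeness, a full version of such a policy might look like the following minimal sketch; the policy name and the app label on the protected pods are illustrative.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-trusted
spec:
  podSelector:
    matchLabels:
      app: payments        # the pods this policy protects (illustrative label)
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          zone: trusted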

egress:
  - action: deny
    destination:
      notSelector: ns == 'gateway'

The above example deals with outgoing traffic (expressed here in Calico’s policy syntax). This network policy ensures that outgoing traffic is blocked unless the destination matches the selector ns == 'gateway'.

As you can see, network policies are important for isolating pods from each other in order to avoid leaking information between applications. However, if you are dealing with data that requires higher trust levels, you may want to consider isolating the applications at the cluster level. The following diagrams depict both logical (network policy based) and physical (isolated) clusters.

Diagram of a Prod Cluster Diagrams of Prod Team Clusters

Network Policy is NOT Traffic Routing…Enter Istio!

Network policies, however, do not allow us to control the flow of traffic on a granular level. For example, let’s assume that we have three versions of a “reviews” service (a service that returns user reviews for a given product). If we want the ability to route the traffic to any of these three versions dynamically, we will need to rely on something else. In this case, let’s use the traffic routing provided by Istio.

Istio is a tool that manages the traffic flow across services using two primary components:

  1. An Envoy proxy (more on Envoy later in the post) distributes traffic based on a set of rules.
  2. The Pilot manages and configures the traffic rules that let you specify how traffic should be routed.

Diagram of Istio Traffic Management


Here is an example of Istio policy that directs all traffic to the V1 version of the “reviews” service:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v1
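
The v1 subset referenced above is defined in a companion DestinationRule. Here is a sketch modeled on the Istio bookinfo sample, where a version label distinguishes the three deployments:

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
  - name: v3
    labels:
      version: v3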

Here is a Kiali Console view of all “live” traffic being sent to the V1 version of the “reviews” service:

Kiali console screenshot

Now here’s an example of Istio policy that directs all traffic to the V3 version of the “reviews” service:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
    - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v3

And here is a Kiali Console view of all “live” traffic being sent to the V3 version of the “reviews” service:

Kiali console screenshot v3

Envoy Proxy

Envoy is a lightweight proxy with powerful routing constructs. In the example above, the Envoy proxy is placed as a “sidecar” to our services (product page and reviews) and handles their outbound traffic. Envoy can dynamically route all outbound calls from the product page to the appropriate version of the “reviews” service.

We already know that Istio makes it simple for us to configure the traffic routing policies in one place (via the Pilot). But Istio also makes it simple to inject the Envoy proxy as a sidecar. The following Kubectl command labels the namespace for automatic sidecar injection:

#--> Enable Side Car Injection
kubectl label namespace bookinfo istio-injection=enabled

As you can see, each pod has two containers (the service and the Envoy proxy):

# Get all pods 
kubectl get pods --namespace=bookinfo

I hope this blog post helps you think about traffic routing between Kubernetes pods using Istio and Envoy. In future blog posts, we’ll explore the other facets of a “service mesh” – a common substrate for managing a large number of services, with traffic routing being just one facet of a service mesh.

As organizations increase their footprint in the cloud, there’s increased scrutiny on mounting cloud consumption costs, reigniting a discussion about longer-term costs.

This is not an entirely unexpected development. Here’s why:

  1. Cost savings were not meant to be the primary motivation for moving to the cloud – At least not in the manner most organizations are moving to the cloud, which is to move their existing applications to the cloud with little to no change. For most organizations, the primary motivation is “speed to value,” aka the ability to deliver business value faster by becoming more efficient in provisioning, automation, monitoring, the resilience of IT assets, etc.
  2. Often the cost comparisons between cloud and on-premises are not a true apples-to-apples comparison – For example, were all on-premises support staff salaries, depreciation, data center cost per square foot, rack space, power and networking costs considered? What about troubleshooting and cost of securing these assets?
  3. As these organizations achieve higher cloud operations maturity, they can realize increased cloud cost efficiency – For instance, by implementing effective auto-scaling, optimizing execution contexts by moving to dynamic consumption plans like serverless, and taking advantage of discounts through longer-term contracts (see the short sketch below).
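
As a concrete example of the auto-scaling point, here is a hedged Azure CLI sketch that scales a VM scale set on CPU utilization. The resource names, counts, and threshold are placeholders.

az monitor autoscale create \
  --resource-group rg-demo \
  --resource vmss-demo \
  --resource-type Microsoft.Compute/virtualMachineScaleSets \
  --name autoscale-demo \
  --min-count 2 --max-count 10 --count 2

az monitor autoscale rule create \
  --resource-group rg-demo \
  --autoscale-name autoscale-demo \
  --condition "Percentage CPU > 70 avg 5m" \
  --scale out 2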

Claim Your Free Whitepaper

In this whitepaper, we talk about the aforementioned considerations, as well as cost optimization techniques (including resource-based, usage-based and pricing-based cost optimization).


About the Podcast

During KubeCon 2018, I had the pleasure to once again be a guest on the .NET Rocks! podcast. I talked to Carl and Richard about what it means to be cloud-native, the on-going evolution, and what that all means for 2019. We talked in depth about how the cloud-native approach impacts how we build applications on the cloud. We also talked about how the Cloud Native Computing Foundation (CNCF) is fostering an ecosystem of projects like Kubernetes, Envoy and Prometheus. Finally, we talked about cloud-native computing in the context of Microsoft Azure.

Listen to the full podcast here!

Related Content

If you’re curious about what it means to be cloud-native, you may also enjoy our previous blog post, What Are Cloud-Native Technologies & How Are They Different From Traditional PaaS Offerings. In this post, we discussed the key benefits of cloud-native architecture, compared it to a traditional PaaS offering, and laid out a few use cases.

On December 15th, I had the pleasure of presenting a session of “Introduction to Deep Learning” at the recently held #globalAIBootcamp (an amazing event with 68 participating locations worldwide).

This blog post captures some of the key points from my presentation. Feel free to go directly to the slides located here.

Before I begin, I would like to thank François Chollet for his excellent book “Deep Learning with Python.” In my almost four-year quest to better understand deep learning, I found his book to be one of the best written books on this topic.

(I will totally understand if, at this point, you abandon reading this blog post and start reading François Chollet’s book…in fact, I recommend you do just that 😊)

The Importance of Deep Learning for Developers

Deep learning is such an incredible tool for developers like us. Deep learning has already helped solve complex problems such as near-human level image classification, handwriting recognition, speech recognition and more. Much of this aforementioned functionality is now available to us in the form of an easily callable API (e.g. Cognitive Services). Furthermore, many of these APIs also allow us to train the underlying models using data from our own specific domains (e.g. Custom Speech API). Finally, with the advent of automated ML tools, there is now a WYSIWYG way to get started with deep learning (e.g. Lobe.AI).

If we have all these tools available, why would we, as developers, need to code deep learning models from scratch?

To answer this question, let us review a recent quote from Andrew Ng.

“AI (Artificial Intelligence) technology is now poised to transform every industry, just as electricity did 100 years ago. Between now and 2030, it will create an estimated $13 trillion of GDP growth. While it has already created tremendous value in leading technology companies such as Google, Baidu, Microsoft and Facebook, much of the additional waves of value creation will go beyond the software sector.”

In addition to the astounding number itself, the important part of this quote is its closing point (emphasis mine): much of the additional value creation will go beyond the software sector. Large software companies have already created tremendous value for themselves through speech recognition, object classification, language translation, etc., and now it is up to us as developers to take this technology to our customers beyond the software sector. Bringing deep learning solutions to these sectors may, in many cases (but not all), require developing our own custom deep learning models. Here are a few examples:

  1. Predicting patient health outcomes based on past health data
  2. Mapping a set of attributes about a product to predict manufacturing defects
  3. Mapping pictures of food items to prices and calorie count for a self-checkout kiosk
  4. Smart bots for a specific industry
  5. Summarization of a vast amount of text into an auto-generated short summary with a timeline

In order to develop these kinds of solutions, I believe that we need to have a working knowledge of deep learning and neural networks (also referred to as Artificial Neural Networks or ANNs).

What is Deep Learning?

But what is a neural network and what do I mean by “deep” in deep learning? A neural network is a layered structure that is “inspired” by the neurons in our brain.

Like the neurons in our brain, which are essentially a network of cells that get activated or inhibited (think 0 or 1) based on an incoming signal, a neural network is comprised of layers of cells that are activated or inhibited by an incoming signal. As we will see later, in the case of a neural network, the cells can store any value (instead of just binary values).

Despite the similarities, it is unclear whether neurons in our brain work in a similar way to the neurons in neural networks. Thus, I emphasized the word “inspired” earlier in this paragraph. Of course, that has not stopped the popular press from painting a magical picture around deep learning (now you know not to buy into this hype).

We still need to answer our question about the meaning of the word “deep” in deep learning. As it turns out, a typical neural network is multiple levels deep – hence the reference to “deep” in deep learning. It’s that simple!

Let us change gears and look briefly into what we mean by “learning.” We will use the “Hello World” equivalent for neural nets as the basis for our explanation. This is a neural network that can identify a handwritten digit (such as the one shown below).

a handwritten 9

It’s All About a Layered Network

Input Layer

Let’s think of a neural network as a giant data transformation engine that takes a handwritten image as input and outputs a number between 0-9. Under the covers, a handwritten image is really a matrix of, say, 28 X 28 cells, with each cell’s value ranging from 0-255. A value of 0 represents white and a value of 255 represents black. So altogether we have 28 X 28 (784) cells in the input layer.

Hidden Layer

Now let us, somewhat arbitrarily, add a layer underneath the input layer (because this layer is underneath the input layer, it is referred to as a “hidden” layer). Furthermore, let us assume that we have 512 cells in the hidden layer (compared to 784 cells in the input layer). Now imagine that each cell in the input layer is connected to each cell in the hidden layer – this is referred to as densely connected layers.

Each connection, in turn, has a multiplier (referred to as a weight) associated with it that drives the value of the cell in the next layer as shown. To help visualize this, please refer to the diagram below: The lines in red show all the incoming connections coming into the topmost cell of the hidden layer. The numbers decorating the incoming connections are the associated weights that drive the value of cells in the hidden layer.

a diagram of the neural network

For example, the topmost cell in the hidden layer gets a value of .827 – we got this value by multiplying the value of each incoming cell by the weight on its connection and adding up the results (i.e. 1 * .612 + 1 * .215). Stated differently, we calculated the value of the cell in the hidden layer as the weighted sum of all the incoming connections.

Weighted Sum, Bias and Activation Function

The weighted sum calculation is rather simplistic. We want this calculation to be a bit more sophisticated, so we can conduct more complex transformations as the data flows from one layer to next (after all, we are trying to teach our network the non-trivial problem of identifying handwritten digits).

In order to make the weighted sum calculation a bit more sophisticated, we can do a couple of things to help us control how data flows to the following layer:

  1. We can introduce a notion of threshold associated with the weighted sum – if the weighted sum is below a certain threshold, we will consider the next neuron not to be activated. An easy way to incorporate the notion of the threshold in our weighted sum calculation is to add an equivalent constant value to it (referred to as a bias).
  2. We introduce the notion of an activation function. Activation functions are mathematical functions like Sigmoid and ReLU that can transform the output of the neuron. For example, ReLU is simply a max operation that returns a value of 0 if the input value is a negative number; else it simply returns the input value. Here is the C# code snippet to implement ReLU:

public double calcReLU (double x){return Math.Max(0,x);}
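
Putting the weighted sum, bias, and activation function together, here is a small Python sketch of a single layer’s forward computation; the specific numbers simply mirror the .612/.215 example from the earlier diagram.

import numpy as np

def relu(x):
    return np.maximum(0, x)

x = np.array([1.0, 1.0])           # values coming from the previous layer
W = np.array([[0.612, 0.215]])     # weights on the incoming connections
b = np.array([0.0])                # bias (our threshold)

hidden = relu(W @ x + b)           # weighted sum + bias, then activation
print(hidden)                      # [0.827]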

If you are wondering why the weighted sum calculation is simplistic and why we needed to add things like a bias and activation function, it is helpful to note that, ultimately, ML boils down to searching for the right data transformation function (consistent with our earlier statement that at the highest level, ML is really about transforming input data into output data).

For example, in the diagram below, our ML program has “learnt” to predict a value of y, given x, using the training set. The training set is made of a collection of (x,y) coordinates depicted as blue dots. The “learnt” data transformation (depicted as the red line in the diagram) is a simple linear data transformation.

diagram of the model

If the data transformation that the ML routine was supposed to learn was more complex than the example above, we would need to widen our search space (also referred to as a hypothesis space). This is when introducing concepts like bias and activation functions can help. In the diagram below, the ML program needs to learn a more complex data transformation logic.

You are probably wondering how ReLU can help here – after all, the ReLU / max operation seems like another linear transformation. This is not true: ReLU is a non-linear transformation function. In fact, you can chain multiple ReLU operations to mimic a non-linear graph, such as the one shown in the diagram above.

Recognizing Patterns

Before we leave this section, let us ponder for a moment about the weighted sum calculation and activation function. Based on what we have discussed so far, you understand that the weighted sum and the activation function together help us calculate the value of the cell in the next layer. But how does a calculation such as this help the neural net recognize handwritten digits?

The only way a neural network can learn to recognize digits is by finding patterns in the image, i.e., the number “9” is a combination of a circle alongside a vertical line. If we had to write a program to detect a vertical line in the center of an image, what would we do? We have already talked about representing an image as a 28 X 28 matrix, where each cell of the matrix has a value of 0-255. One easy way to determine if this image has a vertical line in the center of the image is to create another 28 X 28 matrix and set high values for cells where we are looking for the vertical line and low values for cells everywhere else.

Now if we multiply the two matrices, assuming our image did indeed have a vertical line in the center, we will end up with a matrix that has high values for the cells that make up the vertical line. Hopefully, you have begun to develop an intuition for how the neural network will learn to recognize handwritten digits. (Note that we are not providing it a cheat sheet of patterns for each digit; it discovers these patterns on its own.)

Output Layer

Finally, let’s add an output layer that has only 10 cells. Why just 10 cells? Remember that the purpose of our neural net is to predict the digit that corresponds to a handwritten image. This means we have 10 possibilities (0-9). Each cell of our output layer will contain a probability score of what our neural network thinks the handwritten digit is (the probabilities across the 10 cells add up to 1). Such an output layer is referred to as a SoftMax layer.
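
The SoftMax calculation itself is simple to sketch in Python; it turns the 10 raw output scores into probabilities that add up to 1.

import numpy as np

def softmax(scores):
    exps = np.exp(scores - np.max(scores))  # subtract the max for numerical stability
    return exps / exps.sum()

scores = np.array([1.2, 0.3, 0.1, 2.5, 0.0, 0.4, 0.2, 0.9, 3.1, 0.6])  # raw scores for digits 0-9
probs = softmax(scores)
print(probs.argmax(), probs.sum())  # most likely digit, and 1.0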

With the necessary layers in place, let us review the layers in our neural network model as shown in the table below. The input layer is a 28 X 28 matrix of pixels. The hidden layer is comprised of 512 cells and the output layer has 10 cells. The third column of the table depicts the total number of weight parameters that can be adjusted for the network to learn.

Table with layers and values

What is “Learning”?

Great! We now have a neural network in place! But why is it even reasonable to think that a layered structure can learn to recognize handwritten digits?

This is where learning comes in!

We will start out by initializing the weights to some random values. Then we will go about calculating the values of cells in the hidden layer, and subsequently, the values in the output layer – this process is referred to as a forward pass. Since we initialized the weights to random values, we cannot expect the output to be a meaningful value.

Of course, our predicted output is going to be wrong. But how wrong? Well… we can calculate the difference between our incorrect output and “correct” output. I suspect that you’ve heard that in order to conduct the learning, we need a set of training examples – in our case, images of handwritten digits and the “correct” answer (referred to as “labels”). Fortunately for us, folks behind the MNIST database have already created a training set that is comprised of images of 60,000 handwritten digits and the corresponding labels. Here is an image depicting sample images from the MNIST database.

a collection of handwritten digits

Note: When working on a real-world problem, you probably won’t find a readily available training set. As you can imagine, collecting the requisite sized training set is one of the key challenges you will face. More training data is almost always better when it comes to neural network learning. (Although, there are techniques like data augmentation that can allow you to get by using a smaller training set.)

Another real-world challenge is the data wrangling required to transform the data into a format that is conducive to feeding into a neural network. The process of using knowledge of your context to transform the data into a format that makes neural network learning work better is known as feature engineering. One of the strengths of neural networks is that they can conduct feature discovery on their own (unlike other forms of machine learning). We don’t have to tell the neural network that the digit 9 is made up of a circle and a vertical line.

That said, any data transformation you can undertake in order to ease the learning will lead to a more optimal solution (in terms of resource and training set requirements).

Since we know the correct answer to our “identify the handwritten digit” question, we can determine how wrong we were with the output. But rather than using the simple difference between the two, let us be a bit more mathematically sound and use the square of the differences between the incorrect and correct output values. This way we take into account negative values. This calculation (the squared difference) is referred to as a loss function.

A related concept is cost – it represents the sum of all the loss values added up over the entire training set – in this case, the mean of the squared differences. Using the cost function, we iterate repeatedly, adjusting the weights each time, in an attempt to reach a state with the minimal possible loss. This state of the neural network (collectively representing all the adjusted weights) when the loss has reached its minimum value is referred to as a trained neural net. Keep in mind that training will involve repeating the aforementioned steps for each image (60,000) in our training set, and then averaging the weights across these iterations.

Now we are ready to “show” our trained network images of handwritten digits that it has not seen before, and hopefully, it should be able to offer us a result with a certain level of accuracy. Typically, we use a test set (completely separate from the training set) to determine the accuracy of our trained network.

Adjusting the Weights

Let us get into the details of how weights get adjusted. Let us go with something simple. Assume one of the weights (w1) is set to 2.5. Increase w1 (to 3.5), make a forward pass, calculate the output and the corresponding loss (l1). Now decrease w1 (to 1.5), make a forward pass, calculate the output and the corresponding loss (l2). If l2 is less than l1, we know that we need to continue to decrease the weight, and vice versa.

Unfortunately, our simple algorithm to change the weights is horribly inefficient – two forward passes just to determine the necessary weight adjustment for a single weight! Fortunately, we can use a concept that we learned in high school – “instantaneous rate of change.” If we can calculate the rate of change in the cost at the instant when weight w1 is 2.5, all we need to do is move weight w1 in the direction that lowers the loss by the amount that is proportional to the rate of change.
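
To make the idea concrete, here is a tiny Python sketch that estimates the rate of change numerically for a single weight and nudges the weight accordingly. Real frameworks compute these derivatives analytically via backpropagation, and the toy loss function here is purely illustrative.

def loss(w1):
    # Toy stand-in for the real loss; in practice this would require a forward pass
    return (w1 - 1.0) ** 2

def rate_of_change(f, w, eps=1e-5):
    # Central-difference estimate of the instantaneous rate of change at w
    return (f(w + eps) - f(w - eps)) / (2 * eps)

w1 = 2.5
grad = rate_of_change(loss, w1)   # positive here, so decreasing w1 lowers the loss
w1 -= 0.1 * grad                  # step in the direction that reduces the loss
print(w1, grad)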

This concept, dear reader, is one of the key concepts in understanding neural networks. I deliberately did not get into the math behind the instantaneous rate of change, but if you are so inclined, please review this link.

Now we know how to change an individual weight. But what about our discussion of densely connected layers earlier? How do interconnected layers of neurons impact how we change individual weights?

We know that the value stored in a given cell depends on the weighted sum of all incoming connections from cells in the previous layer. This means that changing the value of a given cell requires changing the weights on the incoming connections from the previous layer. So we need to walk backward, starting with the output layer, then to the hidden layer, and ending up at the input layer, adjusting the weights at each step.

Another consideration when walking backward is to focus on weights that will give us the most bang for our buck, so to speak. Remember, we know the instantaneous rate of change, which means we know what the impact of a weight change will be on the overall reduction in the cost function, our stated goal. We also know which cells are most consequential in getting the neural network to predict the right answer.

For example, if we are trying to teach our network to learn about the digit ‘2’, we know the cell that contains the probability of the digit being 2 (out of the 10 cells in the output layer) needs to be our focus. Overall, this technique of walking backward and adjusting the weights recursively is known as backpropagation. As before, we won’t get into the math, but rest assured that another important concept from high school calculus – the chain rule – provides the mathematical basis for calculating the overall instantaneous rate of change based on all the individual instantaneous rates of change associated with each cell in the network. If you want to read more about the mathematical basis, please review this link.

Later, when we look at a code example, we will come across the use of a component called an optimizer that is responsible for implementing the backpropagation algorithm.

We now have some understanding of how the weights get adjusted. But up until now, we have talked about adjusting a handful of weights, when in fact, it is not uncommon to find millions of weights in a modern neural network. It is hard for any of us to visualize the instantaneous rates of change involved with millions of weights. One way to think about this is to imagine a very large matrix that stores the weights and the instantaneous rates of change associated with them. Fortunately, with the parallelization power of GPUs and highly efficient matrix multiplication functions, we can process this large number of calculations in parallel, in a reasonable amount of time.

Review of Deep Learning Concepts

Phew… that was a rather lengthy walkthrough of neural network essentials. Our hard work will pay off as we get into the code next. But before we get into the code, let us review the key terms we have discussed so far in this blog post.

Training Set: Collection of handwritten images and labels that our neural network learns from

Test Set: Collection of handwritten images and labels used for validating how well our neural network learnt

Loss: Difference between the predicted output and actual answer. Our network is trying to minimize this value as part of the learning

Cost: Sum of all the loss values added up over the entire training set – in our example, we used the mean of squared difference as the cost function

Optimizer: Optimizer implements a backpropagation algorithm for adjusting the weights

Densely Connected: A fully connected layer that transforms the input with the weights and outputs the results to the following layer

Activation Function: A function such as ReLU used for non-linear transformation

SoftMax: Calculates the probability for each output type. We used a 10-way SoftMax layer that calculated the probability of digit being 0-9

Forward Propagation: A forward pass that travels through the input layer & hidden layer until it produces an output

Backward Propagation: An algorithm that involves traveling backwards through the neural network to adjust the weights

Code Walkthrough

We will use the Keras library for our code sample. It turns out that the author of the book we referenced earlier (François Chollet) is also the author of this library. I really like this library because it sits on top of deep learning libraries like TensorFlow and CNTK. Not only does it hide the complexity of the underlying framework, but it also provides a first-class notion of a neural network layer as a building block that all of us developers can appreciate. As you will see in the code example below, putting together a neural network becomes a simple matter of stacking these building blocks.

Keras building blocks

Let us review an elided version of a Python program that matches the neural network we discussed above. I don’t expect you to pick up every syntactic detail associated with the entire program. My goal is to map, at a high level, the concepts we talked about, to code. For a complete listing of code see this link. You can easily execute this code by provisioning an Azure Data Science Virtual Machine (DSVM) that comes preconfigured with an AI and data science development environment. You also have the option of provisioning the DSVM with GPU support.

# Import Keras
# Load the MNIST database training images and labels (fortunately it comes built into Keras)

# Create a neural network model and add two dense layers

# Define the optimizer, loss function, and the metric

# It is not safe to feed the neural network large values (in our case, integers
# ranging from 0-255). Let us transform them into floating-point numbers between 0 and 1

# Start the training (i.e., fit our neural network to the training data)

# Now that our network is trained, let us evaluate it using the test data set

# Print the testing accuracy
# 97.86% – not bad for our very basic neural network
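
Since the listing above is elided, here is a minimal, runnable sketch of the same program using Keras (as bundled with TensorFlow). The layer sizes mirror the network described earlier (784 → 512 → 10); the optimizer, epoch count, and batch size are illustrative assumptions rather than the exact values from the linked listing.

# A minimal sketch of the elided program above; hyperparameters are assumptions
from tensorflow import keras
from tensorflow.keras import layers

# Load the MNIST training and test images and labels (bundled with Keras)
(train_images, train_labels), (test_images, test_labels) = keras.datasets.mnist.load_data()

# Flatten each 28 x 28 image into a 784-element vector and scale pixel values to 0-1
train_images = train_images.reshape((60000, 28 * 28)).astype("float32") / 255
test_images = test_images.reshape((10000, 28 * 28)).astype("float32") / 255

# Two densely connected layers: a 512-cell hidden layer and a 10-way softmax output
model = keras.Sequential([
    layers.Dense(512, activation="relu", input_shape=(28 * 28,)),
    layers.Dense(10, activation="softmax"),
])

# Optimizer, loss function, and metric
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Fit the network to the training data
model.fit(train_images, train_labels, epochs=5, batch_size=128)

# Evaluate against the test set and print the accuracy
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f"Test accuracy: {test_acc:.4f}")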

Not to denigrate what we have learned so far, but please keep in mind that the neural network architecture we have discussed above is far from the state of art in neural networks. In future blog posts, we will get into some of the cutting-edge advancements in neural networks. But don’t be disheartened, the neural net we have discussed so far is quite powerful with an accuracy of 97.86%.

Learn More About Neural Networks (But Don’t Anthropomorphize)

I don’t know about you, but it is pretty amazing to me that a few lines of code have given us the ability to recognize sloppily handwritten digits that our program has not seen before. Try writing such a program using the traditional programming techniques you know and love. Note, however, that our trained neural network does not have any understanding of what the digits mean (for example, our trained neural network will not be able to draw a digit, despite having been trained on 60,000 handwritten images).

I also don’t want to minimize the challenges involved in effectively using neural networks, starting with putting together a large training set. Additionally, the data wrangling work involved in transforming the input data into a format that is suitable for training the neural network is quite significant. Finally, developing an understanding of the various optimization techniques for improving the accuracy of our training can only come through experience (Fortunately, AutoML techniques are making the optimization much easier).

Hopefully, this blog post motivates you to learn more about neural networks as a programming construct. In future blog posts, we will look at techniques to improve the accuracy of neural networks, as well as review state of the art neural network architectures.

PaaS & Cloud-Native Technologies

If you have worked with Azure for a while, you’re aware of the benefits of PaaS, such as the ability to have the cloud provider manage the underlying storage and compute infrastructure so you don’t have to worry about things like patching, hardware failures, and capacity management. Another important benefit of PaaS is the rich ecosystem of value-add services like database, identity, and monitoring as a service that can help reduce time to market.

So if PaaS is so cool, why are cloud-native technologies like Kubernetes and Prometheus all the rage these days? In fact, not just Kubernetes and Prometheus, there is a groundswell of related cloud-native projects. Just visit the cloud-native landscape to see for yourself.

Key Benefits of Cloud-Native Architecture

Here are ten reasons why cloud-native architecture is getting so much attention:

  1. Application as a first-class construct — Rather than speaking in terms of VMs, storage, firewall rules, etc., cloud-native is about application-specific constructs – whether it is a Helm chart that defines the blueprint of your application or a service mesh configuration that defines the network in application-specific terms.
  2. Portability — Applications can run on any CNCF-certified cloud, on-premises, and on edge devices. The API surface is exactly the same.
  3. Cost efficiency — By densely packing the application components (or containers) onto the underlying cluster, the cost of running an application becomes significantly more efficient.
  4. Extensibility model — A standards-based extensibility model allows you to tap into innovations offered by the cloud provider of your choice. For instance, using the Service Catalog and Open Service Broker for Azure, you can package a Kubernetes application with a service like Cosmos DB.
  5. Language agnostic — Cloud-native architecture can support a wide variety of languages and frameworks including .NET, Java, Node, etc.
  6. Scale your ops teams — Because the underlying infrastructure is decoupled from the applications, there is greater consistency at the lower levels of your infrastructure. This allows your ops team to scale much more efficiently.
  7. Consistent and “decoupled” — In addition to greater consistency at the lower levels of infrastructure, application developers are exposed to a consistent set of constructs for deploying their applications – for example, Pod, Service, Deployment, and Job. These constructs remain the same across cloud, on-premises, and edge environments. Furthermore, these constructs also help decouple the developers from the underlying layers (cluster, kernel, and hardware layers) shown in the diagram below.
  8. Declarative model – Kubernetes, Istio, and other projects are based on a declarative, configuration-based model that supports self-healing. This means that any deviation from the “desired state” is automatically “healed” by the underlying system (see the minimal manifest sketch after this list). Declarative models reduce the need for imperative automation scripts that can be expensive to develop and maintain.
  9. Community momentum – As stated earlier, the community momentum behind CNCF is unprecedented. Kubernetes is the #1 open source project in terms of contributions. In addition to Kubernetes and Prometheus, there are close to 500 projects that have collectively attracted over $5B of venture funding! In the latest survey (August 2018), the use of cloud-native technologies in production has gone up by 200% since December 2017.
  10. Ticket to DevOps 2.0 – Cloud-native brings together the well-recognized benefits of what is being termed “DevOps 2.0”: hermetically sealed and immutable container images, microservices, and continuous deployment. Please refer to the excellent book by Viktor Farcic.
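
To illustrate the declarative model, here is a minimal Kubernetes Deployment manifest; the name, image, and replica count are placeholders. The cluster continuously reconciles actual state toward this desired state, restarting or rescheduling pods as needed.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-web
spec:
  replicas: 3                  # desired state: three running replicas
  selector:
    matchLabels:
      app: hello-web
  template:
    metadata:
      labels:
        app: hello-web
    spec:
      containers:
      - name: hello-web
        image: nginx:1.25      # placeholder image
        ports:
        - containerPort: 80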

Now that we understand the key benefits of cloud-native technologies, let us compare it to a traditional PaaS offering:

| Attribute | Traditional PaaS | Cloud-Native as a Service |
| --- | --- | --- |
| Portability | Limited | Advanced |
| Application as a first-class construct | Limited (application construct limited to the specific PaaS service) | Advanced constructs including Helm, network and security policies |
| Managed offering | Mature (fully managed) | Maturing (some aspects of cluster management currently require attention) |
| Stateful applications | Advanced capabilities offered by database-as-a-service offerings | Some cloud-native support for stateful applications (however, cloud-native applications can be integrated with PaaS database offerings through the service catalog) |
| Extensibility | Limited | Advanced (extensibility includes the Container Network Interface and Container Runtime Interface) |

Azure & CNCF

Fortunately, Microsoft has been a strong supporter of CNCF, having joined back in 2017 as a platinum member. Since then, they have made significant investments in a CNCF-compliant offering in the form of Azure Kubernetes Service (AKS). AKS combines the aforementioned benefits of cloud-native computing with a fully managed offering – think of AKS as a PaaS solution that is also CNCF compliant.

Additionally, AKS addresses enterprise requirements such as compliance standards and integration with capabilities like Azure AD, Key Vault, Azure Files, etc. Finally, offerings like Azure Dev Spaces and Azure DevOps greatly enhance the CI/CD experience of working with cloud-native applications. I would be remiss not to mention the VS Code extension for Kubernetes, which also brings useful tooling to the mix.

Cloud-Native Use Cases

Here are a few key use cases for cloud-native applications. Microservices are something you would expect, of course. Customers are also using AKS to run Apache Spark. There is also thinking around managing IoT Edge deployments right from within the Kubernetes environment. Finally, “lift and shift to containers” – this use case is getting a lot of attention from customers as the preferred route for moving on-premises applications to the cloud. Please refer to our recent blog post on this very topic, “A ‘Modernize-by-Shifting’ App Modernization Approach,” for more details!

Cloud-Native Scenarios
