As an IT leader, you understand a successful cloud transformation positions IT as a business enabler, rather than a curator of infrastructure. Adopting the cloud is more than simply moving your on-prem instance to a provider’s servers and going on with business as usual. The flexibility, scalability, and security of the cloud allows businesses to deliver value in ways that were not dreamed of outside of science fiction novels in earlier generations. Cloud transformation is about the whole system – people, processes, data, and tools. When cloud transformation is done right, it’s a true game changer. Getting it right requires you to focus on your people, not just technology enabling them. Here are tips to help you do that.

Connect the Dots

Whether you’re focused on adopting the cloud, modernizing your systems, or getting more from your data, helping your business solve problems and overcome challenges are the driving forces. People will need to work differently to achieve your desired results. If you’ve tried to change your own habits – working out, reading more, going to bed earlier – you know that influencing human behavior isn’t easy. To help people navigate these changes and thrive, it’s especially important to connect these dots:

  • How the solution will help employees solve problems and overcome challenges they face in their day to day work.
  • What people will need to do differently and what support will be available to help them do that.

Start with the Home Team, But Don’t End There

The first place to start is with the IT teams. Whether the solution includes provisioning firewalls to migrate an on-prem intranet to SharePoint Online, modernizing millions of lines of COBOL code and migrating subsystems into Microsoft Azure, harnessing cloud-native services and DevOps practices, unleashing data intelligence through cloud-based outage tracking systems that incorporate Power BI, or automating workflows with Power Apps, people from different IT teams will need to work together to get the right solutions in place. This means communication and collaboration across IT teams, as well as within teams, is more important than ever.

Ensuring that all your business’ IT teams understand how they are an important part of the solution and ensuring they have access to the support they need to perform successfully are critical tasks. However, teams outside of IT are also likely to be impacted, whether it’s HR needing to update policies or documentation as a result of the new tools, or the entire company’s workforce using new communication and collaboration tools.

Ensure You Have a Complete Solution

Take a closer look at whose work will be impacted, what the areas of impact are, and the likely degree of impact. This will help you manage risk by ensuring you have a complete solution and that you can wisely deploy resources. If you have accelerated the deployment of cloud-based collaboration tools in the wake of the COVID-19 pandemic and are proceeding with immediate implementation of tooling, you can use this guidance to determine the gaps in a complete solution and what’s needed to close the loop. Here are three questions to help you identify the impact that your solution needs to address:

Step 1 – Whose day to day work is impacted?

  • IT Teams
  • Employees
  • Business Units
  • Customers
  • Other Stakeholders

Step 2 – What are the areas impacted?

  • Roles
  • Processes
  • Tools
  • Actions/Work behaviors
  • Mindsets/views

Step 3 – What is the level of impact on the day to day work?

  • Low – Small change in one or two areas
  • Medium – Medium change in one area or multiple areas impacted
  • High – Significant change in one or more areas or small change but significant consequences if the change is not adopted well

The greater the level of impact, the more important it is to have enough support in place. How much support is enough? To answer that, take a closer look at the likely obstacles, then put support in place to clear the path.

Anticipate Obstacles and Proactively Clear the Path

With the impact clear, it’s time to anticipate obstacles that will be faced as people adopt the new roles, processes, tools, actions/work behaviors or mindsets/views they need for successful results to be achieved.

For example, let’s say your company is migrating to a central repository and communication platform. Employees will benefit from a more seamless work experience across devices and be able to access on-demand resources, get answers to their questions, and resolve issues faster. Employees will need to know how to find the information they need in a timely manner, and they will need to know whether to use e-mail, instant messaging, or post to a discussion channel for their specific business scenarios.

Most obstacles fall into one of four categories:

  • Knowing: Do those impacted know what is changing and why they are an important part of the solution?
  • Caring: Do they care about the problems the new tool, system or processes will help solve?
  • Norming: Do they know what is expected of them? Does their leadership (and other influencers) demonstrate through consistent words and actions that this is important?
  • Performing: Can they do what is expected of them? How will they get feedback? Are incentives aligned with the desired performance? Are there any new challenges that they are likely to face and have these been accounted for?

A complete solution anticipates these challenges and proactively builds in support by considering the experiences people have and the support they need specific to the business scenarios they are engaged in on a regular basis.

Conclusion

Wherever you are on your cloud transformation journey, make sure you are considering the experiences and support that people need to have in order to successfully navigate changes in roles, processes, and tooling to thrive. The sooner people start to thrive, the sooner your company gets its ROI with business problems solved and challenges overcome. Ultimately, a complete cloud transformation solution must be tech-fueled, but people-focused.

In January, AIS’ Steve Suing posted a great article titled “What to Know When Purchasing a COTS Product and Moving to the Cloud.” In his post, Steve discusses some things to consider such as security, automation, and integration. Springboarding off of those topics for this blog, I am going to discuss another major consideration: legacy data.

Many COTS implementations are done to modernize a business function that is currently being performed in a legacy application. There are some obvious considerations here. For example, when modernization a legacy app, you might ask:

  • How much of the business needs are met by purely out of the out-of-the-box functionality?
  • How much effort will be necessary to meet the rest of your requirements in the form of configuration and customization of the product or through changing your business process?

However, legacy data is a critical consideration that should not be overlooked during this process.

Dealing with Data When Retiring a Legacy System

If one of the motivators for moving to a new system is to realize cost savings by shutting down legacy systems, that means the legacy data likely needs to be migrated. If there is a considerable amount of data and length of legacy history, the effort involved in bringing that data into the new product should be carefully considered. This might seem trivial, with most products at least proclaiming to have a multitude of ways to connect to other systems but recent experience has shown me that the cost of not properly verifying the level of effort can far exceed the savings provided by a COTS product. These types of questions can help avoid a nightmare scenario:

  1. How will the legacy data be brought into the product?
  2. How much transformation of data is necessary, and can it be done with tools that already exist?
  3. Does the product team have significant experience of similarly legacy systems (complexity, amount of data, industry space, etc.) moving to the product?

The first and second conversations usually start by sitting down and talking to the product team and having them outline the common processes that take place to bring legacy data into the product. During these conversations, be sure both sides have technical folks in the room. This will ensure the conversations have the necessary depth to uncover how the process works.

Once your team has a good understanding of the migration methods recommended by the product team, start exploring some actual use cases related to your data. After exploring some of the common scenarios to get a handle on the process, jump quickly into some of the toughest use cases. Sure, the product might be able to bring in 80% of the data with a simple extract, transform, load (ETL), but what about that other 20%? Often legacy systems were created in a time long, long ago, and sometimes functionality will have been squished into them in ways that make the data messy. Also, consider how the business processes have changed over time and how the migration will address legacy data created as rules changed over the history of the system. Don’t be afraid to look at that messy data and ask the tough questions about it. This is key.

Your Stakeholders During Legacy Data Migration

Be sure those involved are not too invested in the product to decrease confirmation bias, the tendency to search for, interpret, and/or focus on information that will confirm that this is your silver bullet. This way they will ask the hard questions. Bring in a good mix of technical and business thought leaders. For instance, challenge the DBA’s and SME’s to work together to come up with things that will simply not work. Be sure to set up an environment in which people are rewarded for coming up with blockers and not demotivated by being seen as difficult. Remember every blocker you come up with in this early phase, could be the key to making better decisions, and likely have a major downstream impact.

The product of these sessions should be a list of tough use cases. Now it’s time to bring the product team again. Throw the use cases up on a whiteboard and see how the product team works through them. On your side of the table, be sure to include people that will be responsible for the effort of bringing the new system to life. With skin in the game, these people are much less likely to allow problems to be glossed over and drive a truly realistic conversation. Because this involves the tough questions, the exercise will likely take multiple sessions. Resist any pressure to get this done quickly, keeping in mind that a poor decision now, can have ripple efforts that will last for years.

After some of these sessions, both sides should have a good understanding of the legacy system, the target product, and some of the complexities of meshing the two. With that in place, ask the product team for examples of similar legacy systems they have tackled. This question should be asked up front, but it really cannot be answered intelligently until at least some of the edge cases of the legacy system are well understood. While the product team might not be able to share details due to non-disclosure agreements, they should be able to speak in specific enough terms to demonstrate experience. Including those involved in the edge case discovery sessions is a must. With the familiarity gained during those sessions, those are the people that are going to be most likely to make a good call on whether the experience being presented by the product team relates closely enough to your needs.

Using Lessons Learned to Inform Future Data Migration Projects

Have I seen this process work? Honestly, the answer is no. My hope is that the information I have presented, based on lessons learned from projects past helps others avoid some of the pitfalls I have faced and the ripple effects that could lead to huge cost overruns. It’s easy to lose sight of the importance of asking hard questions up front and continuing to do so in light of the pressure to get started. If you feel you are giving in to such pressure, just reach out to future you and ask them for some input.

And a final thought. As everyone knows, for complex business processes, COTS products don’t always perfectly align to your legacy system. There are two ways to find alignment. Customizing the system is the obvious route, although it often makes more sense to change business processes to align with a product you are investing in rather than trying to force a product to do what it was not designed for. If you find that this is not possible, perhaps you should be looking at another product. Keep in mind that if a product cannot be found that fits your business processes closely enough, it might make more financial sense to consider creating your own system from the ground up.

Power Platform is Microsoft’s newest pillar in its cloud platform stack, steadily growing in popularity as more and more organizations realize the capability of the tool. Put into the hands of experienced developers, the Platform can expedite the development of highly complex organizational applications. Provided to citizen developers, the Platform connects them to organizational data and allows them to automate personal and team workloads. At an organizational level, IT has insight into who is doing what where and for whom – and the ability to secure, manage, and govern it. It is a first-class platform.

So, why has it taken so long to be recognized as such? Part of the problem is a misunderstanding around what “low code” means.

When you hear the phrase “low code,” you shouldn’t be thinking “low thought” or “low quality.” Rather, think of low code as short-hand coding. The team at Microsoft has done a large chunk of the pre-work by setting up your Azure database (that’s right – the Power Platform is built on Azure!) so that you can focus more on the parts of your solution that are unique to your organization and less about the base architecture that is consistent between solutions.

The Platform offers several drag-and-drop features, many templates, and hundreds of premade data connectors you can utilize to get apps into minimum viable product (MVP) shape quickly. It also lets you use traditional coding to beef up these apps, allowing highly customizable experiences designed to fit your precise business use case. It democratizes the ability to create apps by having both ends of the spectrum covered, allowing a community to exist globally (and organizationally) between citizen developers who just want to make their jobs a little easier and professional developers whose full-time job is building.

It is this unique feature – the diverse community the Platform is capable of supporting – that really changes the game. The Platform is allowing organizations to rapidly scale their modernization efforts by leveraging resources both in and beyond their IT shops.

The base of most Power Platform solutions is built within the Common Data Service (CDS) – a data model that shapes your model-driven apps, is fed by your canvas apps, and provides a wealth of information to reports published with Power BI. Organizations can deploy multiple solutions to an environment’s CDS, meaning that you can package up a “base solution” – a solution that sets up the common entities that make up the core of your enterprise – and deploy it to all of your Platform environments as a starting point for your developers. Then each app can be packaged up as its own solution and stacked on top of this base, modifying it as needed and even furthering the expediency at which a new solution can be deployed.

The CDS not only contains the data model but also stores the data itself. That means that when you build multiple apps in one environment, you’re creating a single source of truth for all of the applications touching that CDS; pull out processes and data points specific to one business unit but allow them to share the data with other units in the background. Connect your CDS-built apps to other data sources (Azure SQL database, Outlook, or even competing software like SalesForce) using the hundreds of existing data connectors, or create custom connectors, to further utilize existing data and reduce duplicative sources (and the source control issues that go along with it).

Common Data Service

Expanding on the need for control of your organization’s data, the Power Platform offers a familiar experience for the system administrators of many organizations by using Azure Active Directory to authenticate users and the O365 Admin Center to manage licenses. Further, the management of the Platform is fortified by the installation and use of the (free!) Center of Excellence Starter Kit. This toolkit provides a starting point to gain organizational insight into the details of how the Platform is being used – locations, environments, flows, makers, connections, apps, and more can be administered with the kit. Drill into each application, tag it, track it, and monitor its usage. Customize the COE Starter Kit and bring a whole new light to what once might have been considered “rogue IT.”

So now that we understand that low code is a good thing, let’s recap what you’re getting with the Power Platform:

  • A way to maximize your resources – the low code features enable citizen developers to migrate their own work, and the pre-built features and “short-hand” coding allow your professional developers to focus on the most important and unique features of your organization’s workloads.
  • A system meant to scale – built on Azure and backed by Microsoft SLAs, this tool is meant for enterprise use and has the speed and reliability to back it. Enable your citizen developers to modernize their own workloads to speed up your cloud overhaul. Develop & leverage a custom, organization-wide base solution to start every new project with a bulk of known entities already installed and scale your implementations even faster.
  • A new level of visibility into your business – the Center of Excellence toolkit allows you to see all of the makers who are building, the apps they’re developing, the connections being made, and more, providing an entirely new window into (and set of control over) citizen development that might have once been seen as “rogue IT.”

To paraphrase Andrew Welch, Director of Business Applications at AIS and Principal Author of the PPAF, on the subject: some business challenges can be solved with custom software development. Some business challenges can be solved with COTS solutions. Everything else is a missed opportunity to transform and modernize in the cloud – for which low-code cloud transformation with the Power Platform is the answer.

AIS is a Microsoft Partner, the premier developer of customer Power Platform solutions, and the publisher of the Power Platform Adoption Framework.

READY TO START YOUR JOURNEY?
AIS can help you when beginning your enterprise adoption journey with Power Platform.

During a recent design session at a client site, our team had the opportunity to participate in something cool. We had the opportunity to create a custom DSL (Domain Specific Language) in YAML for an automation framework we wanted to build. This blog will help provide a high-level overview. The client wants to use Azure Automation to create a self-healing framework for their infrastructure. The environment is a mixture of both PaaS and IaaS, with certain VMs being used to host app and database servers. Unfortunately, because of potential performance issues, it is important to regularly perform healing actions on the environment, which so far is being done manually. Now you may be asking, “Why don’t they use something like Chef or Puppet?”. It is because these technologies were either in the process of being on-boarded or were not available. You may also ask, “Well, if you’re going to use Powershell, just use Powershell DSC!”. While I agree that Powershell DSC is powerful and could potentially power the engine for this application (explained below), DSC itself is not very user-friendly. One of the major factors for this session was to allow any user to edit or read the definitions, and YAML is a much more robust option for usability.

The design session itself was fascinating. I had never spent time creating my own configuration language complete with definitions and structure. It was a great learning experience! I had used YAML briefly when demoing out things like Ansible for personal projects, but never directly to solve an issue. YAML had always been another markup language for me. This usage of it, however, showed me the power of the language itself. The problem that we ended up solving with the below snippet was creating an abstraction layer for Azure Monitor Alerts.

After the session completed, we were left with something like this:
Create and Abstraction Layer

Now what?

We must take this YAML File and convert it into an Azure Monitor Alert. The first problem we ran into is that this file is a YAML File. How can I take these configuration values and convert them to be used in Powershell? Here is where Cloudbase’s powershell-yaml module comes into play. As we all know, Powershell was written on top of and created to be an extension for the .NET Framework, so people at Cloudbase created the wonderful powershell-yaml, a module that is a wrapper around the popular .Net Library YamlDotNet.

With powershell-yaml you can create something like this:
Converting the yaml File

In this example, I am not only converting the yaml file into a JSON file as an example, but I am returning the yaml as a PSObject. I find this much easier to use because of the ease of dot notation.

In this case, you can now reference variables such as this:
Corresponding Powershell cmdlets

I was able to follow up with the corresponding PowerShell cmdlets available in the Az Modules and programmatically create Azure Monitor Alerts using YAML.

The eventual implementation will look roughly like this:
Azure Alerts

Each Azure Alert will trigger a webhook that is received by the Engine that is running in Azure Automation. This Engine will then do all business logic to find out whether the piece of infrastructure in question needs healing. If all the previously defined reasons are true the automation runbook will perform the recovery action.

The exercise itself was eye-opening. It made me much more comfortable with the idea of designing a solution for our specific scenario rather than trying to find out and wait to find the product or framework that would solve the problem for us. Also, this PoC showed me the power of YAML and how it can turn something incredibly monotonous, like configuration values, into something that can be part of your robust solution.

Azure DevOps provides a suite of tools that your development team needs to plan, create, and ship products. It comes in two flavors:

  • Azure DevOps Services – the SaaS option hosted by Microsoft.
  • Azure DevOps Server – the IaaS option hosted by you.

When comparing the two to decide which option enables your team to deliver the most value in the least amount of time, Azure DevOps Services is the clear winner, but velocity alone is not the only consideration for most government teams. The services you use must also be compliant with standardized government-wide security, authorization, and monitoring requirements.

Azure DevOps Services are hosted by Microsoft in Azure regions. At the time of this writing, you do not yet have the option to host Azure DevOps in an Azure Government region so you must use one of the available Azure Commercial regions. As a public service, it has also has not yet achieved compliance with any FedRAMP or DoD CC SRG audit scopes. This may sound like a non-starter, but as it states on the FedRAMP website, it depends on your context and how you use the product.

Depending on the services being offered, the third-party vendor does not necessarily have to be FedRAMP compliant, but there are security controls you must make sure they adhere to. If there is a connection to the third-party vendor, they should be listed in the System Security Plan in the Interconnection Table.

This is the first of two blog posts that will share solutions to common Azure DevOps Services concerns:

  1. In “Azure DevOps Services for Government: Access Control” (this post), I will cover common access control concerns.
  2. In “Azure DevOps Services for Government: Information Storage”, I will address common concerns with storing information in the commercial data centers that host Azure DevOps Services.

Managing User Access

All Azure DevOps Organizations support cloud authentication through either Microsoft accounts (MSA) or Azure Active Directory (AAD) accounts. If you’re using an MSA backed Azure DevOps organization, users will create their own accounts and will self-manage security settings, like multi-factor authentication. It is common for government projects to require a more centralized oversight of account management and policies.

The solution is to back your Azure DevOps Services Organization with an AAD tenant. An AAD backed Azure DevOps organization meets the following common authentication and user management requirements:

  • Administrators control the lifecycle of user accounts, not the users themselves. Administrators can centrally create, disable, or delete user accounts.
  • With AAD Conditional Access, administrators can create policies that allow or deny access to Azure DevOps based on conditions such as user IP location.
  • AAD can be configured to federate with an internal identity provider, which could be used to enable CAC authentication.

Controlling Production Deployment Pipelines

The value added by Continuous Delivery (CD) pipelines includes increasing team velocity and reducing the risks associated with introducing a change. To fully realize these benefits, you’ll want to design your pipeline to begin at the source code and end in production so that you can eliminate bottlenecks and ensure a predictable deployment outcome across your test, staging, and production environments.
It is common for teams to grant limited production access to privileged administrators, so some customers raise valid concerns with the idea of a production deployment pipeline when they realize that an action triggered by a non-privileged developer could initiate a process that ultimately changes production resources (ex: deployment pipeline triggered from a code change).

The solution is to properly configure your Azure DevOps pipeline Environments. Each environment in Azure DevOps represents a target environment of a deployment pipeline. Environment management is separate from pipeline configuration which creates a separation of duties:

  • Team members who define what a deployment needs to do to deploy an application.
  • Team members who control the flow of changes into environments.

Example

A developer team member has configured a pipeline that deploys a Human Resource application. The pipeline is called “HR Service”. You can see in the YAML code below, the developer intends to run the Deploy.ps1 scripts on the Production pipeline Environment.

Production Environment

If we review the Production environment configuration, we can see that an approval check has been configured. When the deployment reaches the stage that will attempt to run against the production environment, a member of the “Privileged Administrators” AAD group will be notified that deployment is awaiting their approval.

Approvals and Checks

Only the Privileged Administrators group has been given access to administer the pipeline environment, so the team member awaiting approval would not be able to bypass the approval step by disabling it in the environment configuration or in the pipelines YAML definition.

Security Production

By layering environment configuration with other strategies you’ll establish the boundaries needed to protect your environment but will also empower your development team to work autonomously and without unnecessary bottlenecks. Other strategies to layer include:

  • Governance enforces with Azure Policies
  • Deployment gates based on environment monitoring
  • Automated quality and security scans into your pipeline

It is also important for all team members involved with a release to be aware of what is changing so that you are not just automating a release over the wall of confusion. This is an example of how a product alone, such as Azure DevOps, is not enough to fully adopt DevOps. You must also address the process and people.

Deployment Pipeline Access to Private Infrastructure

How will Azure DevOps, a service on a commercial Azure region, be able to “see” my private infrastructure hosted in Azure Government? That’s usually one of the first questions I hear when discussing production deployment pipelines. Microsoft has a straight-forward answer to this scenario: self-hosted pipeline agents. Azure DevOps will have no line of sight to your infrastructure. The high-level process to follow when deploying an agent into your environment looks like this:

  • Deploy a virtual machine or container into your private network.
  • Apply any baseline configuration to the machine, such as those defined in the DISA’s Security Technical Implementation Guides (STIG).
  • Install the pipeline agent software on the machine and register the agent with an agent pool in Azure DevOps.
  • Authorize pipelines to use the agent pool to run deployments.

With this configuration, a pipeline job queued in Azure DevOps will now be retrieved by the pipeline agent over 443, pulled into the private network, and then executed.

Azure DevOps Configuration

Conclusion

In this post, I’ve introduced you to several practices you can use to develop applications faster without sacrificing security. Stay tuned for the next post in this series where we will discuss common concerns from Government clients around the storage of data within Azure DevOps.

As application developers, it’s our responsibility to ensure that the applications we create are using credentials and other secret configuration values in a secure way. Oftentimes, this task is overlooked in the pursuit of our primary concern: building new features and delivering business value quickly. In some cases, this translates into developers tolerating flat-out unsafe practices in the name of convenience, such as hardcoding secrets into the application source code or sharing secrets with team members via insecure communication channels and storing them on their development machines.

Fortunately, Microsoft provides a solution to this problem that should be attractive to both security experts and developers, known as “managed identities for Azure resources” (formerly “Managed Service Identities”). The idea is pretty simple: associate an Azure AD security principal* to your Asp.Net Core Web App and let it use this ‘identity’ to authenticate to Azure Key Vault and pull secrets into memory at runtime. Microsoft provides the glue to make all of this easy for developers: on the programming side, they provide a simple library for your Asp.Net Core app to pull the secrets from Key Vault (as demonstrated here), and on the hosting side they implement the mechanisms that make the identity available to the app’s runtime via first-class support in Azure hosting environments and local development tools.

For those unfamiliar, a security principal is any kind of digital ‘identity’ that can be authenticated and configured with permissions that authorize it to access Azure resources. Examples include a user’s personal login, an AD group, or a service principal. This is also known as App Registrations in Azure. They allow you to create a custom identity and credentials just for applications or other automated processes, so they can be granted access to Azure resources they need to interact with.

So, what’s to be gained with this approach, and what are the tradeoffs? There are two audiences that have a stake in this:

  • The business stakeholders and security team that place a high priority on protecting applications and user data from exposure
  • The developers that just want to make sure they can stay productive and spend less time worrying about how configuration values are provided. I’ll address these groups and their distinct concerns separately.

The Security Perspective

There are numerous security benefits that come with this approach. Most critically, there are far fewer points of exposure for your secrets. The reliance on developers to do the right thing and manage secrets responsibly is almost entirely removed, to the point where developers would have to go out of their way to do the wrong thing. Another benefit is the administrative access control built into Key Vault, which makes it easy to manage who should and shouldn’t be able to run the app and access secrets.

We will start with how this approach limits the exposure of your secrets. Without managed identity and Asp.Net Core Key Vault configuration, you are directly responsible for making your secrets available to your app, whether it’s hosted or running locally. A hosted app, for example, one running in Azure App Service, means configuring the PaaS App Settings or modifying the appsettings.json file that you deploy with your app binaries. The secrets must be put there by the process that regularly builds and deploys your application. It also needs to store and retrieve these secrets from somewhere, which could be Key Vault, a release variable, or some other data store, maybe even just a VM or user’s file system. Local development also spreads the surface area of secret exposure. In the best case, you might pull these onto the developer’s machine using a script that stores them in the dev’s file system, but too often people will take the path of least resistance and send these to each other over email, chat, or, even worse, hardcode them into source control.

In a managed identity world, the app simply reaches out to Key Vault for these secrets at runtime. This trims out several problematic points of exposure:

  1. No more accessing these credentials from the deployment pipeline where they might accidentally get captured in logs and build artifacts, and where they may be visible to those with permission to manage deployments.
  2. If a person is tasked with running your deployment scripts directly (to be clear – not ideal) they wouldn’t need access to app secrets to do a code deployment.
  3. No more storing these credentials in persistent storage of the app runtime host, where they can be inspected by anyone with management access to the host.
  4. No more spreading secrets across developer’s local devices, and no more insecure transmission of secrets on channels such as email or chat. It also makes it easy to avoid bad habits like hardcoding secrets into the app and checking them into source control.

Another benefit of this approach is that it doesn’t rely so heavily on developers and operations folks being mindful and responsible about security. Not only can they avoid insecurely distributing them amongst teammates, but they also don’t have to worry about removing them from their local machines or VM’s when they no longer need them, because they are never stored. Of course, developers always should be mindful and responsible for security, realistically things don’t always work out that way. Developers frequently overlook security concerns while focusing on being productive, and often people are simply under-educated about security. Any opportunity to improve security via architecture and design, and to make humans less capable of doing the wrong thing is a win.

Those with a focus on security will also appreciate the level of access control that is provided by Key Vault. Access to secrets is not managed via typical Azure RBAC (Resource Based Access Control). Instead, access policies are created to grant specific permissions for each user, service principal, or group. You can grant specific kinds of access, such as reading or editing/adding secrets. This can make Key Vault serve as a control center for deciding who should be allowed to run the app for a given environment. Adding a new team member or granting temporary access to debug a higher environment is as easy as adding a user to a Key Vault access policy that allows reading secrets only, and revoking access is as easy as removing them. See here for more info on securing access to Key Vault.

The Developer Perspective

Developers may have concerns that a centralized configuration approach could slow things down, but let’s look at why that doesn’t have to be the case, and why this can even improve velocity in many cases. As we’ll see, this can make it super easy to onboard new team members, debug multiple environments, regenerate keys due to recycling or resource recreation, and implement a deployment process.

We will start with onboarding. With your app configured to use managed identity and key vault authentication, onboarding a new team member to run and debug the app locally simply involves adding them to an access policy granting them the permission to read keys from the key vault. An even easier approach is to create an AD group for your developers and assign a single Key Vault access policy to the entire group. After that, they just need to login to the subscription from their personal machine using Visual Studio or the Azure CLI. Visual Studio has this support integrated and will apply when you start your app from there, and the Azure CLI extends this support to any other IDE that runs the app using the dotnet CLI, such as VS Code. Once they have been granted authorization and logged in, they can simply start the app, which will retrieve the secrets from Key Vault using their permissions. If this team member were to eventually leave the team, they can have their access revoked by removing their access policy. They will have nothing to clean up because the secrets were never stored on their computers, they only lived in the app runtime memory.

Another benefit of centralizing secrets when using shared resources is in situations where secrets may often change. Particularly in a development environment, you may have good reason to delete resources and redeploy them, for example, to test an infrastructure deployment process. When you do this, secrets and connection strings for your resources will have changed. If every developer had their own copy of the secrets on the machine, this kind of change would have broken everyone’s local environments and disrupt their work until they’ve acquired all the latest secrets. In a managed identity scenario, this kind of change would be seamless. The same benefit applies when new resources are added to your infrastructure. Dev team members don’t need to acquire the new connection secrets when they get the latest code that uses a new service, the app will just pull them from Key Vault.

Another time secrets may change is when they expire or when you intentionally rotate them for the sake of security. Using a key vault can make it significantly easier to implement a key rotation strategy. The key vault configuration provider can be configured to pull app secrets once at app start time (which is the default) or at a regular interval. Both can be part of a secret/key rotation strategy, but the first requires orchestrating an app restart after changing a secret, which isn’t necessary with the second approach. Implementing key rotation support in your app is fairly straight forward: most Azure resources provide two valid keys at a time to support rotation. You should store both keys for each service in Key Vault, but only use one of them in your app until it becomes invalid. Once your client hits an auth error, you should catch that exception, set the other key as the actively used key, and replay the request. Using approach 2, configure the Key Vault config provider to refresh on an interval, maybe 5 or 10 minutes, and then have an external process (Azure Automation Runbooks are a recommended solution for this) reset only one key at a time. If both keys are cycled at the same time, your app config won’t refresh fast enough to get the new keys and will start to fail. By doing one at a time, you ensure having at least one valid key available to your app at any given time.

Another way that this can improve developer agility is that you can easily change the environment you target with a simple configuration change. For example, let’s say some pesky issue is popping up in your UAT environment that isn’t showing up anywhere else, and you’re tearing out your hair looking through logs trying to understand it. You’re at the point where you’d give your left foot to just run the app locally targeting that environment so you can attach a debugger and step through the problematic code. Without using managed identity and the key vault configuration provider you would have to copy the secrets for that environment to your local computer. This is gross enough that you should probably seek any other option before resorting to it. However, if you were using managed identity and key vault, you could simply reconfigure the name of the key vault you want your local app to use with the one for the target environment and create a temporary access policy for yourself. As a good practice, you should still revoke your access afterward, but at least you have nothing sensitive on your local device to clean up.

Finally, let’s talk about the benefits of using this approach from the perspective of building a deployment pipeline. Specifically, the benefit is that you have one fewer thing to implement and worry about. Since secrets are centralized in the key vault and pulled during app runtime, you don’t need to have your process pull in the secrets from wherever you store them, then pave them into an appsettings.json file, or assign them as PaaS-level environment variables. This saves you time not having to code this behavior, and it also saves you time when something breaks because there’s one fewer place where something could have gone wrong. Having your app go directly to key vault streamlines the configuration and creates fewer opportunities to break things. It also has the advantage that you don’t need to run a full app deployment just to update a secret.

Counter Arguments

This may sound good so far, but I suspect you may already have a few concerns brewing. Maybe you’re thinking some of the following: Do I have to start keeping all my configuration values in Key Vault? Doesn’t this represent additional configuration management overhead? Won’t I have conflicts with other team members if I need to change secret values to personalize my local environment? Doesn’t this create a hard dependency on an internet connection, meaning I won’t be able to run a local environment fully offline? All of these are valid questions, but I think you’ll see that they all have acceptable and satisfying answers.

So, does this mean that Key Vault needs to become the singular place for all app configurations, both secret and non-secret? If we only put secrets there, then don’t many of the above arguments about the benefits of centralization become moot, since we still need to do distributed config management for non-secret values? Azure’s answer to this question is Azure App Configuration, a centralized app configuration service that gives you a nice level of control over non-secret configuration settings for your app, including cool features like config value versioning and feature flags. I won’t go too deep into the details of this service here, but it’s worth noting that it also supports managed identity and can integrate with your app in the same way as Key Vault. However, I’ll also offer the suggestion that you can incorporate App Configuration on an as-needed basis. If you are dealing with a small app with less than 10 environment-specific settings, then you might enjoy the convenience of just consolidating all your secret and non-secret values into Key Vault. The choice comes down to preference, but keep in mind that if your settings are changing semi-often or you expect your app to continue adding new config settings, you may get tired of editing every config using Key Vault’s interface. It’s tailored for security, so it should generally be locked down as much as possible. It also doesn’t have all the features that App Configuration does.

Regarding configuration management overhead, the fact is that, yes, this does require creating/managing a Key Vault service and managing access policies for dev team members. This may sound like work you didn’t previously have, but I assure you this kind of setup and ownership is lightweight work that’s well worth the cost. Consider all the other complexities you get to give up in exchange: with centralized config management, you can now do code-only app deployments that can ignore configuration management entirely. That makes it faster and easier to create your deployment process, especially when you have multiple environments to target, and will give you high marks for security. As we also mentioned, centralizing these config settings makes it simpler to onboard new team members and possibly to iterate on shared infrastructure without breaking things for the team.

You may also be concerned that sharing your configuration source will result in a lot of stepping on toes with your team during development. But consider this: nothing is stopping you from using the same kind of local environment configuration approaches that developers already use in addition to Key Vault. Asp.Net Core’s configuration system is based on the idea of layering configuration providers in a stack, where the last-in wins. If you want to allow your developers to be able to override specific values for development purposes, for example, to point at a personal database instance (maybe even a local database, like SQL Server or the Cosmos DB Emulator), you can still pass those as environment variables, in appsettings.Development.json, or as ‘dotnet user-secrets’. This doesn’t necessarily defeat the purpose of centralizing secret or config management. The benefits of centralization apply most to shared resources. If you want to use a personal resource, there’s no harm in personalizing your config locally. An alternate approach to personalization is to provide your own complete set of resources that make up an environment in Azure. Ideally, you already have a script or template to create a new environment easily, and if you don’t, I strongly recommend it, in which case you’ll get your own Key Vault as well, and you can simply point your local app at it.

Lastly, I’d like to address the question of whether this makes it impossible to do fully offline local development. There are a couple of considerations here:

  1. How to target local services instead of live-hosted ones
  2. Overcoming the fact that the Key Vault configuration provider relies on an internet connection.

The first is handled the same way you would handle configuration personalization, by overriding any config settings in something like appsettings.Development.json or ‘dotnet user-secrets’ to target your local database or Azure service emulator. The second is relatively simple, just put the line of code that configures Key Vault as a config provider within an ‘if’ condition that checks to see if you are running in a development environment (see a sample approach here). This is assuming that Key Vault is truly your only remaining dependency on an internet connection. If it seems strange to hear me recommend disabling Key Vault after advocating for it, consider again that the benefits of centralized configuration apply most to using shared resources, so if you are designing to support an entirely local development environment then using Key Vault becomes unnecessary when running in that mode.

Using centralized configuration services like Key Vault via managed identity requires a different mindset for developers, but it comes with clear advantages, especially when it comes to limiting the exposure of your application secrets. This kind of solution is an absolute win from a security perspective, and it has the potential to considerably improve your development team’s experience as well. Due to Asp.Net Core’s pluggable configuration system, it’s easy to apply to existing projects, especially if you’re already storing secrets in Key Vault, so consider looking at how you could incorporate it into your existing projects today, and don’t miss out on the chance to try it in your next greenfield project. Your security advocates and fellow developers just might thank you.

If you are like me, you have used cloud services in a limited fashion to create VM’s for testing or perhaps you have used them extensively. You’d also like to gain an understanding of the broader group of services offered by cloud providers. In my situation, this was due to the recent attainment of an Engagement Manager position and my desire to help AIS expand our business through the development of new opportunities. I realized that I needed to have at least a top layer understanding our offerings in order to realize potential use cases AIS could present to solve problems, more cost-effective options to current solutions, and develop completely new solutions to improve client business. It was obvious to start with Microsoft’s Azure and Amazon’s AWS platforms, being that these are the top focus of AIS and the industry as a whole.

What was not obvious, was where to start. Both platforms are not only extremely broad but also moving targets. I needed to find a way to dip into this process without drowning in all the information, in addition to holding the responsibilities in my day job. I looked at classroom training options, YouTube videos, and continued researching until I stumbled upon two paths. These paths not only provided a nice prepackaged set of materials, but I could complete at my own pace, at home, and they resulted in certifications. I will get to the details, but first a word about certifications.

I am sure many of you will be rolling your eyes when you read the “certifications” aspect of that second to the last sentence. Yes, certifications are not as valuable an indicator of a person’s skills and knowledge in an area as real-world experience. However, they provide the following benefits in order of least to most important:

  1. Provide a good starting point for someone that has no current projects in an area.
  2. Fill knowledge gaps that even a person with experience in an area has, especially in those services or techniques that are not used often.
  3. Provide value to AIS in maintaining various statuses.
  4. Provide a potential client with proof that you at least have an understanding of the basics.
  5. Most importantly, they result in a $500 bonus from AIS, and reimbursement of testing and training costs!

The paths I found are the Microsoft Azure Fundamentals learning path and certification and the Amazon AWS Cloud Practitioner training and certification. The training for both of these includes videos with the Azure path including an estimated ten hours of content and the AWS training about five hours. The Azure path estimates were spot on, and the AWS training took a bit longer, due to my complete lack of experience with the platform.

Microsoft Azure Fundamentals

This path included videos, reading, hands-on experience, and quick knowledge checks. It can be completed with an Azure account that you create just for the training or an account linked to the AIS subscription if you have one. Both the reading and videos provide just enough information, but not get bogged down in the minutia. The only thing I had done with Azure prior to the training created a few VM’s to set up SharePoint environments. I had done that years ago, but I didn’t do that much within those environments.

For me, most of the content was new. I believe if I had a more in-depth experience, the training would have filled in gaps with specific details.

These were the topics I found either completely new or helpful in understanding how to look at and/or pitch Azure services to clients:

  • Containers, app services, and serverless options and how they work
  • Reducing latency with the traffic manager
  • Azure policies and tags to enforce standards
  • Review of data centers, region pairs, geographies, availability zones
  • Various was to predict costs and manage costs such as calculators, Cost Manager, and Azure Advisor

The training took me probably two-thirds of the estimated time, after which I went through the knowledge checks for each section once more. After that, I spent maybe an hour reviewing some things from the beginning. From there, I took an exam and passed. The exam process was interesting and can be done from home with some software that enables someone to watch you. Prior to the exam, you are required to show the person the entire room and fix anything that might enable you to cheat.

After I completed the certification process, I submitted the cost of the exam ($100) as an expense as well as submitted my request for a certification bonus. I received both in a timely manner. See links at the end of this post for materials concerning reimbursements and bonuses. Don’t forget approval from your EM/AE prior to incurring any costs for which you might want reimbursement and to submit your updated certifications spreadsheet to the AIS PI Team.

AWS Cloud Practitioner

This path exclusively contains videos. In my opinion, the content is not as straight forward as the Azure Fundamentals content and the videos cannot be sped up, which can be very frustrating. The actual content was a bit difficult to find. I have provided links at the conclusion of this post for quick reference. Much of the video content involves Linux examples, so Putty and other command-line tools were used. This added a further layer of complexity that I felt took away from the actual content (do I really need to know how to SSH into something to learn about the service?).

As far as content, everything is video, there is no reading, hands-on examples, and knowledge checks. I felt the reading in the Azure path broke things up. The hands-on exercises crystalized a few things for me, and the knowledge checks ensured I was tracking. I would like to see Amazon add some of these things. That being said, the videos are professionally done and included helpful graphics. With zero experience with AWS, I am still finding that I am able to grasp concepts and the videos do a decent job of presenting use cases for each service.

My biggest complaint is the inability to speed up videos that are obviously paced for the lowest common denominator and I find admittedly ADD attention waning often. Something I found that helps is taking notes. This allowed me to listen, write and not get bored.

Amazon provides a list of recommended prep (see links below) that includes self-paced training, a one-day classroom option, exam guide, a list of four base white papers and links to many others, practice exams, as well as a link to the schedule certification exam. I scanned the whitepapers. They all looked like they were useful, but not necessary to knock out the exam. I say this with confidence as I was able to pass the exam without a detailed review of the whitepapers. My technique was to outline the videos, then review them over the course of a couple weeks.

Summary

Whether you are a budding developer or analyst wishing to get a broad overview, a senior developer that wants to fill gaps, or a new EM like me who wants a bit of both, the Microsoft Azure Fundamentals learning track/certification and Amazon AWS Cloud Practitioner training/certification is a good place to start. AIS will cover any costs and provide you with some additional scratch for your effort. Obtaining these certifications also improves AIS standings with providers, clients, and the community as a whole. It also greatly improves your value to clients, meets the criteria of certain AIS Career Paths and Competencies, and who knows, you might learn something!

Links:

  1. Azure Fundamentals Learning Path: https://docs.microsoft.com/en-us/learn/paths/azure-fundamentals/
  2. Azure Fundamentals Cert: AZ900 Microsoft Azure Fundamentals Exam.
  3. AWS Cloud Practitioner Preparation and Cert:  AWS Recommended Prep
  4. Training Reimbursements and Certification Bonuses: https://appliedis.sharepoint.com/sites/HR/Pages/Additional-benefits.aspx
  5. Submitting certification inventory to PI Team: Reach out to AIS – Process Improvement Team (ais-pi-team@appliedis.com) for more info.
While checking out one of the automated messengers a coworker created, we had an idea. Why not use Azure to help with daily tasks or streamline routine daily tasks? The logic apps listed here take about 15-20 minutes at most to create and go from easiest to hardest to setup. Listed below is what you will need for the app before listing the steps. Keep in mind that while Logic Apps are available on Azure Gov, you might need to talk with a supervisor before implementing these there.

Completely new to Logic Apps? Click here to create your first one!

SharePoint Item Tracker

What you’ll need:

  • Office 365 account, with Teams, enabled.
  • SharePoint access to the desired list
  • The Completed App looks like:

Completed App

Steps:

  1. Start with a blank Logic App.
  2. Select the SharePoint Trigger “When an item is created”. This will require you to sign in with your Office 365 credentials.
  3. You’ll see a box like this. Rename the title by clicking the three dots in the top right corner.
    When An Item is Created
  4. Select the Site Address of the SharePoint site with the desired list. The dropdown will be populated with all the available sites on the SharePoint domain. Select the List Name from the dropdown of SharePoint lists. If the list you’re looking for isn’t there, it might be hiding on a different site. Set the Interval and Frequency to “1” and “Day”, respectively.
  5. Add a new action to the Logic App and find the action “Post your own adaptive card as the Flow bot to a channel” (As of this writing, this action is still in preview). After signing in again to your Office 365 account, you’ll see this box. (Again, the dots in the top right corner will allow you to change the name)
    Notify the Channel
  6. Add the Team ID and Channel ID. These should correspond with where you want to send the notification to.
  7. In the Message Field, you can add the message to send to the channel. You also have the option of adding Dynamic Content, which can add the Name of the item, a link to the Item, or other various properties.
  8. Hit Save in the top left to save the Logic App.

Twitter Tracker

What you’ll need:

  • Office 365 account
  • Twitter Account
  • The Completed App looks like:

Twitter Tracker

Steps:

  1. From the empty Logic App, scroll down to find the “Email yourself about new Tweets about a specific keyword via Office365”
  2. Click “Use this template”
  3. You’ll be asked to connect to the following: Twitter, Office 365 Users, and Office 365 Outlook.
  4. Click continue and include the desired keyword.
  5. Save the logic app by clicking the Save button in the upper right corner.

Event Time-boxer

This is a useful logic app to keep two different calendars updated with each other.
What you’ll need: 

  • Office 365 account
  • The email address to forward events to

The Completed App looks like:

Event Time-Boxer

Steps:

  1. From the templates page, select “Empty Logic App”
  2. In the search box type “event is created”, and scroll down to select the one from Outlook.com
  3. You’ll be prompted to connect an Outlook account to the Logic App.
  4. Select the preferred calendar to check for events and set the Interval to 1 dayWhen a New Event is Created
  5. Add a new step, and in the search box, type “Condition”, and scroll down to Control “Condition”Check if Event is New
  6. Under the condition, choose the Subject from the dynamic options. For the condition dropdown, select Contains, and in the right text field, type in “Timebox”Timebox
  7. Under the “false” condition, add a new step. Look for the “Create Event” action from Outlook.com. You might be told to authenticate again.
  8. Select a Calendar for the event and make the End time and Start time the same from the trigger event. The subject should be “Timebox:” followed by the subject. This way, we won’t be triggering the event again, because of our previous condition.
  9. Add the required attendee’s field and include the email address you want to forward events to.Create Event V2
  10. Save the logic app.

Honorable Mention: Traffic Sensor

So, there is one more use for Logic Apps that we didn’t cover. Microsoft has a cool tutorial for creating a Traffic Checker, which checks the traffic in the morning and sends an email based on the result. You can find it here.

There are a ton of different connectors for Logic Apps and, so there are more ideas out there than the ones listed.

Over the last couple of years, I’ve moved from serious SharePoint on-premise development to migrating web applications to Azure.  My exposure to Azure prior to the first application was standing up SharePoint development virtual machines that pretty much just used the base settings.  Trial by fire tends to be the way I learn things best which is good because not only was I having to learn about Azure resources, but I also had to refamiliarize myself with web development practices that I haven’t had to work with since .Net 2.0 or earlier.

One of my first tasks, and the focus of this post, was to migrate the SQL Server database into an Azure SQL (Platform as a Service) instance.  The databases generally migrated well.  Schemas and the majority of the stored procedures were compatible with Azure SQL.  Microsoft provides a program called Data Migration Assistant. This program not only analyzes but can migrate the database.  The analysis will return any suggested or required changes.  There are several types of issues that could arise including deprecated features, incompatible features, and syntax blockers.

Deprecated Field Types

There are three field types that will be removed from future versions: ntext, text, and image. This won’t hold up a migration, but I did change the fields to future-proof the database. The suggested solution is to use: nvarchar(max), varchar(max), or varbinary(max).

Azure SQL Database

T-SQL Concerns

I ran into several issues in scripts. Some had simple solutions while others required some redesign.

  • ISSUE: Scripts can’t connect to multiple databases using the USE statement.
  • SOLUTION: The scripts that did use the USE statement didn’t need it as it was only meant to connect to a single database.
  • ISSUE: Functions such as sp_send_dbmail are not supported. The scripts would send notification emails rather than sending from the application.
  • SOLUTION: This required an involved redesign. I created a table that logged the email and a scheduled Azure Web Job to send the email and flag the record as sent. The reference to sp_send_dbmail was replaced with an INSERT INTO Email command. If the email needed to be sent near-instantly, I could have used an Azure Queue and had the Web Job listen to the Queue then send the email when a new one arrives.
  • ISSUE: Cross-database queries are not supported. A couple of the apps had multiple databases and data from one was needed to add or update records in the other.
  • SOLUTION: This was another major design change. I had to move that logic into the application’s business layer.T-SQL Concerns

Linked Servers

Some of the applications had other databases, including other solutions like Oracle, that the application accessed. They use Linked Server connections to execute queries against the other data sources. So far, they’ve just been read-only connections.

Linked Servers are not available in Azure SQL. To keep the functionality, we had to modify the data layer of the application to pull the resources from the external database and either:
A. Pass the records into the SQL Stored Procedure, or
B. Move the logic of the Stored Procedure into the application’s data layer and make the changes in the application

One of the applications had a requirement that the external database was staying on-prem. This caused an extra layer of complexity because solutions to create a connection back to the database, like Azure ExpressRoute, were not available or approved for the client. Another team was tasked with implementing a solution to act as a gateway. This solution would be a web service that the Azure application would call to access this gateway.

SQL Agent Jobs

SQL Agent Jobs allow for out-of-process data manipulation. A couple of the applications used these to send notification emails at night or to synchronize data from another source. While there are several options in Azure for recreating this functionality such as Azure Functions and Logic Apps, we chose to use WebJobs. WebJobs can be triggered in several ways including a Timer. The jobs didn’t require intensive compute resources so it could share resources with the application in the same Azure App Service. This simplified the deployment story because it could be packaged and deployed together.
SQL Agent Jobs
Database modifications tend to be one of the major parts of the migration project. Some of the projects have been simple T-SQL changes while others have needed heavy architectural changes to reproduce functionality in a PaaS environment. Despite the difficulties, there will be major cost savings for some of the clients because they no longer need to maintain an expensive, possibly underutilized, server. Future posts in this series will cover Automation & Deployment, Session State, Caching, Transient Fault Handling, and general Azure lessons learned.

Data Lake has become a mainstay in data analytics architectures. By storing data in its native format, it allows organizations to defer the effort of structuring and organizing data upfront. This promotes data collection and serves as a rich platform for data analytics. Most data lakes are also backed by a distributed file system that enables massively parallel processing (MPP) and scales with even the largest of data sets. The increase of data privacy regulations and demands on governance requires a new strategy. Simple tasks such as finding, updating or deleting a record in a data lake can be difficult. It requires an understanding of the data and typically involves an inefficient process that includes re-writing the entire data set. This can lead to resource contention and interruptions in critical analytics workloads.

Apache Spark has become one of the most adopted data analytics platforms. Earlier this year, the largest contributor, Databricks, open-sourced a library called Delta Lake. Delta Lake solves the problem of resource contention and interruption by creating an optimized ACID-compliant storage repository that is fully compatible with the Spark API and sits on top of your existing data lake. Files are stored in Parquet format which makes them portable to other analytics workloads. Optimizations like partitioning, caching and data skipping are built-in so additional performance gains will be realized over native formats.

DeltaLake is not intended to replace a traditional domain modeled data warehouse. However, it is intended as an intermediate step to loosely structure and collect data. The schema can remain the same as the source system and personally identifiable data like email addresses, phone numbers, or customer IDs can easily be found and modified. Another important DeltaLake capability is Spark Structured Stream support for both ingest and data changes. This creates a unified ETL for both stream and batch while helping promote data quality.

Data Lake Lifecycle

  1. Ingest Data directly from the source or in a temporary storage location (Azure Blob Storage with Lifecycle Management)
  2. Use Spark Structured Streaming or scheduled jobs to load data into DeltaLake Table(s).
  3. Maintain data in DeltaLake table to keep data lake in compliance with data regulations.
  4. Perform analytics on files stored in data lake using DeltaLake tables in Spark or Parquet files after being put in a consistent state using the `VACCUUM` command.

Data Ingestion and Retention

The concept around data retention is to establish policies that ensure that data that cannot be retained should be automatically removed as part of the process.
By default, DeltaLake stores a change data capture history of all data modifications. There are two settings `delta.logRetentionDuration` (default interval 30 days) and `delta.deletedFileRetentionDuration` (default interval 1 week)

%sql
ALTER table_name SET TBLPROPERTIES ('delta.logRetentionDuration'='interval 240 hours', 'delta.deletedFileRetentionDuration'='interval 1 hours')
SHOW TBLPROPERTIES table_name

Load Data in DeltaLake

The key to DeltaLake is a SQL style `MERGE` statement that is optimized to modify only the affected files. This eliminates the need to reprocess and re-write the entire data set.

%sql
MERGE INTO customers
USING updates
ON customers.customerId = updates. customerId
WHEN MATCHED THEN
      UPDATE email_address = updates.email_address
WHEN NOT MATCHED THEN
      INSERT (customerId, email_address) VALUES (updates.customerId, updates.email_address)

Maintain Data in DeltaLake

Just as data can be updated or inserted, it can be deleted as well. For example, if a list of opted_out_consumers was maintained, data from related tables can be purged.

%sql
MERGE INTO customers
USING opted_out_customers
ON opted_out_customers.customerId = customers.customerId

WHEN MATCHED THEN DELETE

Summary

In summary, Databricks DeltaLake enables organizations to continue to store data in Data Lakes even if it’s subject to privacy and data regulations. With DeltaLakes performance optimizations and open parquet storage format, data can be easily modified and accessed using familiar code and tooling. For more information, Databricks DeltaLake and Python syntax references and examples see the documentation. https://docs.databricks.com/delta/index.html