This blog is a follow-on post about Azure Cognitive Services, Microsoft’s offering for enabling artificial intelligence (AI) applications in daily life. The offering is a collection of AI services with capabilities in speech, vision, search, language, and decision making.

Azure Personalizer, one of the services in the Azure Cognitive Services suite, is a cloud-based API service that lets you choose the best experience to show your users by learning from their real-time behavior. Personalizer is built on cutting-edge research in reinforcement learning, and it uses a machine learning model that is different from traditional supervised and unsupervised models.

In Azure Cognitive Services Personalizer: Part One, we discussed the core concepts and architecture of the Azure Personalizer service, as well as feature engineering, its relevance, and its importance.

In Part Two, we covered a couple of use cases in which the Azure Personalizer service was implemented, looking at the features used, the reward calculation, and the test run results; those use cases are recapped at the end of this post.

In this blog, Part Three, we list recommendations, current capacities, and limits for implementing solutions with the Azure Personalizer service.

Recommendations, Current Capacities, and Limits

This section describes essential recommendations for implementing Personalizer, along with current capacity factors and limits for its use.

Recommendations

  • Personalizer starts with a default learning policy, which can yield moderate performance. As part of optimization, you can run offline evaluations that let Personalizer create new learning policies specifically optimized to a given use case. A learning policy generated during evaluation and optimized for a specific loop performs significantly better for that loop.
  • The reward score calculation should consider only relevant factors, with appropriate weight for each. The experiment duration (the Rank-to-Reward cycle) should be short enough that the reward score can be computed while it is still relevant. How well the ranked results worked can be computed by business logic, by measuring a related aspect of the user behavior, and is expressed as a value between -1 and 1.
  • The context for the items (actions) to rank can be expressed as a dictionary of at least 5 features that you think will help make the right choice, and it should not include personally identifiable information. Similarly, each item (action) should be expressed as a dictionary of at least 5 attributes or features that you think will help Personalizer make the right choice. There should be fewer than 50 actions (items) to rank per call.
  • Personalizer will adapt to continuous change in the real world, but results won’t be optimal if there are not enough events and data to learn from to discover and settle on new patterns. Retain data long enough to accumulate a history of at least 100,000 interactions.
  • Choose a use case that happens often enough; look for use cases that happen at least 500 times per day. Make sure context and actions have enough features defined to facilitate learning.
  • Make sure your data retention settings allow Personalizer to collect enough data to perform offline evaluations and policy optimization, typically at least 50,000 data points.
  • Don’t use Personalizer where the personalized behavior isn’t something that can be discovered across all users, but rather something that should be remembered for specific users or that comes from a user-specific list of alternatives.
  • To prevent an action from being ranked, either remove it from the list when making the Rank API call or use inactive events. To disable automatic learning, call the Rank API with learningEnabled = False. Learning for an inactive event is implicitly activated if you later send a reward for its Rank results (see the sketch after this list).
  • An exploration setting of zero negates many of the benefits of Personalizer: with it, Personalizer uses no user interactions to discover better user behavior, which leads to model stagnation, drift, and ultimately lower performance.
  • A setting that is too high negates the benefits of learning from user behavior; setting it to 100% implies constant randomization, and any learned behavior from users would not influence the outcome.
  • To realize the full potential of AI offerings, the design and implementation should earn the full trust of end users; aspects to consider include ethics, privacy, security, safety, inclusion, transparency, and accountability.
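
To make the Rank-to-Reward cycle and the inactive-event mechanics above concrete, below is a minimal Python sketch using the requests library against the Personalizer REST API. The endpoint and key are placeholders, and the v1.0 routes and field names (eventId, contextFeatures, actions, deferActivation, rewardActionId) reflect the REST API as we used it; verify them against the current API reference.

import uuid
import requests

# Placeholders -- substitute your Personalizer resource's endpoint and key.
ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
HEADERS = {"Ocp-Apim-Subscription-Key": "<your-key>", "Content-Type": "application/json"}

def rank(context_features, actions, defer_activation=False):
    """Call the Rank API. With deferActivation=True, the event is created
    inactive and is not used for learning unless it is activated later."""
    event_id = str(uuid.uuid4())
    body = {
        "eventId": event_id,
        "contextFeatures": context_features,
        "actions": actions,
        "deferActivation": defer_activation,
    }
    response = requests.post(f"{ENDPOINT}/personalizer/v1.0/rank",
                             headers=HEADERS, json=body)
    response.raise_for_status()
    return event_id, response.json()["rewardActionId"]

def send_reward(event_id, score):
    """Report how well the ranked choice worked, as a value between -1 and 1.
    Sending a reward for an inactive event implicitly activates it."""
    requests.post(f"{ENDPOINT}/personalizer/v1.0/events/{event_id}/reward",
                  headers=HEADERS, json={"value": score}).raise_for_status()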

Capacity & Limits

  • How well the ranked choice worked needs to be measured via relevant user behavior and scored between -1 and 1 with single or multiple calls to the Reward API.
  • Context and actions (items) need enough features (at least 5 each) to facilitate learning, and there should be fewer than 50 actions to rank per Rank call.
  • Retain data long enough to accumulate a history of at least 100,000 interactions so that Personalizer can perform effective offline evaluations and policy optimizations, which typically require at least 50,000 data points.
  • Personalizer supports features of data type string, numeric, and Boolean. An empty context is not supported; the context must have at least one feature.
  • For categorical features, pre-defining the possible values or ranges is not required.
  • Features that are not available at the time of the Rank call should be omitted rather than sent with a null value (see the sketch after this list).
  • Hundreds of features can be defined for a use case, but they must be evaluated for effectiveness (using feature engineering principles and Personalizer’s evaluation option), and the less effective ones should be removed.
  • The features in the actions may or may not have a correlation with the features in the context used in Personalizer.
  • If the ‘Reward Wait Time’ expires with no reward information received, a default reward is applied to that event for training. The maximum wait duration currently supported is 6 days.
  • The Personalizer service can return a rank very rapidly, and Azure will auto-scale as needed to maintain rapid generation of ranking results. Throughput is calculated by adding the sizes of the action and context JSON documents and factoring in a rate of 20 MB/sec.
  • Context and actions (items) are expressed as JSON objects sent with the Rank API call. JSON objects can include nested JSON objects and simple property/value pairs. Arrays can be included only if the array items are numbers.
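
As a small illustration of the feature rules above (supported data types, and omitting unavailable features rather than sending nulls), here is a hypothetical Python helper that prunes a feature dictionary before it is sent in a Rank call; the feature names are invented for the example.

def clean_features(features):
    """Drop features whose value is None (unavailable at Rank time).
    Personalizer features should be string, numeric, or Boolean, and an
    unavailable feature should be omitted rather than sent as null."""
    return {name: value for name, value in features.items() if value is not None}

# Example: 'referrer' was unavailable for this request, so it is omitted.
context_slice = clean_features({
    "dayOfWeek": "saturday",  # string
    "userTenureDays": 42,     # numeric
    "isWeekend": True,        # Boolean
    "referrer": None,         # unavailable -> dropped, not sent as null
})
assert "referrer" not in context_slice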

Conclusion

The Azure Cognitive Services suite facilitates a broad range of AI implementations, bringing the benefits of AI technology to the little things we do in our daily lives. The Personalizer service is a simple-to-use yet powerful AI service that can be applied in any scenario where ranking a set of options is meaningful, once the scenario is expressed with a rich set of features. I hope this blog post is helpful in explaining the high-potential uses of the Azure Cognitive Services Personalizer service. I also want to thank my colleague Kesav Chenna at AIS for his contribution in implementing Personalizer in the use cases discussed in this blog.

Use Cases and Results

As a recap of Part Two, the two use cases we implemented with Personalizer involve ranking content for each user of a business application.

Use Case 1: Dropdown Options

Different users of an application with manager privileges see a list of reports that they can run. Before Personalizer was implemented, the list of dozens of reports was displayed in alphabetical order, requiring most managers to scroll through the lengthy list to find the report they needed. This created a poor user experience for daily users of the reporting system, making it a good use case for Personalizer. The tooling learned from user behavior and began to rank frequently run reports at the top of the dropdown list. Frequently run reports differ between users and change over time for each manager as they are assigned to different projects. This is exactly the situation where Personalizer’s reward-score-based learning comes into play.

Context Features

For the dropdown options use case, the context features JSON (with sample data) is as follows:

{
    "contextFeatures": [
        {
            "user": {
                "id": "user-2"
            }
        },
        {
            "scenario": {
                "type": "Report",
                "name": "SummaryReport",
                "day": "weekend",
                "timezone": "est"
            }
        },
        {
            "device": {
                "mobile": false,
                "Windows": true,
                "screensize": [1680, 1050]
            }
        }
    ]
}

Actions (Items) Features

Actions were defined for this use case as the following JSON object (with sample data):

{
    "actions": [
        {
            "id": "Project-1",
            "features": [
                {
                    "clientName": "Client-1",
                    "projectManagerName": "Manager-2"
                },
                {
                    "userLastLoggedDaysAgo": 5
                },
                {
                    "billable": true,
                    "common": false
                }
            ]
        },
        {
            "id": "Project-2",
            "features": [
                {
                    "clientName": "Client-2",
                    "projectManagerName": "Manager-1"
                },
                {
                    "userLastLoggedDaysAgo": 3
                },
                {
                    "billable": true,
                    "common": true
                }
            ]
        }
    ]
}
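
Putting the two documents together, a single Rank call sends the context features along with the actions and reads back the re-ranked order. The sketch below is illustrative: the endpoint, key, and event id are placeholders, and the response fields (rewardActionId, ranking, probability) reflect the v1.0 REST API as we used it.

import requests

ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"  # placeholder
HEADERS = {"Ocp-Apim-Subscription-Key": "<your-key>", "Content-Type": "application/json"}

body = {
    "eventId": "report-dropdown-0001",  # unique id tying this Rank call to its later reward
    "contextFeatures": [
        {"user": {"id": "user-2"}},
        {"scenario": {"type": "Report", "name": "SummaryReport",
                      "day": "weekend", "timezone": "est"}},
        {"device": {"mobile": False, "Windows": True, "screensize": [1680, 1050]}},
    ],
    "actions": [
        {"id": "Project-1", "features": [{"clientName": "Client-1"},
                                         {"userLastLoggedDaysAgo": 5}]},
        {"id": "Project-2", "features": [{"clientName": "Client-2"},
                                         {"userLastLoggedDaysAgo": 3}]},
    ],
}

response = requests.post(f"{ENDPOINT}/personalizer/v1.0/rank",
                         headers=HEADERS, json=body).json()
best_action = response["rewardActionId"]  # the action to show first
# One way to derive a display order: sort the returned actions by probability.
display_order = [item["id"] for item in sorted(response["ranking"],
                                               key=lambda item: item["probability"],
                                               reverse=True)]

The eventId is stored alongside the rendered dropdown so that the reward for this specific interaction can be reported later.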

Reward Score Calculation

The reward score was calculated from which report the user actually selected in the ranked dropdown list, as follows (see the sketch after this list):

  • If the user selected the 1st report in the ranked list, the reward score was 1
  • If the user selected the 2nd report in the ranked list, the reward score was 0.5
  • If the user selected the 3rd report in the ranked list, the reward score was 0
  • If the user selected the 4th report in the ranked list, the reward score was -0.5
  • If the user selected the 5th or a later report in the ranked list, the reward score was -1
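
A minimal sketch of this mapping, assuming the user's selection arrives as a zero-based index into the ranked list:

def dropdown_reward(selected_index):
    """Map the position the user actually clicked (0-based, in the order
    Personalizer ranked the reports) to a reward score between -1 and 1."""
    scores = [1.0, 0.5, 0.0, -0.5]  # positions 1 through 4
    return scores[selected_index] if selected_index < len(scores) else -1.0

assert dropdown_reward(0) == 1.0   # user picked the top-ranked report
assert dropdown_reward(7) == -1.0  # user had to scroll well down the list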

Results

View of the alphabetically ordered report names in the dropdown before personalization:

[Screenshot: alphabetically ordered report names in the dropdown]

View of the Personalizer ranked report names in the dropdown for the given user:

[Screenshot: report names ranked by Personalizer based on usage frequency]

Use Case 2: Projects in Timesheet

Every employee in the company logs a daily timesheet listing all of the projects the user is assigned to, along with other items such as overhead. Depending on the employee’s project allocations, his or her timesheet table could list anywhere from a few to a couple dozen active projects. Even when assigned to several projects, employees, particularly at the lead and manager levels, typically don’t log time against more than 2 to 3 projects over a period of weeks to months.

Before personalization, the projects in the timesheet table were listed in alphabetical order, again resulting in a poor user experience. Even more troublesome, frequent user errors caused the accidental logging of time in the incorrect row. Personalizer was a good fit for this use case as well, allowing the system to rank projects in the timesheet table based on time logging patterns for each user.

Context Features

For the timesheet use case, the context features JSON object is defined as below (with sample data):

{
    "contextFeatures": [
        {
            "user": {
                "loginid": "user-1",
                "managerid": "manager-1"
            }
        },
        {
            "scenario": {
                "type": "Timesheet",
                "day": "weekday",
                "timezone": "ist"
            }
        },
        {
            "device": {
                "mobile": true,
                "Windows": true,
                "screensize": [1680, 1050]
            }
        }
    ]
}

Actions (Items) Features

For the timesheet use case, the actions JSON object structure (with sample data) is as follows:

{
    "actions": [
        {
            "id": "Project-1",
            "features": [
                {
                    "clientName": "Client-1",
                    "userAssignedForWeeks": "4-8"
                },
                {
                    "TimeLoggedOnProjectDaysAgo": 3
                },
                {
                    "billable": true,
                    "common": false
                }
            ]
        },
        {
            "id": "Project-2",
            "features": [
                {
                    "clientName": "Client-2",
                    "userAssignedForWeeks": "8-16"
                },
                {
                    "TimeLoggedOnProjectDaysAgo": 2
                },
                {
                    "billable": true,
                    "common": true
                }
            ]
        }
    ]
}

Reward Score Calculation

The reward score for this use case was calculated from the proximity between the ranking of projects in the timesheet returned by Personalizer and the rows where the user actually logged time, as follows (see the sketch after this list):

  • Time logged in the 1st row of the ranked timesheet table earned a reward score of 1
  • Time logged in the 2nd row of the ranked timesheet table earned a reward score of 0.6
  • Time logged in the 3rd row of the ranked timesheet table earned a reward score of 0.4
  • Time logged in the 4th row of the ranked timesheet table earned a reward score of 0.2
  • Time logged in the 5th row of the ranked timesheet table earned a reward score of 0
  • Time logged in the 6th row of the ranked timesheet table earned a reward score of -0.5
  • Time logged in the 7th or a later row of the ranked timesheet table earned a reward score of -1

This approach to reward score calculation assumes that users will rarely need to fill out their timesheet for more than 5 projects at a time. When a user logs time against multiple projects, the individual scores are added up and then capped to the range -1 to 1 before calling the Personalizer Reward API, as sketched below.
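
A sketch of that calculation, assuming the rows the user logged time against arrive as zero-based indexes into the ranked timesheet table:

def timesheet_reward(logged_row_indexes):
    """Sum the per-row scores for every project the user logged time against,
    then clamp the total to the -1 to 1 range Personalizer expects."""
    scores = [1.0, 0.6, 0.4, 0.2, 0.0, -0.5]  # rows 1 through 6; later rows score -1
    total = sum(scores[i] if i < len(scores) else -1.0 for i in logged_row_indexes)
    return max(-1.0, min(1.0, total))

assert timesheet_reward([0, 1]) == 1.0   # rows 1 and 2: 1.0 + 0.6, capped at 1
assert timesheet_reward([5, 8]) == -1.0  # rows 6 and 9: -0.5 + -1.0, capped at -1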

Results

View of the timesheet table with project names alphabetically ordered, before personalization:

[Screenshot: alphabetically ordered project names in the timesheet table]

View of the timesheet table with project names ordered by the ranking returned by the Personalizer service:

[Screenshot: timesheet table ordered by the Personalizer service]

Testing

To verify the results of implementing Personalizer in our selected use cases, unit tests proved effective. They helped in two important ways:

  1. Injecting a large number of user interactions (learning loops)
  2. Simulating user behavior toward a specific pattern

This provided an easy way to verify how Personalizer reflects current and changing trends injected into the user behavior via unit tests, using reward scores and its exploration capability. It also enabled us to test the different configuration settings offered by the Personalizer service.
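
A minimal sketch of such a test, assuming hypothetical rank and send_reward helpers where rank returns the event id and the ordered list of action ids; the reward shaping is simplified for illustration.

import random

def simulate_preference(rank, send_reward, actions, preferred_id, loops=500):
    """Replay learning loops in which a simulated user always picks
    preferred_id, and report how often Personalizer ranked it first."""
    top_hits = 0
    for _ in range(loops):
        context = [{"scenario": {"type": "Timesheet",
                                 "day": random.choice(["weekday", "weekend"])}}]
        event_id, ordered_ids = rank(context, actions)
        position = ordered_ids.index(preferred_id)
        send_reward(event_id, 1.0 if position == 0 else -0.5)  # simplified reward shaping
        top_hits += (position == 0)
    return top_hits / loops  # fraction of loops where the preference ranked first

Flipping preferred_id partway through a run and counting the loops until the hit fraction recovers is essentially what the test runs below measure.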

Test Run 1

The first test run simulated different user choices under different exploration settings. The results show how many learning loops it took before Personalizer’s rankings reflected the user preference, first intermittently and then consistently.

  • User selection of Project-A: Personalizer started ranking Project-A at the top intermittently after 10-20 learning loops and consistently after 100 learning loops, with exploration set to 0%.
  • User selection of Project-B: Personalizer started reflecting the change in user preference (from Project-A to Project-B) by ranking Project-B at the top intermittently after 100 learning loops and consistently after 1,200 learning loops, with exploration set to 0%.
  • User selection of Project-C: Personalizer started reflecting the change in user preference (from Project-B to Project-C) by ranking Project-C at the top intermittently after 10-20 learning loops and almost consistently after 150 learning loops, with exploration set to 50%. Personalizer adjusted to the new user preference more quickly when exploration was utilized.
  • User selection of Project-D: Personalizer started reflecting the change in user preference (from Project-C to Project-D) by ranking Project-D at the top intermittently after 10-20 learning loops and almost consistently after 120 learning loops, with exploration set to 50%.

Test Run 2

In this second test run, we observed the impact of keeping and then removing sparse (less effective) features.

  • User selection of Project-E: Personalizer started reflecting the change in user preference (from Project-D to Project-E) by ranking Project-E at the top intermittently after 10-20 learning loops and almost consistently after 150 learning loops, with exploration set to 20%.
  • User selection of Project-F: Personalizer started reflecting the change in user preference (from Project-E to Project-F) by ranking Project-F at the top intermittently after 10-20 learning loops and almost consistently after 250 learning loops, with exploration set to 20%.
  • User selection of Project-G: Two less effective (sparse) features of type datetime were removed. Personalizer started reflecting the change in user preference (from Project-F to Project-G) by ranking Project-G at the top intermittently after 5-10 learning loops and almost consistently after only 20 learning loops, with exploration set to 20%.
  • User selection of Project-H: The two datetime sparse features were added back. Personalizer started reflecting the change in user preference (from Project-G to Project-H) by ranking Project-H at the top intermittently after 10-20 learning loops and almost consistently after 500 learning loops, with exploration set to 20%.

Thanks for reading!