Azure AI fundamentals in a nutshell

In this blog post, I’ll summarize the important ideas about Azure AI fundamentals (topics from the AI-900 Microsoft certification exam). As usual, if you intend to take the exam, please note this article is meant for review/knowledge-check purposes and doesn’t cover the whole official, up-to-date learning material from Azure.

What I liked about the AI-900 exam is that it gives a wide overview of the “mainstream” AI capabilities today and makes you learn or review AI principles (like machine learning algorithms), and also includes a topic about ethics and responsibility in AI.

Topics of the AI-900 exam

As of April 23rd, 2021, here are the skills measured in this exam (the full, up-to-date list is available on the dedicated Microsoft Learn page):

  • Describe AI workloads and considerations (15-20%): recognize the AI capability that answers a specific problem; be aware of principles such as fairness, ethics, safety, privacy and inclusiveness.
  • Describe fundamental principles of machine learning on Azure (30-35%): identify regression/classification/clustering scenarios, know how to create a ML solution, describe capabilities of no-code machine learning with Azure Machine Learning studio.
  • Describe features of computer vision workloads on Azure (15-20%): identify the different types of computer vision solutions, know the tools provided by Azure and their capabilities.
  • Describe features of Natural Language Processing (NLP) workloads on Azure (15-20%): identify features of NLP workloads scenarios and know the Azure tools.
  • Describe features of conversational AI workloads on Azure (15-20%): identify common use cases and know services provided by Azure.

Candidates for the exam should have foundational knowledge of machine learning and artificial intelligence concepts, and of the related Microsoft Azure services. No data science or software engineering experience is required; however, some general programming knowledge is beneficial.

AI basics

We can observe 5 major fields in AI:

| Field | What | Use cases |
| --- | --- | --- |
| Machine Learning (ML) | Predictive models based on data and statistics, the foundation of AI | A bike rental business trying to predict whether rentals go up or down according to weather forecasts |
| Anomaly Detection | Detect unusual patterns or events | Detect credit card fraud, like two transactions in far-away countries with a small delay between them |
| Computer Vision | Interpret visual input from images and videos | Face recognition, car lane assist or distance detection, violence detection on security cameras |
| Natural Language Processing (NLP) | Interpret written or spoken language | Spell check, spam filters, voice assistants |
| Conversational AI | Chat bots (overlaps with NLP) | Chat assistance on websites or in applications |

Responsible AI

AI faces several ethics challenges:

  • Biases can affect results (a model that searches for the best candidate for an IT job would always select men because it was trained on data that’s not diverse)
  • Errors may cause harm (autonomous vehicles failures)
  • Data could be exposed (sensitive data used for training is not securely stored)
  • Solutions may not work for everyone (no audio output for visually impaired users)
  • Users must trust a complex system (how can we trust an AI tool that makes investment recommendations without knowing the process behind?)
  • Who’s liable for AI-driven decisions? (where does the human responsibility stop when an AI-driven solution makes decisions)

An AI solution must take the following responsible principles into account:

  • Fairness (provide fair outcomes for everyone: it can’t withhold opportunities, resources or information from individuals, nor reinforce biases and stereotypes)
  • Reliability and Safety (the system must perform as intended and deal with new situations correctly, it must be rigorously tested and validated, and regularly re-tested and updated)
  • Privacy and Security (prevent data exposure, use tools to randomize or add noise to raw data so individual data cannot be identified, encrypt data)
  • Transparency (the system should be explainable, understandable and the goals clear)
  • Inclusiveness (the solution should consider all human experiences, it should also include people with disabilities in their usage/UX)

Microsoft has published guidelines for Human-AI interactions.

AI solutions in Azure

Azure AI solutions let you benefit from data storage, compute power (reduced compute time) and integration with other Azure services. Azure also provides ready-to-use databases and services that allow developers to produce AI solutions with little or no coding.

  • Azure Machine Learning – platform for training, deploying and managing machine learning models
  • Cognitive Services – suite of services developers can use to build AI solutions
  • Azure Bot Service – cloud-based platform for developing and managing bots

Compute targets are cloud-based resources on which you can run processes:

| Resource | Purpose |
| --- | --- |
| Compute instance | Workstation to test the model |
| Compute cluster | Scalable cluster of VMs for on-demand processing of experiment code, that is, for model training |
| Inference cluster | Deployment target for predictive services that use the trained models |
| Attached compute | Links to existing Azure compute resources such as VMs |
| Dataset | Data for model training and other operations |

Machine Learning

If you’d like to learn or review Machine Learning, I warmly recommend the Elements of AI course by Reaktor (which is free).

Machine learning lets you create predictive models by finding relationships in data. There are several types of ML algorithms. They’re called supervised when the dataset includes examples of “good answers” (the example datasets are “labelled” with categories), while unsupervised algorithms use datasets without labels and have to discover patterns in the data themselves. Supervised solutions work with training and validation data, while unsupervised solutions use unlabeled data and don’t have validation data.

  • Regression (supervised)

The model is trained with labeled data that includes both the features and known values for the label (x,y). Based on the data, the model learns to fit the feature combinations to the labels. The trained model can then predict labels for new items. If you represent the data graphically, you’d draw a line.

  • Classification (supervised)

This model works with “binary” labels (true/false, 1/0…). It outputs probabilities and can deal with uncertainty. Graphically, the data would draw a logistic function.

Features and Labels are important terms in Regression and Classification models, this answer on Stackoverflow explains it quite easily.

  • Clustering (unsupervised)

The data has no labels; the algorithm tries to discover clusters of nearest elements (e.g. k-means clustering, based on mean distance).

Machine Learning on Azure

To create ML projects, you have to create an Azure Machine Learning Workspace in your Subscription. A workspace must have a unique name within the Resource Group (if Subscription and Resource Group aren’t clear, you should probably check Azure fundamentals first). A Workspace consumes some budget by itself! A project will hold compute power, data, experiments, models and services.

The Azure Machine Learning service is a cloud-based platform for creating, managing and publishing ML models. It provides the following features:

| Feature | Description |
| --- | --- |
| Automated machine learning | Quickly create a model from data: based on the data and model type, it finds the best model |
| Azure Machine Learning designer | Graphical interface for no-code development of ML solutions (create ML pipelines) |
| Data and compute management | Cloud-based data storage and compute resources to run experiment code at scale |
| Pipelines | Orchestrate model training, deployment and management tasks: a pipeline to train and evaluate the model; an inference pipeline to predict labels from new data, deployed as a service for apps to use |

Azure Machine Learning can automatically try multiple pre-processing techniques and model-training algorithms in parallel to find the best-performing supervised ML model for the data (classification, regression, time series forecasting). After experimenting, we can see which model performed best according to the chosen evaluation metric (e.g. normalized root mean squared error). This model can then be deployed as a service for client applications to use.

In Azure Machine Learning, the service can be deployed to an ACI (Azure Container Instances, mostly for testing) or an AKS (Azure Kubernetes Service, recommended for production workloads) cluster. AKS requires creating an inference cluster compute target. The deployment can easily happen in the Machine Learning service, by selecting the best model and clicking Deploy. (Please note that the learning material mentions Machine Learning Studio, but it will be retired by 31 August 2024, and Microsoft advises using the Azure Machine Learning service, which provides the same capabilities. Make sure to check the exam topics at the date you take the exam.)

Supervised workflows can be designed with a no-code, graphical designer. Data transformations can be applied to adapt the dataset to the goal and address specific issues, e.g. to normalize data or remove rows with null values.

We usually create a pipeline for model training with a subset of the data. When training is complete, the model is evaluated with the remaining data: the predictions made by the model are compared with the actual labels.

When we’re satisfied with the training pipeline, we can create the inference pipeline: we perform the same data transformation on the new data (data that’s not labelled yet) and use the trained model to infer (=predict) labels. This will be the basis for the predictive service.

Deploying a ML service means publishing the inference pipeline as a service for client applications. The deployment exposes a REST endpoint to consume the service, and clients authenticate with the primary key of the service.
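
To make this concrete, here is a minimal sketch (Python and the requests library) of how a client application could call such an endpoint. The URL, key and input schema are placeholders, and the exact payload format depends on how the inference pipeline was designed; copy the real values from the service’s Consume tab.

```python
# Hedged sketch: calling a deployed Azure ML predictive service over REST.
# Endpoint URL, key and input fields below are placeholders.
import json
import requests

endpoint = "https://<your-service>.azurecontainer.io/score"  # placeholder
primary_key = "<primary-key>"                                # placeholder

# The expected input shape depends on the deployed inference pipeline.
payload = {"Inputs": {"input1": [{"feature1": 1.0, "feature2": 2.0}]}}

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {primary_key}",  # key-based authentication
}

response = requests.post(endpoint, data=json.dumps(payload), headers=headers)
print(response.json())
```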

Model evaluation metrics

Regression

  • Mean Absolute Error (MAE): The average difference between predicted values and true values. The lower the value, the better the predictions of the model. The value is based on the same units as the label.
  • Root Mean Squared Error (RMSE): The square root of the mean difference between predicted and true values. The value is based on the same units as the label. A large difference indicates a big variance in the individual errors (some very small and some very large).
  • Relative Squared Error (RSE): Metric between 0 and 1 based on the square of the differences between predicted and true values. The closer to 0, the better the model is performing. Because it’s relative, it can be used to compare models with different label units.
  • Relative Absolute Error (RAE): Metric between 0 and 1 based on the absolute differences between predicted and true values. The closer to 0, the better the model is performing. Because it’s relative, it can be used to compare models with different label units.
  • Coefficient of Determination (R² or R-Squared): Summarizes how much the variance between predicted and true values is explained by the model. The closer to 1, the better the model is performing.
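
To make the regression metrics above concrete, here is a small sketch that computes MAE, RMSE and R² by hand with NumPy; the values are made up for illustration.

```python
# Computing a few regression metrics by hand, to make the definitions concrete.
import numpy as np

y_true = np.array([10.0, 12.0, 15.0, 11.0])   # actual label values
y_pred = np.array([11.0, 12.5, 13.0, 10.0])   # model predictions

errors = y_pred - y_true
mae = np.mean(np.abs(errors))                  # Mean Absolute Error
rmse = np.sqrt(np.mean(errors ** 2))           # Root Mean Squared Error
r2 = 1 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)  # R-Squared

print(f"MAE={mae:.3f}  RMSE={rmse:.3f}  R2={r2:.3f}")
```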

Classification

The result of a classification evaluation is a confusion matrix, a 2×2 grid with predicted and actual counts for each category. The cell where both predicted and actual are 1 holds the true positives (in green), and the cell where both are 0 holds the true negatives (in blue). The other cells represent differences between actual and predicted values (false positives/negatives).

  • Accuracy: Ratio of correct predictions (true positives + true negatives) to the total of predictions. In other words, the proportion of right answers.
  • Precision: The fraction of predicted positive cases that are actually positive, i.e. the number of true positives divided by the number of true positives + false positives. A real-life example: out of all the patients the model predicted as having cancer, how many actually have cancer?
  • Recall (or True positive rate): The fraction of actual positive cases that the model correctly identified, i.e. the number of true positives divided by the number of true positives + false negatives. Example: out of all the patients who actually have cancer, how many did the model identify?
  • F1 Score: An overall metric combining precision and recall.

Although accuracy is the most intuitive metric, precision and recall are usually used to assess the model performance, as in some cases accuracy alone may be misleading.
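
As a quick illustration of why, here is a small sketch that derives accuracy, precision, recall and F1 from confusion-matrix counts; the counts are made up.

```python
# Deriving classification metrics from a confusion matrix, by hand.
# True/false positive/negative counts below are made up for illustration.
tp, fp, tn, fn = 70, 10, 15, 5

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # of the predicted positives, how many are right
recall = tp / (tp + fn)      # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} F1={f1:.2f}")
```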

Clustering

Because it’s unsupervised (we don’t know in advance a set of “right answers”), evaluating a clustering model is more difficult. A clustering model is defined as successful when it achieves a good level of separation between the items in each cluster. That’s what the metrics will measure.

  • Average Distance to Other Center: How close each point in the cluster is, on average, to the centers of all other clusters.
  • Average Distance to Cluster Center: How close each point in the cluster is to the center of the cluster (on average).
  • Number of points: Number of points in the cluster.
  • Maximal Distance to Cluster Center: The maximum distance between each point and the center of its cluster. If this distance is high, the cluster may be widely dispersed.
  • Cluster’s spread: Combination of the Maximal Distance to Cluster Center and Average Distance to Cluster Center.

Cognitive Services

Cognitive Services is a resource that can hold several services of different kinds, like Computer Vision, Text Analytics, Translator Text… Most of the following services can be created either in a Cognitive Services resource or in a dedicated resource (unless otherwise stated).

Using Cognitive Services means that only one endpoint and one primary key are created for all hosted services, which can simplify management but gives less flexibility. It’s also a single item for billing.

A dedicated service can be used if there is no intention to add other services, or to track the utilization and billing of that particular service separately.

Computer vision

Computer vision makes it possible to categorize or organize picture content, extract text from a picture or a PDF, identify elements in a picture, map movements…

An image is just a matrix of pixel values. These numeric values can be used as features to train ML models.
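
A quick way to see this for yourself, using Pillow and NumPy (the file name is a placeholder):

```python
# An image really is just an array of pixel values.
import numpy as np
from PIL import Image

img = np.array(Image.open("photo.jpg"))  # placeholder file name
print(img.shape)   # e.g. (height, width, 3) for an RGB image
print(img[0, 0])   # the top-left pixel, e.g. [R G B]

features = img.flatten()  # the raw numeric values can serve as ML features
```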

Available resources to create a Computer Vision service:

  • Computer Vision
  • Cognitive Services

Computer Vision service capabilities:

  • Generate a human-readable description of an image. The service may return several propositions, each with a confidence score. This capability could be used to automatically generate image descriptions for visually impaired visitors of a website.
  • Tag elements in an image.
  • Detect objects: tag them and return the bounding box coordinates (coordinates of the top-left corner, width, and height of the detected element)
  • Detect brands (logos) from a database
  • Detect faces and determine age (a subset of the capabilities of the Face service), for basic face detection and analysis combined with general image analysis capabilities
  • Categorize an image (people, landscape…). Works with parent categories, e.g. people_ for a single person and people_group for a group of people
  • Detect domain-specific content, like celebrities or landmarks
  • Optical Character Recognition (OCR) to detect printed or handwritten text in images
  • Detect image type (clipart, drawing, photo…)
  • Detect image color schemes (overall colors, dominant foreground and background)
  • Generate thumbnails
  • Moderate content (detect adult, violent, gory… content)
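
As an illustration of a few of these capabilities, here is a hedged sketch using the Computer Vision Python SDK (the azure-cognitiveservices-vision-computervision package); endpoint, key and image URL are placeholders.

```python
# Hedged sketch: image description and tagging with the Computer Vision SDK.
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import VisualFeatureTypes
from msrest.authentication import CognitiveServicesCredentials

client = ComputerVisionClient(
    "https://<your-resource>.cognitiveservices.azure.com/",  # placeholder
    CognitiveServicesCredentials("<key>"),                    # placeholder
)

analysis = client.analyze_image(
    "https://example.com/street.jpg",  # placeholder image URL
    visual_features=[VisualFeatureTypes.description, VisualFeatureTypes.tags],
)

for caption in analysis.description.captions:
    print(caption.text, caption.confidence)   # human-readable descriptions
for tag in analysis.tags:
    print(tag.name, tag.confidence)           # tagged elements
```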

Read text with Computer Vision

Uses computer vision to “read” the text on images (detect it) and Natural Language Processing (NLP) to make sense of it.

OCR = Optical Character Recognition

Use cases:

  • Transform notes into text files
  • Digitize forms, medical records, historical documents…
  • Scan checks for bank deposit
  • Sort mail by the address on the envelope

OCR API: Quick extraction of small amounts of text in images, immediate results (synchronous). Processing the image returns regions of the image that contain text, lines of text in each region, words in each line (all with bounding box coordinates).

Read API: Optimized for images with a lot of text or with noise. Uses the latest recognition models. The better option for scanned documents with a lot of text. Can automatically determine the best model to use. Supports signature recognition. Asynchronous process, because it works with larger documents. Three-step process: the image is submitted and an operation ID is created; the operation ID allows checking the status of the analysis; when complete, the results can be retrieved. Results contain the pages (including size and orientation information), lines and words (all with bounding box coordinates).
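
Here is a hedged sketch of that three-step asynchronous flow with the same Python SDK; endpoint, key and document URL are placeholders.

```python
# Hedged sketch: the Read API's submit / poll / retrieve flow.
import time

from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
from msrest.authentication import CognitiveServicesCredentials

client = ComputerVisionClient(
    "https://<your-resource>.cognitiveservices.azure.com/",  # placeholder
    CognitiveServicesCredentials("<key>"),                    # placeholder
)

# 1. Submit the document; the operation ID comes back in a response header.
job = client.read("https://example.com/scanned-letter.pdf", raw=True)
operation_id = job.headers["Operation-Location"].split("/")[-1]

# 2. Poll the operation until the analysis is complete.
while True:
    result = client.get_read_result(operation_id)
    if result.status not in (OperationStatusCodes.running,
                             OperationStatusCodes.not_started):
        break
    time.sleep(1)

# 3. Retrieve pages, lines and words (all with bounding boxes).
if result.status == OperationStatusCodes.succeeded:
    for page in result.analyze_result.read_results:
        for line in page.lines:
            print(line.text)
```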

Form recognizer

Analyzes forms, receipts… Key-value based. Pre-built models are available; custom models can also be trained.

Available resources to create a Form Recognizer service:

  • Form Recognizer
  • Cognitive Services

Analyze receipts with Form Recognizer

Pre-built model for receipts can extract:

  • time of transaction
  • date of transaction
  • merchant information
  • taxes paid
  • receipt totals
  • other pertinent information that may be present on the receipt
  • all text on the receipt is recognized and returned as well

Guidelines for better results:

  • Formats: JPEG, PNG, BMP, PDF or TIFF
  • File size less than 50 MB
  • Image size between 50 x 50 and 10 000 x 10 000 pixels
  • PDF documents no larger than 17 x 17 inches

With the free tier, only the first two pages are processed when passing in PDF or TIFF documents.
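
As an illustration, here is a hedged sketch of the pre-built receipt model using the Form Recognizer Python SDK (the azure-ai-formrecognizer package, 3.x API); endpoint, key and receipt URL are placeholders.

```python
# Hedged sketch: extracting receipt fields with the pre-built receipt model.
from azure.ai.formrecognizer import FormRecognizerClient
from azure.core.credentials import AzureKeyCredential

client = FormRecognizerClient(
    "https://<your-resource>.cognitiveservices.azure.com/",  # placeholder
    AzureKeyCredential("<key>"),                              # placeholder
)

# The operation is asynchronous: a poller is returned and we wait for results.
poller = client.begin_recognize_receipts_from_url(
    "https://example.com/receipt.jpg"                         # placeholder
)
for receipt in poller.result():
    for name, field in receipt.fields.items():
        # e.g. MerchantName, TransactionDate, Total... each with a confidence score
        print(name, field.value, field.confidence)
```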

Custom vision

Image classification: define a class/category based on a set of features to predict a label.

What’s the difference between Custom Vision and Computer Vision? They both deal with computer vision on images. The difference between them is the Custom Vision can only do image classification and object detection, as well as take in your own images. The Computer Vision APIs can do a bit more, but you don’t have any control over how the models are trained.

Microsoft Docs

Object detection: classify and return the coordinates of the item (the bounding box).

It uses deep learning and CNNs (convolutional neural networks).

Use cases:

  • Product identification (online or in-store)
  • Medical diagnosis (evaluate X-Ray or MRI images)
  • Checking building safety (search for extinguishers, exits…)
  • Driving assistance (lane assist)

Available resources to create a Computer Vision service:

  • Custom Vision
  • Cognitive services

The service can be used for training, prediction, or both. Within Custom Vision, if “both” is chosen, two resources are created (one for prediction and one for training), while a single Cognitive Services resource can handle both (because it can hold multiple services of different kinds).

To train the model, we can use the Custom Vision portal or program it with an SDK. To train the model, we have to supply images and label them. For object detection, the bounding boxes also have to be supplied. To ease the process, there’s a graphical interface in the Custom Vision portal: it automatically suggests the areas of the items in a picture, then we can just add a label to them and adjust the box if necessary. After tagging and training with an initial set, the wizard can suggest classes and bounding boxes for new images in the dataset (smart tagging).
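
As an illustration of the SDK route, here is a hedged sketch of calling an already trained and published Custom Vision classification model from Python (the azure-cognitiveservices-vision-customvision package); the endpoint, key, project ID, published model name and image URL are all placeholders.

```python
# Hedged sketch: getting predictions from a published Custom Vision model.
from azure.cognitiveservices.vision.customvision.prediction import (
    CustomVisionPredictionClient,
)
from msrest.authentication import ApiKeyCredentials

predictor = CustomVisionPredictionClient(
    "https://<your-prediction-resource>.cognitiveservices.azure.com/",   # placeholder
    ApiKeyCredentials(in_headers={"Prediction-key": "<prediction-key>"}),  # placeholder
)

results = predictor.classify_image_url(
    "<project-id>",            # placeholder
    "<published-model-name>",  # placeholder
    "https://example.com/fruit.jpg",
)
for prediction in results.predictions:
    print(prediction.tag_name, f"{prediction.probability:.1%}")
```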

Evaluating the model is an iterative process: the service repeatedly trains the model using only a part of the data, holding some back to evaluate the model. Performance of the model is indicated by these metrics:

  • Precision: % of correct class predictions (e.g. 0.7 means 70% of the predictions were correct). For example, the model predicted that 10 images were showing a cat, but only 7 of them actually were.
  • Recall: % of actual instances correctly identified. For example, 10 images were showing cats, but the model only found 7 of them.
  • AP (Average Precision): a metric that uses both precision and recall.

Face Service

Provides face detection and analysis capabilities: it returns additional information such as facial landmarks, which can be used to train a model to infer age, emotional state, sex… or to identify known individuals.

Use cases:

  • Secure access to facilities or to unlock devices
  • Social media: face recognition to tag friends on pictures
  • Custom advertising, based on supposed age, sex…
  • Missing persons and search with public cameras (facial recognition)

Several services use Face capabilities:

  • Computer Vision (face detection and a set of basic face analysis, such as determining the age of a person)
  • Video Indexer (face detection in videos)
  • Face, which offers pre-built algorithms to detect, recognize and analyze faces; it has the widest range of facial analysis capabilities

Available resources to create a Face service:

  • Cognitive Services
  • Face

Along with the coordinates (bounding box), Face can return a set of attributes such as:

  • Age: a guess at an age
  • Blur: how blurred the face is (which can be an indication of how likely the face is to be the main focus of the image)
  • Emotion: what emotion is displayed
  • Exposure: aspects such as underexposed or overexposed
  • Facial hair: the estimated facial hair presence
  • Glasses: if the person is wearing glasses
  • Hair: the hair type and hair color
  • Head pose: the face’s orientation in a 3D space
  • Makeup: whether the face in the image has makeup
  • Noise: refers to visual noise in the image (whether the image looks grainy or full of tiny dots)
  • Occlusion: determines if there may be objects blocking the face in the image
  • Smile: whether the person in the image is smiling
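
Here is a hedged sketch of requesting some of these attributes with the Face Python SDK (the azure-cognitiveservices-vision-face package); endpoint, key and image URL are placeholders, and the set of attributes actually returned may vary by service version.

```python
# Hedged sketch: face detection with a few requested attributes.
from azure.cognitiveservices.vision.face import FaceClient
from msrest.authentication import CognitiveServicesCredentials

face_client = FaceClient(
    "https://<your-resource>.cognitiveservices.azure.com/",  # placeholder
    CognitiveServicesCredentials("<key>"),                    # placeholder
)

faces = face_client.face.detect_with_url(
    "https://example.com/group-photo.jpg",                    # placeholder
    return_face_attributes=["age", "glasses", "emotion"],
)
for face in faces:
    print(face.face_rectangle.as_dict())   # bounding box of the face
    print(face.face_attributes.age, face.face_attributes.glasses)
```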

To improve the accuracy of face detection:

  • supported formats are JPEG, PNG, GIF and BMP
  • file size 6 MB or smaller
  • face size has to range from 36 x 36 up to 4096 x 4096 pixels
  • best results are obtained with a face in full-frontal position

Natural Language Processing (NLP)

Analyze text

Gain insight into the content of a text.

Available resources to create a text analysis service:

  • Language
  • Cognitive Services

Techniques:

  • Statistical analysis: remove “stop words” (the, a, of…) and perform frequency analysis of the remaining words to get a clue about the main subject of the text. The analysis can also take multi-term phrases (N-grams) into account.
  • Stemming or lemmatization algorithms are applied to normalize words before counting them, so that alike words are interpreted as one term (example: power, powerful, powered).
  • Linguistic structure rules are applied to analyze sentences (nouns, verbs, adjectives…).
  • Encoding words or terms as numeric features so they can be used to train a model (often used for sentiment analysis).
  • Creating vectorized models that capture semantic relationships between words by assigning them to locations in n-dimensional space.

There are pre-existing models in Azure that can:

  • Determine the language of the text
  • Perform sentiment analysis (positive vs negative sentiment)
  • Extract key phrases from a text to determine the main talking points
  • Identify and categorize entities (people, places, organizations, dates, times, quantities…) in the text

Language analysis returns the language name, the ISO 639-1 code and a confidence level. If several languages are present in the text, it returns the predominant language, based on the length of phrases, the proportion of text in that language… Only the predominant language is returned. If no language can be detected, it returns “unknown” for the name and identifier and NaN as the score.

Sentiment analysis can return sentiment scores and labels per sentence. The score goes from 0 (negative sentiment) to 1 (positive sentiment), 0.5 being neutral or indeterminate (could be caused by a wrong language parameter, bad sentence construction or no sentence at all…).

Entity recognition returns a list of entities (cities, brands, numbers…) defined by their type (and optional subtype) and value, as key-value pairs. It can provide a link with more information about the entity from Wikipedia.
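
A hedged sketch of these pre-built text analysis features using the Language service Python SDK (the azure-ai-textanalytics package); endpoint and key are placeholders and the sample text is made up.

```python
# Hedged sketch: language detection, sentiment, key phrases and entities.
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

client = TextAnalyticsClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",  # placeholder
    credential=AzureKeyCredential("<key>"),                            # placeholder
)

docs = ["The hotel was wonderful and the staff in Paris were very friendly."]

lang = client.detect_language(docs)[0].primary_language
print(lang.name, lang.iso6391_name, lang.confidence_score)  # language detection

sentiment = client.analyze_sentiment(docs)[0]
print(sentiment.sentiment, sentiment.confidence_scores)      # sentiment analysis

print(client.extract_key_phrases(docs)[0].key_phrases)       # key phrases

for entity in client.recognize_entities(docs)[0].entities:   # entity recognition
    print(entity.text, entity.category)
```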

Recognize speech

Accept vocal commands (speech recognition) and provide responses (speech synthesis).

Speech recognition = speech-to-text. The service already contains several models including an acoustic model that converts the audio signal into phonemes (representation of specific sounds) and a language model that maps phonemes to words.

Speech synthesis = text-to-speech. Input = text to be spoken and voice to be used. Text is tokenized to break it into individual words and assign phonetic sounds to each one. Phonetic transcription is broken into prosodic units (phrases, clauses or sentences) to create phonemes that will be converted into audio.

Available resources to create a speech service:

  • Speech
  • Cognitive Services

APIs:

  • Speech-to-Text API: Based on the Universal Language Model owned and trained by Microsoft. Optimized for conversation and dictation. It’s possible to create and train a custom model. Allows real-time transcription (microphone, streaming audio file) or batch transcription (audio recordings stored in a shared folder, on a server, in Azure storage… Can use SAS. Asynchronous transcription results).
  • Text-to-Speech API: Can be played directly or saved to audio files. Uses Neural Voices (Neural Networks) for a more natural voice (human intonation).
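
A hedged sketch of both APIs with the Speech SDK (the azure-cognitiveservices-speech package); key and region are placeholders, and the default microphone and speaker are used.

```python
# Hedged sketch: speech-to-text and text-to-speech with the Speech SDK.
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="<key>", region="<region>")  # placeholders

# Speech recognition (speech-to-text) from the default microphone
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
result = recognizer.recognize_once()
print(result.text)

# Speech synthesis (text-to-speech) to the default speaker
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
synthesizer.speak_text_async("Your flight to Paris boards at gate 12.").get()
```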

Translate

There are text translation and speech translation capabilities (live speech-to-speech, and speech-to-text translation as an intermediary text format).

Available resources to create a translation service:

  • Translator text
  • Speech
  • Cognitive services

Translator text service

  • Uses a Neural Machine Translation (NMT) model which analyzes the semantic context to render a more accurate translation.
  • Integrates in applications.
  • From (1) and To (1-N) languages must be specified (ISO codes); a request sketch follows this list.
  • Possibility to specify cultural variants (culture code).
  • Optional configurations: profanity filtering (different levels), selective translation (tag content that’s not to be translated, like phrases, brands, codes…).
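
Here is a hedged sketch of the Translator Text REST API (v3.0) called with Python requests; key and region are placeholders, and the region header applies to regional or multi-service resources.

```python
# Hedged sketch: translating one text into two target languages.
import requests

url = "https://api.cognitive.microsofttranslator.com/translate"
params = {"api-version": "3.0", "from": "en", "to": ["fr", "de"]}  # 1 From, 1-N To
headers = {
    "Ocp-Apim-Subscription-Key": "<key>",        # placeholder
    "Ocp-Apim-Subscription-Region": "<region>",  # placeholder
    "Content-Type": "application/json",
}
body = [{"text": "Hello, how can I help you?"}]

response = requests.post(url, params=params, headers=headers, json=body)
for translation in response.json()[0]["translations"]:
    print(translation["to"], ":", translation["text"])
```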

Speech translation

  • Speech-to-text API
  • Text-to-speech API
  • Speech translation API (for real-time): streaming input source (like a microphone or audio file), return translation as text or audio stream; it can be used for real-time captioning, simultaneous translation of a spoken conversation…
  • Source language must be specified with the culture code format, the target language with the language code.

Conversational Language Understanding

Not only interpret but also “understand” a text. Used, among other things, in smart homes to command smart devices.

Three important concepts:

  • Utterances: something a user might say that the app must interpret. Ex: “Turn off the fan”.
  • Entities: an item to which an utterance refers. Ex: the fan.
  • Intents: the purpose expressed in the utterance. Ex: turn off. “None” = no mapping; it’s considered a fallback, usually used to provide a generic response to users when their request doesn’t match any other intent.

A Language Understanding application uses a model consisting of intents and entities. Utterances are used to train the model.

Available resources to create a Language Understanding service:

  • Language Service
  • Cognitive Services

CLU provides a collection of prebuilt domains with pre-defined intents and entities for common scenarios, that can be a starting point for the model.

Sample utterances are mapped to entities, or entities are mapped to words in utterances.

It’s recommended to use the web-based portal for authoring (defining the model) and the SDK for runtime predictions.

Authoring:

  • Create intents based on actions a user would want to perform
  • For each intent, include a variety of utterances of how the user could express the intent
  • Create entities:
    – machine-learned by the model during training from the context in the sample utterances
    – from a list, defined as a hierarchy of lists and sublists, with synonyms
    – RegEx (ex: phone numbers, email addresses)
    – Pattern.any, for complex entities that may be hard to extract from sample utterances

Training the model:

  • Happens after defining the intents and entities in the model and including a suitable set of sample utterances
  • Use sample utterances to teach the model to match natural language expressions to probable intents and entities
  • After training, evaluate with test utterances and check predicted intents
  • Iterative process of training and testing until the result is satisfying

Predicting

  • When satisfied with the results, publish the app to a prediction resource for consumption

Conversational AI

Combination of Language Service and Azure Bot service.

Needs two things:

  • a knowledge base of question-answer pairs, usually with some built-in natural language processing model to enable questions that can be phrased in multiple ways
  • a bot service that provides an interface to the knowledge base through one or more channels

To create a custom question answering knowledge base, we can use the Language Studio custom question answering feature (creation, training, publish and management of the knowledge bases). It can also be done in code with the REST API or the SDK but it requires specific knowledge.

How to create it with Language Studio:

  • Provision a language service resource
  • Define questions and answers, generated from an existing FAQ or entered manually, or even a mix of both methods (most common)
  • Questions can be assigned alternative phrasing
  • Saving the set analyzes it and applies a built-in natural language processing model to match appropriate answers to questions
  • To test the knowledge base, use the test interface to submit new questions
  • When satisfied with the results, the knowledge base can be deployed; you get an endpoint, an ID and an authorization key (a query sketch follows this list)
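
As an illustration, here is a hedged sketch of querying a deployed custom question answering project with the azure-ai-language-questionanswering Python SDK; endpoint, key, project name and deployment name are placeholders.

```python
# Hedged sketch: asking a question to a deployed knowledge base.
from azure.ai.language.questionanswering import QuestionAnsweringClient
from azure.core.credentials import AzureKeyCredential

client = QuestionAnsweringClient(
    "https://<your-language-resource>.cognitiveservices.azure.com/",  # placeholder
    AzureKeyCredential("<key>"),                                       # placeholder
)

output = client.get_answers(
    question="What are your opening hours?",
    project_name="<knowledge-base-name>",   # placeholder
    deployment_name="production",           # placeholder
)
for candidate in output.answers:
    print(candidate.answer, candidate.confidence)
```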

To build a bot in the Azure bot service, we can use the Microsoft Bot Framework SDK (C# or Node.js), or automatic bot creation functionality that creates a bot for the deployed knowledge base and publishes it as an Azure Bot service application in a few clicks.

To extend the bot functionality, the code has to be customized (in portal, or by downloading a local copy then republishing it).


Decision support

Anomaly detection determines whether values in a series are within expected parameters. Some use cases are:

  • Environmental monitoring (HVAC)
  • Blood pressure monitoring
  • Evaluating mean time between hardware failures
  • Predictive maintenance
  • Credit card fraud
  • Month-over-month expenses comparison

Azure Anomaly Detector is part of the Decision services within Azure Cognitive Services.

There are two detection models:

  • Detection of anomalies in historical time series = batch processing
  • Real-time data (from IoT devices, sensors, streaming input sources) = last-point anomaly
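
Here is a hedged sketch of both modes with the Anomaly Detector Python SDK (the azure-ai-anomalydetector package, 3.0 beta API; class and method names differ between versions). Endpoint, key and the series values are placeholders, and the real service expects a minimum number of points per series.

```python
# Hedged sketch: batch (entire series) and last-point anomaly detection.
from datetime import datetime, timedelta

from azure.ai.anomalydetector import AnomalyDetectorClient
from azure.ai.anomalydetector.models import DetectRequest, TimeSeriesPoint
from azure.core.credentials import AzureKeyCredential

client = AnomalyDetectorClient(
    "https://<your-resource>.cognitiveservices.azure.com/",  # placeholder
    AzureKeyCredential("<key>"),                              # placeholder
)

# Made-up daily series with one obvious spike.
values = [12.0, 12.4, 11.8, 12.1, 12.3, 11.9, 12.2, 12.0, 30.0, 12.1, 11.8, 12.2]
start = datetime(2023, 1, 1)
series = [TimeSeriesPoint(timestamp=start + timedelta(days=i), value=v)
          for i, v in enumerate(values)]
request = DetectRequest(series=series, granularity="daily")

batch = client.detect_entire_series(request)   # historical / batch mode
print(batch.is_anomaly)                        # one flag per point

latest = client.detect_last_point(request)     # real-time / last-point mode
print(latest.is_anomaly)
```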

Knowledge mining

I found this topic more difficult to understand (or the documentation too heavy, I don’t know), so I tried to explain it the best I could (= the best I understood), but to be honest most of the notes I took come directly from the Microsoft Learn course because, as it wasn’t 100% clear to me, it was hard to rephrase it myself.

It lets you search and learn from vast amounts of information.

Azure Cognitive Search = a private enterprise search solution to build a single, fast index for your content (like a private, customized search index). It can be used internally or on a public-facing internet asset (app or website).

Capabilities:

  • Import data from a variety of sources (JSON format!), auto crawling support from selected sources in Azure
  • AI powered indexing can infer and extract searchable data from raw text and non-text sources
  • Import data wizard to automate processes in the portal to create various objects needed for the search engine
  • You decide what data is imported into the index and set up indexers to pull that data into it, or push JSON formatted documents manually
  • Query search indexes
  • PaaS
  • Programmable search engine (built on Apache Lucene): supports both simple query and full Lucene query syntax
  • Highly available (99.9% uptime SLA) for cloud and on-prem
  • Supports 56 languages for intelligent handling (phonetic match, language specific linguistics…)
  • Supports geo-search filtering (proximity)
  • User-experience: auto-suggest, auto-complete, pagination, hit highlighting…
  • Programmable via REST API or SDK, customizable via Portal.

To build a Cognitive Search service, you need to upload your content to an Azure data source. Supported data storage sources include:

  • Azure SQL database
  • SQL server on Azure VM
  • Cosmos DB
  • Azure Blob Storage
  • Azure Table Storage

The imported data must be in JSON format.

Some key terms:

  • Data Source = persisted connection information to source data
  • Index = (put in different ways:) a physical data structure used for text search and queries; a persisted collection of JSON documents and other content used to enable search functionality; container of searchable documents
  • Document = a document within an index is more or less like a row in a table, each document is a single unit of searchable data in the index
  • Schema = structure of a document
  • Indexer = a configuration object specifying a data source, target index, an optional skillset, optional schedule, optional configuration settings for error handling and base-64 encoding
  • Skillset (optional) = a set of instructions for manipulating, transforming and shaping content, including extracting information from image files, NLP, translation…; refers to a Cognitive Services resource that provides enrichment thanks to AI
  • Knowledge store (optional) = stores output from an AI enrichment pipeline in tables and blobs (Azure storage) for independent analysis and downstream processing

To make an analogy with a relational database, an index would be a table, each document would be a row, and the fields of a document are like columns with datatypes and attributes. The equivalent of the primary key is the key field, which is marked with a boolean key property.

Behavior attributes are:

  • Retrievable = it can be returned in the search results
  • Filterable = it can be used to filter expressions
  • Sortable = it can be sorted in order in queries
  • Facetable = it can be used to group results to enable faceted navigation of the results
  • Searchable = it can be searched against (only applicable to text fields)
  • Analyzer to use = choice of the language analyzer for the field that processes text in a query (only applicable to text fields)

For efficiency, only the required behaviors should be added to each field. Also important to note: if you forget a behavior and have to add it later, you must rebuild the index. This is true for any change made to existing field definitions. On the other hand, simply adding new fields is supported; the new fields will be set to null in all existing documents. For schema changes, the code-based approach is faster, as it allows iterating over the schemas, while the portal requires manually filling in the schema details.

When changes have to be made to a schema, it’s recommended to create a new one with the new structure and information, and replace the previous one when the new one is ready for use. This way, the user is not impacted.

Here’s a small partial example of what a schema could look like (I’m not sure the example is really relevant, but I thought it was interesting to include a Collection type, just to be aware it exists).

{ "name": "people",
  "fields": [
     { "name": "lastName", "type": Edm.String, "filterable": true},
     { "name": "firstNames", "type": "Collection(Edm.String)", "searchable": true, "filterable": false }
...

The indexer serves to build the index, that is, to create search documents and populate the index with them. It can be done with application code or with an Azure indexer.

To create and load JSON documents into an index, there are two approaches:

  • push method, via the REST API or the .NET SDK: more flexibility, no restriction on data source type, frequency of execution… but requires some technical skills
  • pull method, from some Azure data sources (the supported data sources were enumerated a few paragraphs above); the data is exported to JSON if it isn’t already in that format

Pull method => the indexer is a crawler that extracts searchable text and metadata from an external Azure data source and populates a search index using field-to-field mappings between the source data and the index. Indexers only import new or updated documents.
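
Once the index is populated, querying it is straightforward. Here is a hedged sketch using the azure-search-documents Python SDK; the endpoint, query key, index name and field names are placeholders that loosely match the earlier "people" schema example.

```python
# Hedged sketch: querying a Cognitive Search index.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

search_client = SearchClient(
    endpoint="https://<your-search-service>.search.windows.net",  # placeholder
    index_name="people",                                          # placeholder
    credential=AzureKeyCredential("<query-key>"),                 # placeholder
)

results = search_client.search(
    search_text="dupont",              # simple query syntax
    filter="lastName eq 'Dupont'",     # only works on filterable fields
    top=5,
)
for doc in results:
    print(doc["lastName"], doc.get("firstNames"))
```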
