
Machine Learning services in AWS (part 1)

Machine learning is everywhere. Or, even if not quite everywhere, most people encounter it every day while surfing the internet, buying things online, watching videos, listening to music and doing many other activities. Machine learning is integrated into social media, e-commerce, healthcare, banking, manufacturing and other industries. It helps to enhance the customer service experience, personalize recommendations, automate data extraction and analysis, maximize the value of media, improve business metrics analysis, identify fraudulent online activities, and more. The possibilities are endless.

AWS is working hard in this direction, and if you open the list of ML services in the console, you will see the following:

Amazon SageMaker

Amazon Augmented AI

Amazon CodeGuru

Amazon DevOps Guru

Amazon Comprehend

Amazon Forecast

Amazon Fraud Detector

Amazon Kendra

Amazon Lex

Amazon Personalize

Amazon Polly

Amazon Rekognition

Amazon Textract

Amazon Transcribe

Amazon Translate

AWS DeepComposer

AWS DeepLens

AWS DeepRacer

AWS Panorama

Amazon Monitron

Amazon HealthLake

Amazon Lookout for Vision

Amazon Lookout for Equipment

Amazon Lookout for Metrics

There will be a series of posts about these services. In this post we will take a look at Amazon SageMaker and Amazon Rekognition.

Amazon SageMaker

Amazon SageMaker is the king of ML services in AWS. Strictly speaking, it is not a single service but rather a family of services, and it deserves a separate post. Here we will just look at Amazon SageMaker in general.

Amazon SageMaker helps data scientists and developers to prepare, build, train, and deploy high-quality machine learning models quickly by bringing together a broad set of capabilities purpose-built for machine learning.

In general, the workflow is as follows:

  1. Understand what the model has to predict and prepare the data for machine learning (choose the right features, aggregate, deal with missing values, etc.)

  2. Use a built-in algorithm, develop your own, or try to find one in AWS Marketplace.

  3. Allocate the needed resources and train the model. Compare different algorithms, tune hyperparameters and debug.

  4. Deploy the model, create an endpoint, make predictions and monitor the process.
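Step 1 above is often the most labor-intensive. As a minimal sketch of what "prepare data" can mean in practice, here is a plain-Python example (the toy housing dataset and column layout are invented for illustration) that imputes missing values with a column mean and splits the rows into training and test sets:

```python
import random

# Hypothetical toy dataset: rows of (sq_meters, rooms, age_years, price);
# None marks a missing value we must deal with before training.
rows = [
    [70.0, 3, 10, 250_000.0],
    [55.0, 2, None, 180_000.0],
    [None, 4, 5, 320_000.0],
    [90.0, 4, 20, 300_000.0],
    [40.0, 1, 30, 120_000.0],
]

def impute_with_mean(rows, col):
    """Replace missing values in one column with that column's mean."""
    present = [r[col] for r in rows if r[col] is not None]
    mean = sum(present) / len(present)
    for r in rows:
        if r[col] is None:
            r[col] = mean

for col in range(3):          # impute every feature column
    impute_with_mean(rows, col)

random.seed(42)
random.shuffle(rows)
split = int(len(rows) * 0.8)  # 80/20 train/test split
train, test = rows[:split], rows[split:]
```

In a real project you would typically do this with pandas or with SageMaker's own data-preparation tooling, but the idea is the same: no missing values and a held-out test set before any training starts.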

The high-level view is as follows:

  1. First of all, you need to understand the purpose of using machine learning in your case and choose an algorithm. There are 17 built-in algorithms in SageMaker, applicable in different situations and circumstances. You can also check AWS Marketplace or develop your own algorithm.

  2. Then you need to prepare the training and testing data sets (optionally a validation dataset as well) and upload them to an S3 bucket.

  3. After that, you need to build your model. Choose the appropriate type of EC2 instance; an algorithm container will be pulled there and later used for training the model.

  4. Start a training job and wait for completion. The model artifacts will be uploaded to an S3 bucket.
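The steps above map quite directly onto the low-level `create_training_job` API. As a sketch, the function below only assembles the request dict; the job name, bucket, role ARN and container image are placeholders, and in a real account you would submit the result with `boto3.client("sagemaker").create_training_job(**request)`:

```python
def build_training_job_request(job_name, image_uri, role_arn, bucket):
    """Assemble a request for sagemaker.create_training_job().

    All names, ARNs and URIs passed in are placeholders for illustration;
    nothing is sent to AWS here.
    """
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,        # algorithm container to pull
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [{                  # training data prepared in S3
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": f"s3://{bucket}/train/",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": f"s3://{bucket}/output/"},
        "ResourceConfig": {                    # EC2 instances for training
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 10,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
        "HyperParameters": {"num_round": "100"},  # algorithm-specific
    }

request = build_training_job_request(
    "demo-xgboost-job",                              # hypothetical job name
    "123456789012.dkr.ecr.us-east-1.amazonaws.com/xgboost:1",  # placeholder image URI
    "arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role
    "my-ml-bucket",                                  # placeholder bucket
)
```

The higher-level SageMaker Python SDK (`sagemaker.estimator.Estimator`) wraps this same request behind a `fit()` call, which is what you would normally use from a notebook.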

A map of use cases and suitable algorithms is below:

Example problems and use cases, each followed by the suitable built-in algorithms:

Predict if an item belongs to a category: an email spam filter.
Factorization Machines Algorithm, K-Nearest Neighbors (k-NN) Algorithm, Linear Learner Algorithm, XGBoost Algorithm

Predict a numeric/continuous value: estimate the value of a house.
Factorization Machines Algorithm, K-Nearest Neighbors (k-NN) Algorithm, Linear Learner Algorithm, XGBoost Algorithm

Based on historical data for a behavior, predict future behavior: predict sales of a new product based on previous sales data.
DeepAR Forecasting Algorithm

Improve the data embeddings of high-dimensional objects: identify duplicate support tickets or find the correct routing based on the similarity of text in the tickets.
Object2Vec Algorithm

Drop those columns from a dataset that have a weak relation with the label/target variable: the color of a car when predicting its mileage.
Principal Component Analysis (PCA) Algorithm

Detect abnormal behavior in an application: spot when an IoT sensor is sending abnormal readings.
Random Cut Forest (RCF) Algorithm

Protect your application from suspicious users: detect if an IP address accessing a service might be from a bad actor.
IP Insights Algorithm

Group similar objects/data together: find high-, medium-, and low-spending customers from their transaction histories.
K-Means Algorithm

Organize a set of documents into topics (not known in advance): tag a document as belonging to a medical category based on the terms used in the document.
Latent Dirichlet Allocation (LDA) Algorithm, Neural Topic Model (NTM) Algorithm

Assign predefined categories to documents in a corpus: categorize books in a library into academic disciplines.
BlazingText Algorithm (text classification)

Convert text from one language to another: Spanish to English.
Sequence-to-Sequence Algorithm

Summarize a long text corpus: an abstract for a research paper.
Sequence-to-Sequence Algorithm

Convert audio files to text: transcribe call center conversations for further analysis.
Sequence-to-Sequence Algorithm

Label/tag an image based on the content of the image: alerts about adult content in an image.
Image Classification Algorithm

Detect people and objects in an image: police review a large photo gallery for a missing person.
Object Detection Algorithm

Tag every pixel of an image individually with a category: self-driving cars prepare to identify objects in their way.
Semantic Segmentation Algorithm

If the model's performance is unsatisfactory, you can try to tune the hyperparameters, or maybe you have chosen the wrong algorithm and should try another one. After that, train the model again. When your model is ready, you can deploy and use it.
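The core idea of hyperparameter tuning can be sketched in a few lines of plain Python: try every combination from a grid, keep the one with the lowest validation error. The `validation_error` function here is a toy stand-in; with SageMaker, each combination would be a real training job (and the managed Automatic Model Tuning feature does this search for you):

```python
from itertools import product

# Toy stand-in for a training run: returns a validation error for a given
# hyperparameter combination. On SageMaker, this would launch a training job.
def validation_error(learning_rate, depth):
    return abs(learning_rate - 0.1) + abs(depth - 4) * 0.05

grid = {                       # hypothetical search space
    "learning_rate": [0.01, 0.1, 0.3],
    "depth": [2, 4, 6],
}

best_params, best_error = None, float("inf")
for lr, depth in product(grid["learning_rate"], grid["depth"]):
    err = validation_error(lr, depth)  # one "training run" per combination
    if err < best_error:
        best_params, best_error = {"learning_rate": lr, "depth": depth}, err

# best_params -> {"learning_rate": 0.1, "depth": 4}
```

Grid search is the simplest strategy; SageMaker's tuner also supports random and Bayesian search, which usually find good combinations with fewer training jobs.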

Jupyter Notebook

One more interesting feature of Amazon SageMaker is Jupyter notebooks. A notebook instance is a managed environment with a kind of IDE where you can interactively download, explore and transform data, create a model using TensorFlow, PyTorch or other frameworks, and then train, deploy and evaluate it.

SageMaker endpoint

When the ML model is trained, we can deploy and use it. We can add more ML instances in different Availability Zones; they will be exposed through a model endpoint, and you can send API calls directly to the SageMaker endpoint or via Lambda and API Gateway.
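Calling a deployed endpoint goes through the `sagemaker-runtime` API's `invoke_endpoint` call. The sketch below keeps the runtime client as a parameter so the function can be exercised without an AWS account; the endpoint name, feature layout and response shape are all hypothetical:

```python
import io
import json

def predict(runtime, endpoint_name, features):
    """Call a SageMaker endpoint with a CSV payload.

    `runtime` is a boto3 "sagemaker-runtime" client (or any object with the
    same invoke_endpoint signature). The endpoint name and feature layout
    here are placeholders for illustration.
    """
    payload = ",".join(str(x) for x in features)  # many built-ins accept CSV
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=payload,
    )
    return json.loads(response["Body"].read())

# A tiny stub standing in for the real boto3 client, so the sketch runs
# locally; a real call would use boto3.client("sagemaker-runtime").
class StubRuntime:
    def invoke_endpoint(self, EndpointName, ContentType, Body):
        return {"Body": io.BytesIO(b'{"predictions": [0.87]}')}

result = predict(StubRuntime(), "demo-endpoint", [70.0, 3, 10])
```

The same `predict` function body could live inside a Lambda handler behind API Gateway, which is the common pattern when the endpoint should not be exposed to clients directly.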