Improving Content Moderation with Amazon Rekognition Bulk Analysis and Custom Moderation

Amazon Rekognition makes it easy to add image and video analysis to your applications. It’s based on the same proven, highly scalable, deep learning technology developed by Amazon’s computer vision scientists to analyze billions of images and videos daily. It requires no machine learning (ML) expertise to use and we’re continually adding new computer vision features to the service. Amazon Rekognition includes a simple, easy-to-use API that can quickly analyze any image or video file that’s stored in Amazon Simple Storage Service (Amazon S3).

Customers across industries such as advertising and marketing technology, gaming, media, and retail & e-commerce rely on images uploaded by their end-users (user-generated content or UGC) as a critical component to drive engagement on their platform. They use Amazon Rekognition content moderation to detect inappropriate, unwanted, and offensive content in order to protect their brand reputation and foster safe user communities.

In this post, we will discuss the following:

Content Moderation model version 7.0 and capabilities
How does Amazon Rekognition Bulk Analysis work for Content Moderation
How to improve Content Moderation prediction with Bulk Analysis and Custom Moderation

Content Moderation Model Version 7.0 and Capabilities

Amazon Rekognition Content Moderation version 7.0 adds 26 new moderation labels and expands the moderation label taxonomy from a two-tier to a three-tier label category. These new labels and the expanded taxonomy enable customers to detect fine-grained concepts on the content they want to moderate. Additionally, the updated model introduces a new capability to identify two new content types, animated and illustrated content. This allows customers to create granular rules for including or excluding such content types from their moderation workflow. With these new updates, customers can moderate content in accordance with their content policy with higher accuracy.

Let’s look at a moderation label detection example for the following image.

The following table shows the moderation labels, content type, and confidence scores returned in the API response.

Moderation Labels	Taxonomy Level	Confidence Scores
Violence	L1	92.6%
Graphic Violence	L2	92.6%
Explosions and Blasts	L3	92.6%

Content Types	Confidence Scores
Illustrated	93.9%

To obtain the full taxonomy for Content Moderation version 7.0, visit our developer guide.

Bulk Analysis for Content Moderation

Amazon Rekognition Content Moderation also provides batch image moderation in addition to real-time moderation using Amazon Rekognition Bulk Analysis. It enables you to analyze large image collections asynchronously to detect inappropriate content and gain insights into the moderation categories assigned to the images. It also eliminates the need for building a batch image moderation solution for customers.

You can access the bulk analysis feature either via the Amazon Rekognition console or by calling the APIs directly using the AWS CLI and the AWS SDKs. On the Amazon Rekognition console, you can upload the images you want to analyze and get results with a few clicks. Once the bulk analysis job completes, you can identify and view the moderation label predictions, such as Explicit, Non-Explicit Nudity of Intimate parts and Kissing, Violence, Drugs & Tobacco, and more. You also receive a confidence score for each label category.

Create a bulk analysis job on the Amazon Rekognition console

Complete the following steps to try Amazon Rekognition Bulk Analysis:

On the Amazon Rekognition console, choose Bulk Analysis in the navigation pane.
Choose Start Bulk Analysis.
Enter a job name and specify the images to analyze, either by entering an S3 bucket location or by uploading images from your computer.
Optionally, you can select an adapter to analyze images using the custom adapter that you have trained using Custom Moderation.
Choose Start analysis to run the job.

When the process is complete, you can see the results on the Amazon Rekognition console. Also, a JSON copy of the analysis results will be stored in the Amazon S3 output location.

Amazon Rekognition Bulk Analysis API request

In this section, we guide you through creating a bulk analysis job for image moderation using programming interfaces. If your image files aren’t already in an S3 bucket, upload them to ensure access by Amazon Rekognition. Similar to creating a bulk analysis job on the Amazon Rekognition console, when invoking the StartMediaAnalysisJob API, you need to provide the following parameters:

OperationsConfig – These are the configuration options for the media analysis job to be created:
- MinConfidence – The minimum confidence level with the valid range of 0–100 for the moderation labels to return. Amazon Rekognition doesn’t return any labels with a confidence level lower than this specified value.
Input – This includes the following:
- S3Object – The S3 object information for the input manifest file, including the bucket and name of the file. input file includes JSON lines for each image stored on S3 bucket. for example: {"source-ref": "s3://MY-INPUT-BUCKET/1.jpg"}
OutputConfig – This includes the following:
- S3Bucket – The S3 bucket name for the output files.
- S3KeyPrefix – The key prefix for the output files.

See the following code:

import boto3
import os
import datetime
import time
import json
import uuid

region = boto3.session.Session().region_name
s3=boto3.client('s3')
rekognition_client=boto3.client('rekognition', region_name=region)

min_confidence = 50
input_bucket = "MY-INPUT-BUCKET"

input_file = "input_file.jsonl"
output_bucket = "MY-OUTPUT-BUCKET"
key_prefix = "moderation-results"
job_name = "bulk-analysis-demo"

job_start_response = rekognition_client.start_media_analysis_job(
    OperationsConfig={"DetectModerationLabels": {"MinConfidence": min_confidence}},
    JobName = job_name,
    Input={"S3Object": {"Bucket": input_bucket, "Name": input_file}},
    OutputConfig={"S3Bucket": output_bucket, "S3KeyPrefix": key_prefix},
)

job_id = job_start_response["JobId"]
max_tries = 60
while max_tries > 0:
    max_tries -= 1
    job = rekognition_client.get_media_analysis_job(JobId=job_id)
    job_status = job["Status"]
    if job_status in ["SUCCEEDED", "FAILED"]:
        print(f"Job {job_name} is {job_status}.")
        if job_status == "SUCCEEDED":
            print(
                f"Bulk Analysis output file copied to:\n"
                f"\tBucket: {job['Results']['S3Object']['Bucket']}\n"
                f"\tObject: {job['Results']['S3Object']['Name']}."
            )
        break
    else:
        print(f"Waiting for {job_name}. Current status is {job_status}.")
    time.sleep(10)

You can invoke the same media analysis using the following AWS CLI command:

aws rekognition start-media-analysis-job \
--operations-config "DetectModerationLabels={MinConfidence='50'}" \
--input "S3Object={Bucket=input_bucket,Name=input_file.jsonl}" \
--output-config "S3Bucket=output_bucket,S3KeyPrefix=moderation-results"

Amazon Rekognition Bulk Analysis API results

To get a list of bulk analysis jobs, you can use ListMediaAnalysisJobs. The response includes all the details about the analysis job input and output files and the status of the job:

# get the latest 10 media analysis jobs
moderation_job_list = rekognition_client.list_media_analysis_jobs(MaxResults=10, NextToken="")
for job_result in moderation_job_list["MediaAnalysisJobs"]:
 print(f'JobId: {job_result["JobId"]} ,Status: {job_result["Status"]},\n\
Summary: {job_result["ManifestSummary"]["S3Object"]["Name"]}, \n\
Result: {job_result["Results"]["S3Object"]["Name"]}\n')

You can also invoke the list-media-analysis-jobs command via the AWS CLI:

aws rekognition list-media-analysis-jobs --max-results 10

Amazon Rekognition Bulk Analysis generates two output files in the output bucket. The first file is manifest-summary.json, which includes bulk analysis job statistics and a list of errors:

{
    "version": "1.0",
    "statistics": {
      "total-json-lines": 2,
      "valid-json-lines": 2,
      "invalid-json-lines": 0
    },
    "errors": []
 }

The second file is results.json, which includes one JSON line per each analyzed image in the following format. Each result includes the top-level category (L1) of a detected label and the second-level category of the label (L2), with a confidence score between 1–100. Some Taxonomy Level 2 labels may have Taxonomy Level 3 labels (L3). This allows a hierarchical classification of the content.

{
  "source-ref": "s3://MY-INPUT-BUCKET/1.jpg",
    "detect-moderation-labels": {
    "ModerationLabels": [
      {
        "ParentName": "Products",
        "TaxonomyLevel": 3,
        "Confidence": 91.9385,
        "Name": "Pills"
      },
      {
        "ParentName": "Drugs & Tobacco",
        "TaxonomyLevel": 2,
        "Confidence": 91.9385,
        "Name": "Products"
      },
      {
        "ParentName": "",
        "TaxonomyLevel": 1,
        "Confidence": 91.9385,
        "Name": "Drugs & Tobacco"
      }
    ],
    "ModerationModelVersion": "7.0",
    "ContentTypes": [
      
    ]
  }
}

Improving Content Moderation model prediction using Bulk Analysis and Custom Moderation

You can enhance the accuracy of the Content Moderation base model with the Custom Moderation feature. With Custom Moderation, you can train a Custom Moderation adapter by uploading your images and annotating these images. Adapters are modular components that can extend and enhance the capabilities of the Amazon Rekognition deep learning model. To easily annotate your images, you can simply verify the predictions of your bulk analysis job to train a custom adapter. To verify the prediction results, follow the steps below:

On the Amazon Rekognition console, choose Bulk Analysis in the navigation pane.
Choose the bulk analysis job, then choose Verify predictions.

On the Verify prediction page, you can see all the images evaluated in this job and the predicted labels.

Select each image’s label as present (check mark) to validate a True Positive; or mark as non-present (X mark) to invalidate each assigned label (i.e., the label prediction is a False Positive).
If the appropriate label is not assigned to the image (i.e., False Negative), you can also select and assign the correct labels to the image.

Based on your verification, False Positives and False Negatives will be updated in the verification statistics. You can use these verifications to train a Custom Moderation adapter, which allows you to enhance the accuracy of the content moderation predictions.

As a prerequisite, training a custom moderation adapter requires you to verify at least 20 false positives or 50 false negatives for each moderation label that you want to improve. Once you verify 20 false positives or 50 false negatives, you can choose Train an adapter.

You can use Custom Moderation adapters later to analyze your images by simply selecting the custom adapter while creating a new bulk analysis job or via API by passing the custom adapter’s unique adapter ID.

Summary

In this post, we provided an overview of Content Moderation version 7.0, Bulk Analysis for Content Moderation, and how to improve Content Moderation predictions using Bulk Analysis and Custom Moderation. To try the new moderation labels and bulk analysis, log in to your AWS account and check out the Amazon Rekognition console for Image Moderation and Bulk Analysis.

About the authors

Mehdy Haghy is a Senior Solutions Architect at AWS WWCS team, specializing in AI and ML on AWS. He works with enterprise customers, helping them migrate, modernize, and optimize their workloads for the AWS cloud. In his spare time, he enjoys cooking Persian foods and electronics tinkering.

Shipra Kanoria is a Principal Product Manager at AWS. She is passionate about helping customers solve their most complex problems with the power of machine learning and artificial intelligence. Before joining AWS, Shipra spent over 4 years at Amazon Alexa, where she launched many productivity-related features on the Alexa voice assistant.

Maria Handoko is a Senior Product Manager at AWS. She focuses on helping customers solve their business challenges through machine learning and computer vision. In her spare time, she enjoys hiking, listening to podcasts, and exploring different cuisines.