Online Content Safety: A Deep Dive

Aiswarya B Menon
May 23, 2021

Safety is an everyday concern, online and offline. Social media platforms receive a massive amount of User Generated Content (UGC) daily, in many formats. Because this content (images, audio, video, comments) is published by the general public, who belong to diverse cultural, religious, ethnic, gender, age, and other demographic groups, it is a very labor-intensive task to determine, detect, and remove content that is malicious (it can fall into categories such as spam, terrorism, sexual abuse and nudity, self-harm, etc., and can arrive in multiple formats). This study dives into how we can leverage Artificial Intelligence (AI) techniques to detect risk on these platforms, since manually processing and monitoring billions of pieces of content every day is labor-intensive, cumbersome, and prone to error.

INTRODUCTION

The term ‘Artificial Intelligence (AI)’ refers to the capability of a machine to exhibit human-like behavior for a defined task. Advances in AI have been driven by machine learning, which enables a computer system to make decisions and predict outcomes. The development of ‘deep neural networks’, which enable ‘deep learning’, allows systems to recognize complex data inputs such as human speech, images, and text. These systems can then perform the specific tasks for which they have been trained. AI-enabled content safety systems are developed to identify harmful content by following rules and interpreting many different examples of content that is and is not harmful. It can be challenging for machines to interpret a platform’s community standards and decide whether content is harmful.

Clarity on the platform’s community guidelines is essential to developing and refining AI systems that consistently enforce those standards. AI-based content moderation systems can reduce the need for human moderation and reduce the impact on moderators of viewing harmful content. Harmful content is generated by only a small category of users, so AI techniques can be used to identify malicious users and prioritize their content for review. ‘Metadata’ can encode context relevant to decision making, such as a user’s history, the number of followers they have, and information about the user’s real identity, such as age or location. The metadata available varies by platform and by the format of the content posted, so it is difficult to build platform-agnostic moderation tools that make moderation decisions. AI architectures are also required for identifying different categories of potentially harmful content, such as child abuse, hate speech, and violent content, each of which needs a different approach. For example, identifying child abuse material requires consideration not only of the image or video but also of factors like the age or location of the user, and hence requires techniques such as ‘object detection’ and ‘scene understanding’. Identifying bullying content, on the other hand, often requires full consideration of the context of the user interactions as well as the content itself, because the characteristics of bullying content are not well defined. The complexity of designing AI architectures for content safety is therefore high: they need to accommodate different categories of risk, different formats of content, and regional or country-specific cultural variations, which increases the cost and challenge for the organizations that develop them.

HOW AI HELPS IN IDENTIFYING RISKS

AI capabilities are evolving rapidly with the advancement of ML algorithms. Machine learning (ML) is the study of algorithms and statistical models that enable a system to make decisions or predict outcomes. Machine learning is implemented in two phases. a) Training: exposing the machine to a set of data that helps it learn and improve at performing a task. During this phase, large datasets are fed to the model for it to analyze; because this phase is exploratory, the eventual performance of the model is difficult to predict. b) Inference: the process in which a trained model makes decisions for new input data it has not seen before. The output reflects the learning that took place during the training phase.
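As a minimal sketch of these two phases, the snippet below trains a hypothetical toxic-comment classifier on a handful of made-up labeled examples and then runs inference on a comment it has never seen. The examples, labels, and model choice are illustrative assumptions; production content-safety models are far larger and typically multimodal.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# --- Training phase: the model learns from labeled examples ---
train_comments = [
    "have a great day everyone",
    "you are all wonderful",
    "I will hurt you",          # hypothetical violating example
    "go harm yourself",         # hypothetical violating example
]
train_labels = [0, 0, 1, 1]     # 0 = safe, 1 = violating

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_comments, train_labels)

# --- Inference phase: the trained model scores content it has never seen ---
new_comment = ["I will hurt you badly"]
risk_probability = model.predict_proba(new_comment)[0][1]
print(f"Estimated risk: {risk_probability:.2f}")
```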

There are three approaches to ML, suited to different applications. 1) Supervised learning: mostly used for classifying data. It requires a dataset where each input has a labeled output, which the machine uses to learn how to map each input to its respective output. Once the machine is trained and ready for inference, it can identify the outputs for new input samples it has not seen before. In the context of content safety, this mechanism can be used to train models with samples of content (audio, video, etc.) that are violative in nature, equipping the model to infer the risk level of newly generated content, as in the sketch above.

2) Unsupervised learning: the machine is made to understand the structure of a dataset that is fed to it without labels. It groups or clusters the input data by analyzing similarities among the inputs. This form of learning is mostly used for groupings such as identifying customer segments for marketing or other business needs (a minimal clustering sketch follows this list). 3) Reinforcement learning: the machine learns from attempts to reach a goal, using feedback (positive or negative reinforcement). This method does not need a dataset for training, just a training environment with a predefined metric that the model needs to optimize.
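A minimal sketch of the unsupervised case, using a few hypothetical unlabeled comments: KMeans groups them purely by textual similarity, without ever being told which cluster, if any, corresponds to spam. Interpreting what each cluster means is left to a human.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Hypothetical unlabeled comments; no one has marked which are spam.
comments = [
    "buy cheap followers now",
    "cheap followers for sale today",
    "loved the concert last night",
    "the concert last night was amazing",
]

vectors = TfidfVectorizer().fit_transform(comments)
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

# Print each comment next to the cluster it was assigned to.
for comment, cluster in zip(comments, clusters):
    print(cluster, comment)
```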

END TO END JOURNEY OF A USER GENERATED CONTENT (UGC)

Pre-moderation: the phase during which content uploaded by a user is verified against safety standards before publication on the platform. This stage is generally performed by an AI model that checks the components of the uploaded content (audio/video/text) against the information on which the model has been trained. If a known violation is recognized, the model prevents the content from being published.
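A simplified sketch of such a pre-moderation gate is shown below, assuming a hypothetical set of hashes of known violating files and a hypothetical `risk_score` model stub. Real systems typically use perceptual hashing and large trained models rather than exact hashes and a stub function; this only illustrates the decision flow.

```python
import hashlib

# Hypothetical hash list of content already confirmed as violating.
KNOWN_VIOLATION_HASHES = {hashlib.sha256(b"known bad file").hexdigest()}
RISK_THRESHOLD = 0.9  # assumed threshold; tuned per platform in practice

def risk_score(content: bytes) -> float:
    """Placeholder for a trained model's risk estimate (0.0 to 1.0)."""
    return 0.0

def pre_moderate(content: bytes) -> str:
    digest = hashlib.sha256(content).hexdigest()
    if digest in KNOWN_VIOLATION_HASHES:
        return "block"              # exact match with a known violation
    if risk_score(content) >= RISK_THRESHOLD:
        return "hold_for_review"    # likely violation, escalate to a human
    return "publish"                # allowed onto the platform

print(pre_moderate(b"hello world"))
```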

Post-moderation: after content passes the pre-moderation stage, it is shared on the platform and available for public consumption. Since we are not yet ready for AI to be the sole guardian of content safety, published content is reviewed by a human-assisted moderation system once it starts getting public attention (defined, for example, by the number of likes, shares, or views). At this stage, AI can be used to sort and categorize content to streamline the process, and to detect risks or violations and flag them to the human moderator, who can then validate the AI’s interpretation (this saves the moderator from having to review the entire piece of content).
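As a toy illustration of this prioritization step, the snippet below ranks published items for human review by combining an assumed engagement signal (views) with an assumed AI risk score. The weighting heuristic and field names are illustrative, not any real platform’s formula.

```python
from dataclasses import dataclass

@dataclass
class PublishedItem:
    item_id: str
    views: int
    ai_risk_score: float  # 0.0 (safe) to 1.0 (likely violating)

review_queue = [
    PublishedItem("a1", views=120, ai_risk_score=0.20),
    PublishedItem("b2", views=50_000, ai_risk_score=0.70),
    PublishedItem("c3", views=900, ai_risk_score=0.95),
]

# Rank by reach weighted by estimated risk, so moderators see the
# riskiest, most-viewed content first (one of many possible heuristics).
for item in sorted(review_queue, key=lambda i: i.views * i.ai_risk_score, reverse=True):
    print(item.item_id, round(item.views * item.ai_risk_score, 1))
```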

Reactive moderation: the process of moderating content in response to reactions from the community. If pre-moderation failed to detect a risk, or the content has not yet entered the post-moderation stage, community signals can be used: other users report the content or flag it as a risk. AI models can help listen to this community feedback on various content, revalidate the content, and route it to the appropriate manual moderation teams. The pre-moderation performed by AI models is the key pillar in keeping social media platforms safe, since it decides whether any content we publish is risk-free for public consumption, and it is effective at proactively removing dangerous content (child abuse, terrorism, self-harm). That said, even with these many layers of safety nets, risk leakage still happens daily, because AI models can only detect what they have been trained on, and unidentified risks (potentially new categories of risk) will be approved during pre-moderation.
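A small sketch of how such report-driven routing might look, assuming user reports arrive tagged with a suspected category and that a fixed report threshold triggers escalation. The team names and threshold are made up for illustration.

```python
from collections import Counter

REPORT_THRESHOLD = 3  # assumed number of reports before escalation
TEAM_FOR_CATEGORY = {
    "spam": "spam-review-team",
    "self_harm": "safety-escalation-team",
    "hate_speech": "policy-review-team",
}

# Each report is (item_id, suspected category) as submitted by a user.
reports = [
    ("vid42", "spam"), ("vid42", "spam"), ("vid42", "spam"),
    ("img7", "self_harm"),
]

# Count reports per (item, category) and route items that cross the threshold.
for (item_id, category), count in Counter(reports).items():
    if count >= REPORT_THRESHOLD:
        team = TEAM_FOR_CATEGORY.get(category, "general-review-team")
        print(f"{item_id} -> {team}")
```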

CONTENT SAFETY STRATEGY — CHALLENGES

1) The fundamental challenge is that the Internet is global and any uploaded content can reach any part of the world, so managing content safety means complying with every country’s national policies. This requires the AI models to be customized to cater to these variations. Apart from country-specific variations, the system also needs to consider the various categories of risk and the formats in which they can spread on the platform.

2) Users have higher expectations of AI than of human moderation. Public perception of AI has created a belief that AI systems can process and analyze far greater volumes of content than humans. Any failure or leakage of the AI system can therefore be heavily questioned by national security teams as well as the general public. This makes it a high-risk, highly sensitive, high-maintenance approach.

3) Benchmark datasets need to be carefully selected to represent real-world problems. Over time these datasets need to evolve to keep pace with changes in the world, and care must be taken that a dataset is not biased in a way that leads to unfair decisions.

4) Overfitting is a phenomenon in which the model becomes over-optimized for the data it was trained and evaluated on, but does not respond well to previously unseen data during actual execution (a small sketch after this list illustrates how a train/validation gap reveals it).

5) The categories of risk keep evolving on a daily basis as users find ways to beat the AI system’s ability to detect them. Until such a new type of violation first appears, the model will likely not have been trained to combat it, and because no dataset is available, the risk may not be taken down during pre-moderation, leading to the content being released to the public (risk undetected).
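To illustrate the overfitting point above (item 4), the sketch below trains an unconstrained decision tree on synthetic data and compares its accuracy on the training data versus held-out data; a large gap between the two is the classic sign of overfitting. The dataset and model are purely illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a labeled content-safety dataset.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# An unconstrained tree can memorize the training set.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"train={train_acc:.2f} test={test_acc:.2f}")  # a large gap suggests overfitting
```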

BUSINESS CHALLENGES

1) Some businesses prioritize growing an active user base over investing in content safety. The techniques, computation costs, and highly skilled labor required to implement AI moderation strategies demand large investments and strategic attention from leadership.

2) AI moderation is often accompanied by labor-intensive manual moderation processes that involve high operational costs and therefore pose a challenge.

3) Analyzing, understanding, and implementing country-level policy measures is key to gaining trust and assuring safety across the globe. That said, this is a labor-intensive task that needs expert legal advisors actively contributing from every market at the required pace, as well as a highly skilled engineering team ready to adapt the AI algorithms to meet market demands.

From the above study, we can see that safety is of paramount priority for any social platform. With the advancement of AI capabilities, we are able to proactively identify risky User Generated Content and remove it from the platform before it reaches the public. AI techniques have improved the efficiency with which human moderators are utilized and thereby reduced operational expenses. At the same time, we are still not at a stage where we can rely 100% on AI for platform safety, and hence we need to invest more in research and development of techniques that can advance the field of “Platform Safety” and help spare moderators the trauma they currently have to go through.
