
CONTEXT
Designed a Twitter spam moderation workbench that increased moderator efficiency by 22 percent while improving the quality of data used to train machine learning models
My Role
Senior Product Designer
Client
Twitter (via Innodata)
Timeline
2021 — 6 Month Contract
Industry
ML/AI, Data Annotation
Platform
Web
Tools
Figma, Wireframing, User Research
OVERVIEW
Improving Data Quality for Twitter's Spam Detection AI
Innodata is a global leader in machine learning and AI, specializing in data collection, annotation, and platform development that help train accurate AI models.
The Twitter Spam Violation Workbench is one of several annotation tools Innodata provided to Twitter. It allowed human moderators to review batches of tweets flagged for possible spam, check the user’s account history, and answer a short set of questions to classify the violation.
Their annotations were then returned to Twitter to improve the accuracy of its machine learning models. Moderators were measured by how many tasks they could complete in a session, so speed and consistency were essential to producing high-quality data.
THE PROBLEM
How can the Twitter Spam Violation Workbench be improved to boost moderator SAR scores and produce more consistent, higher-quality annotations?
After reviewing feedback from Twitter, the team saw a clear need to improve the volume, consistency, and quality of the data feeding their machine learning models. Through discussions with Twitter, product management, and engineering, we determined that the best path forward was to improve the moderator workbench experience.
KPIs
Measuring Success
The team aligned on two key metrics to measure success and then moved into execution.
Increase SAR scores by 10%
Improve moderator efficiency to process more annotation tasks per session.
Better identify top performers
Provide managers with clearer data to identify moderators with the highest volume, impact and revenue.
PROCESS
Understanding the Moderator Experience
Understanding Requirements
As the sole designer on the project, I needed a clear picture of the workbench rules and how moderators moved through the Spam Violation annotation process. First, I read through multiple documents and condensed the workbench ruleset into a concise list of requirements.
Analyzing Real User Behavior
The Twitter team shared dozens of screen-capture videos showing moderators working in real time. These recordings became an invaluable source of insight for the project.
The videos revealed several pain points that slowed moderators down. They relied on multiple browser tabs to review tweets, check user accounts, search for keywords, and translate text. Much of their time was spent copying and pasting information between tabs.
We brought these findings back to the Twitter team to confirm their accuracy and aligned on the opportunity to address them as a way to improve SAR scores.
Mapping the Moderator Experience
Including the Twitter team throughout the design process was essential. We brought key stakeholders and moderators together for a "What If" exercise that helped us explore a wide range of ideas.

Collaborative Ideation
The "What If" exercise with key stakeholders and moderators let us go wide with our thinking before settling on a direction.

Design and Iteration
Based on what we learned from research and ideation, it was clear that moderators needed all of their key data points brought directly into the app. Switching between multiple browser tabs was slowing them down. They also needed simple tools to translate and copy text. I created a series of mockups exploring different layouts and interactions, then met with the team again to validate the direction.

SOLUTION
A Streamlined Annotation Experience
After final approval, I completed the wireframes and acceptance criteria and worked with engineering to begin development. Regular touchpoints helped ensure the design was interpreted and implemented as intended.
1 — Consolidated Interface
All moderator tools and data were consolidated into a single view, eliminating the need for multiple browser tabs.

2 — Inline Translation and Copy Tools
Built-in translation and text copying features reduced time spent switching between tools.

3 — Progress Tracking
Clear batch and task progress indicators helped moderators track their SAR scores in real time.

IMPACT
Proven Gains in Speed and Data Quality
After the workbench had been in production for a few weeks, we gathered performance data and feedback from the Twitter team that confirmed our goals and KPIs were being met.
SAR scores increased by 22%
Moderators were able to process more annotation tasks per session, exceeding the initial 10% target.
Managers had more trust in the SAR data
Better data quality helped identify top and bottom performers more accurately.
TAKEAWAYS
Key Learnings from the Project
Stakeholder Involvement
Frequent touchpoints with the Twitter team proved essential. Including stakeholders and moderators throughout the process built trust, created shared ownership, and helped validate the final solution once it reached production.
Value of Real User Data
Watching moderators work in real conditions gave us insights we could not have uncovered through interviews alone. The screen-capture videos revealed true workflow patterns and pain points, guiding more accurate and informed design decisions.
Designing for Wellbeing
Moderators regularly encounter harmful content. Although we improved their tools, I wish we could have introduced stronger ways to mask or buffer the imagery they see. Moderator wellbeing remains an important consideration for future content moderation tools.