How to Do Visual Testing: Common Pitfalls and Solutions

So you’ve heard about visual testing, but you’re not entirely sure where to start—or why it matters for your mobile app. Maybe you’ve already tried it and ended up drowning in false positives. You’re not alone. Visual testing is one of those things that sounds simple in theory but gets messy fast in practice. This FAQ covers the real questions teams ask when figuring out how to do visual testing effectively, especially for mobile apps. No fluff, just practical answers.

What is visual testing and why should I care?

Visual testing is the practice of automatically comparing screenshots of your app’s UI to a known “baseline” image. The goal is to catch unintended visual changes—things like a button shifting two pixels to the left, a font color that mysteriously changed from #333 to #444, or an image that stopped loading. These are bugs that unit tests simply miss. Unit tests check logic, not layout.

For mobile apps, the stakes are higher. You’re dealing with different screen sizes, OS versions, and pixel densities. A layout that looks perfect on an iPhone 15 Pro might break on a Pixel 8. Automated visual testing catches those regressions before your users do. Honestly, if you ship a mobile app without any visual checks, you’re flying blind. It’s not optional anymore; it’s table stakes for any team that cares about UI quality.

How do I start with visual testing for a mobile app?

First, pick a tool that fits your stack. If you’re using React Native, Sherlo is purpose-built for that ecosystem and integrates directly with Detox and Storybook. For other frameworks, you might look at Percy or Applitools. But the key is integration: your tool should plug into your existing test runner, not require a whole new workflow.

Once you’ve chosen a tool, capture a baseline. Run your app on an emulator or simulator (or a physical device if you can), navigate to your key screens, and take screenshots. These become your “truth.” Then, on every pull request, capture the same screens again and compare the new screenshots to the baseline. If something changes, the test fails. You review the diff, decide if it’s intentional, and either fix the code or update the baseline. That’s the loop.
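
If it helps to see that loop as code, here’s a minimal sketch in TypeScript. It isn’t how Sherlo, Percy, or Applitools work internally. The names (`Capture`, `checkCaptures`, `buildShouldFail`) are made up for illustration, and it assumes screenshots are already decoded into raw byte arrays and compared byte for byte.

```typescript
// Illustrative types; a real tool would keep images as files or in a cloud service.
type Capture = { screen: string; image: Uint8Array };
type Status = "new-baseline" | "match" | "needs-review";
type Result = { screen: string; status: Status };

function sameBytes(a: Uint8Array, b: Uint8Array): boolean {
  return a.length === b.length && a.every((byte, i) => byte === b[i]);
}

// Compare each fresh capture with the stored baseline for the same screen.
function checkCaptures(
  baselines: Record<string, Uint8Array | undefined>,
  captures: Capture[]
): Result[] {
  return captures.map(({ screen, image }): Result => {
    const baseline = baselines[screen];
    if (baseline === undefined) {
      // First time this screen is captured: record it instead of failing.
      return { screen, status: "new-baseline" };
    }
    return { screen, status: sameBytes(baseline, image) ? "match" : "needs-review" };
  });
}

// The pull request check fails when any screen needs a human to look at the diff.
function buildShouldFail(results: Result[]): boolean {
  return results.some((r) => r.status === "needs-review");
}
```

A screen with no baseline yet gets recorded rather than failed, and any difference sends the screen to review. Strict equality like this is too sensitive for most real suites, which is where tolerance thresholds and ignore regions come in.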

Start small. Pick 5–10 critical screens (login, home feed, checkout). Don’t try to cover everything on day one. You’ll scale later.

What are the most common pitfalls in visual testing?

Let’s be honest—visual regression testing can be a pain if you don’t set it up right. Here are the three biggest traps teams fall into:

  • Flaky tests from dynamic content. Animations, date pickers, loading spinners—these change every time you run a test. Your screenshots will never match. Solution: freeze animations, mock dates, and use static fixtures for any dynamic data.
  • Too many screenshots. I’ve seen teams capture 500 screenshots per test run. That’s noise, not signal. Focus on critical user flows. You don’t need a screenshot of every dropdown state.
  • Rubber-stamping failures. When a test fails, it’s tempting to just update the baseline without looking at the diff. Don’t. You’ll miss real regressions. Review every failure, even the ones that seem minor.

These pitfalls are why many teams abandon visual testing after a few weeks. But with the right approach, they’re all solvable.

How do I handle dynamic content in visual tests?

Dynamic content is the #1 cause of flaky visual tests. The fix isn’t complicated, but it requires discipline. Here’s what works:

  • Replace dynamic data with static fixtures. Instead of using real timestamps, use a fixed date like “2026-01-01”. Instead of random user IDs, use “user-123”. Your test data should be predictable.
  • Use test IDs to wait for elements to settle. Before taking a screenshot, wait for a specific element (like a “loaded” indicator) to appear. This prevents capturing a half-rendered screen.
  • Choose a tool with smart diffing. Some tools, like Sherlo, allow you to ignore certain regions or set tolerance thresholds. That way, a 1-pixel anti-aliasing difference doesn’t break your build.
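
Here’s a rough sketch of the first point: static fixtures plus a fixed clock. Everything in it (`makeFixtureUser`, `memberSinceLabel`, `FIXED_NOW`) is hypothetical. The idea is that UI code receives its data and its clock from the test instead of reading the real ones.

```typescript
// Hypothetical fixture helpers; the names are illustrative only.
type User = { id: string; name: string; joinedAt: Date };

// One fixed instant for every run (UTC, so the machine's time zone does not matter).
const FIXED_NOW = new Date("2026-01-01T00:00:00Z");

function makeFixtureUser(overrides: Partial<User> = {}): User {
  return { id: "user-123", name: "Test User", joinedAt: FIXED_NOW, ...overrides };
}

// The UI helper takes the clock as a parameter instead of calling new Date() itself.
function memberSinceLabel(user: User, now: () => Date = () => new Date()): string {
  const msPerDay = 24 * 60 * 60 * 1000;
  const days = Math.floor((now().getTime() - user.joinedAt.getTime()) / msPerDay);
  return `Member for ${days} days`;
}

// In a visual test, pass the fixed clock so the rendered text is the same on every run.
const fixedLabel = memberSinceLabel(
  makeFixtureUser({ joinedAt: new Date("2025-12-25T00:00:00Z") }),
  () => FIXED_NOW
);
```

With the real clock, that label would change every day and break the screenshot. With the fixed one, it always reads the same.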

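Another trick: freeze animations and transitions in your test environment, whether those are CSS animations on mobile web or native animations in the app. Many testing frameworks and device settings let you turn them off. Check what your tool supports and enable it.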

Which visual testing tools work best for mobile apps?

Here’s a quick comparison of the main players, based on what I’ve seen teams actually use:

Tool       | Best For                         | Key Strength                                                | Pricing
-----------+----------------------------------+-------------------------------------------------------------+--------------------
Sherlo     | React Native apps                | Native-level capture, Detox integration, cloud diff reviews | Free tier available
Percy      | Mobile web + native (via Appium) | Mature SaaS, wide framework support                         | Pay per snapshot
Applitools | Cross-platform apps              | AI-powered diffing, supports many frameworks                | Can be pricey

If you’re doing React Native visual testing, Sherlo is the clear winner. It captures screenshots at the native level (not web-based), so you get pixel-perfect results that match what users actually see. Percy and Applitools are solid for cross-platform, but they add complexity. Start with a tool that matches your stack.

How do I integrate visual testing into CI/CD?

You want visual tests to run automatically, preferably on every pull request. Here’s the standard setup:

  1. Run unit and integration tests first. If those fail, there’s no point running visual tests—fix the logic first.
  2. In a separate CI step, run your visual test suite. Use a tool that can compare screenshots against the stored baseline.
  3. If differences are found, the CI step should fail. But don’t just fail—provide a link to the diff so developers can review it.

Sherlo integrates directly with GitHub and GitLab, showing diffs right in the pull request. That’s huge. No one wants to open a separate dashboard to review a test failure. Keep the feedback loop tight.

One practical tip: run visual tests in parallel. If you have 50 screenshots to capture, splitting them evenly across 5 CI runners can cut the time from about 10 minutes to roughly 2, plus whatever each runner needs to start up. Most tools support this, including Sherlo.
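
Splitting the work can be as simple as dealing screenshot names out to runners. This is a hypothetical helper, not any tool’s built-in sharding. It assumes each runner knows its own zero-based index and the total runner count; many CI systems can provide these, but the variable names differ.

```typescript
// Hypothetical helper: returns the screenshots one CI runner should capture.
// shardIndex is zero-based; shardCount is the total number of runners.
function screenshotsForShard(
  allNames: string[],
  shardIndex: number,
  shardCount: number
): string[] {
  if (shardCount < 1 || shardIndex < 0 || shardIndex >= shardCount) {
    throw new Error(`Invalid shard ${shardIndex} of ${shardCount}`);
  }
  // Sort a copy so every runner sees the same order, then deal names out round-robin.
  return [...allNames].sort().filter((_name, i) => i % shardCount === shardIndex);
}

// Example: 50 screens across 5 runners gives each runner 10 screens.
const screenNames: string[] = [];
for (let n = 1; n <= 50; n++) {
  screenNames.push(`screen-${n}`);
}
const firstRunnerScreens = screenshotsForShard(screenNames, 0, 5);
```

Sorting before dealing matters: if runners list the screens in different orders, some screens could be captured twice and others skipped.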

What’s the difference between visual regression testing and screenshot testing?

People use these terms interchangeably, but they’re not the same thing. Screenshot testing is just capturing an image. Visual regression testing is comparing that image to a baseline and flagging differences.

Screenshot testing is a step—it’s what you do before you can compare. Visual regression testing is the process that includes diff highlighting, baseline management, and a review workflow. Without the comparison step, you’re just taking pictures. You need both pieces to get value.

Most modern tools (Sherlo, Percy, Applitools) bundle both. But if you’re building your own solution, don’t stop at just capturing screenshots. You need a way to compare them, highlight differences, and manage baselines over time. That’s where the real work is.
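
If you do go the build-your-own route, highlighting can start as simply as finding the rectangle that contains every changed pixel. A minimal sketch, assuming both images are same-sized RGBA byte buffers; `changedRegion` and `Box` are illustrative names, not part of any library.

```typescript
// Illustrative names; not part of any library.
// Coordinates are inclusive pixel positions.
type Box = { left: number; top: number; right: number; bottom: number };

// Returns the smallest rectangle containing every changed pixel,
// or null when the two images are identical.
function changedRegion(
  baseline: Uint8Array,
  current: Uint8Array,
  width: number,
  height: number
): Box | null {
  let box: Box | null = null;
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const i = (y * width + x) * 4; // index of this pixel's red channel
      const changed =
        baseline[i] !== current[i] ||
        baseline[i + 1] !== current[i + 1] ||
        baseline[i + 2] !== current[i + 2] ||
        baseline[i + 3] !== current[i + 3];
      if (!changed) continue;
      box =
        box === null
          ? { left: x, top: y, right: x, bottom: y }
          : {
              left: Math.min(box.left, x),
              top: Math.min(box.top, y),
              right: Math.max(box.right, x),
              bottom: Math.max(box.bottom, y),
            };
    }
  }
  return box;
}
```

Drawing that rectangle over the new screenshot already tells a reviewer where to look. A single box is crude when changes are scattered across the screen, so treat it as a starting point.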

How do I maintain visual test baselines over time?

Baselines are your source of truth. Treat them like code. Here’s how to keep them healthy:

  • Update baselines deliberately. After an intentional UI change (like a redesign or new feature), update the baseline. But don’t update it after every commit—that defeats the purpose.
  • Use version control for baselines. Store them in a git repo or use a cloud service like Sherlo that tracks history. That way, you can roll back if something goes wrong.
  • Involve designers in the review process. When a visual test fails, have a designer look at the diff. They’ll spot issues that developers might miss (like a wrong shade of blue or a spacing inconsistency).

One mistake teams make: letting baselines drift because no one reviews failures. Set a rule: every baseline update must be approved by at least one other person, ideally a designer.

Can visual testing replace unit or integration tests?

No. Full stop. Visual testing catches UI bugs. Unit tests catch logic bugs. Integration tests catch interaction bugs. They’re complementary, not interchangeable.

Think of it this way: a unit test verifies that a function returns the right value. A visual test verifies that the button rendering that value looks correct. If you skip unit tests, you might ship a broken calculation. If you skip visual tests, you might ship a button that’s off-screen. You need both.

For mobile apps, I recommend this order: unit tests → integration or end-to-end tests (with a tool like Detox) → visual tests. Each layer catches different types of bugs. Visual tests are the final safety net, not the first line of defense.

How do I reduce false positives in visual testing?

False positives kill trust in your test suite. Here’s how to minimize them:

  • Set a tolerance threshold. Most tools let you define a percentage of allowed pixel difference. In Sherlo, you can set it to 0.1%—small enough to catch real bugs, large enough to ignore anti-aliasing noise.
  • Ignore unstable regions. Ad banners, loading spinners, and animated elements change every time. Mark those regions as “ignore” in your test config.
  • Use a stable test environment. Fixed viewport size, consistent fonts, and no network variability. If your test environment changes between runs, your screenshots will too.
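
To make the first two points concrete, here’s a sketch of how a threshold and ignore regions could combine. This is not Sherlo’s implementation or config format. `diffRatio`, `Region`, and `withinTolerance` are hypothetical, and the images are assumed to be same-sized RGBA byte buffers.

```typescript
// Hypothetical types and helpers for illustration; not any tool's real API.
type Region = { x: number; y: number; width: number; height: number };

function isIgnored(x: number, y: number, regions: Region[]): boolean {
  return regions.some(
    (r) => x >= r.x && x < r.x + r.width && y >= r.y && y < r.y + r.height
  );
}

// Fraction (0 to 1) of the compared pixels that differ between two RGBA buffers.
// Pixels inside an ignore region are neither compared nor counted.
function diffRatio(
  baseline: Uint8Array,
  current: Uint8Array,
  width: number,
  height: number,
  ignore: Region[] = []
): number {
  let compared = 0;
  let differing = 0;
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      if (isIgnored(x, y, ignore)) continue;
      compared++;
      const i = (y * width + x) * 4; // index of this pixel's red channel
      const same =
        baseline[i] === current[i] &&
        baseline[i + 1] === current[i + 1] &&
        baseline[i + 2] === current[i + 2] &&
        baseline[i + 3] === current[i + 3];
      if (!same) differing++;
    }
  }
  return compared === 0 ? 0 : differing / compared;
}

// The default of 0.001 corresponds to a 0.1% threshold.
function withinTolerance(ratio: number, tolerance: number = 0.001): boolean {
  return ratio <= tolerance;
}
```

Note that a percentage threshold scales with screen size: 0.1% of a large screenshot is still thousands of pixels, enough to hide a missing icon. That’s one reason to pair a low threshold with explicit ignore regions rather than raising the threshold until the build goes green.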

Also, don’t over-test. If a screen has 20 possible states, test the 3–4 that actually matter. Fewer tests mean fewer false positives to triage.

What should I do when a visual test fails?

First, don’t panic. A failing visual test is information, not a crisis. Here’s the triage flow:

  1. Review the diff. Look at the side-by-side comparison. Is the change intentional (you updated the UI) or a regression (something broke)?
  2. If intentional: Update the baseline. Most tools have a one-click “approve” button for this.
  3. If a bug: Fix the code, then re-run the tests to confirm.

Use a tool that shows diffs clearly. Sherlo and Percy both offer side-by-side views with highlighted differences. That makes triage fast—usually under 30 seconds per failure. If you’re spending more than a minute per failure, something’s wrong with your setup.

How do I scale visual testing for a large app?

Scaling visual testing is about prioritization and parallelization. Here’s the playbook:

  • Prioritize high-traffic screens. Login, checkout, dashboard—these are where users spend most of their time. Test those first. Secondary screens can wait.
  • Run tests in parallel. Split your test suite across multiple CI runners. Sherlo supports parallel execution natively, so you can cut test time dramatically.
  • Use component-level visual tests. With Storybook, you can test individual components in isolation. That catches bugs early, before they reach the full screen.

For a large app, aim for 100–200 screenshots per test run. That’s usually enough to cover the flows that matter without becoming unmanageable. If you’re at 500+, you’re probably testing too many states.

What are the best practices for visual testing in React Native?

React Native has unique challenges because it renders natively, not in a browser. Here’s what works:

  • Use Sherlo. It’s built for React Native and captures screenshots at the native layer, so you get accurate rendering.
  • Test on the iOS simulator and Android emulator. They give you the closest approximation to a real device short of physical hardware. Don’t rely on web-based rendering.
  • Keep components small and test them in isolation. Use Storybook to render individual components with mocked data. That way, a failure tells you exactly which component broke.

One more thing: mock your API calls. Real network requests introduce variability. Use a library like MSW (Mock Service Worker) to return consistent data in your tests.

How do I convince my team to adopt visual testing?

This is the hardest part—not the technology, but the people. Here’s how I’ve seen teams successfully pitch it:

  • Show real examples. Find a visual bug that made it to production. Screenshot it. Show the team how a visual test would have caught it before release.
  • Demonstrate time savings. Manual QA for visual regression takes hours. Automated visual testing takes minutes. Run a side-by-side comparison of the time cost.
  • Start with a pilot. Use Sherlo’s free tier. Test one critical flow. Show the team the results. Let them see how easy it is to review diffs.

Honestly, once a team sees a visual test catch a real bug that code review missed, they’re sold. It’s one of those tools that sells itself—you just need to give it a chance to prove its value.