Label testing is a research method that evaluates whether users correctly understand what navigation labels and category names mean. You've figured out the right groups through card sorting — now you need to make sure the names on those groups don't confuse people.
Card sorts — especially open card sorts — produce participant-generated category names. These are gold for understanding how users think about groupings. But as labels, they're often terrible. Participants name groups quickly, without considering how the label works alongside other navigation items. You'll end up with overlapping names like "My Stuff" and "My Account," or vague ones like "Other" and "More."
The gap between a good grouping and a good label is where label testing lives.
There are a few approaches, and they can be mixed:
Preference survey: Show participants 2-4 candidate labels for the same category, along with a brief description of 3-4 items that live inside it. Ask which label best describes that collection. This takes about 15 minutes to set up and 2 minutes per participant to complete. Twenty to thirty responses gives you a clear winner in most cases.
Descriptive task: Show a label in isolation and ask participants to list what they'd expect to find behind it. If "Shipping & Delivery" consistently prompts responses about tracking orders and delivery times, the label works. If responses scatter across returns, billing, and order history, the label is too vague or too broad.
5-second test: Flash a screenshot of your proposed navigation for 5 seconds, then ask participants what they remember and where they'd click for a specific task. This tests label scannability — whether labels register in the quick visual sweep users actually give your nav bar.
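Tallying preference-survey responses needs nothing more than a frequency count. A minimal sketch, using hypothetical response data (the labels echo the shipping example below, but the counts are invented for illustration):

```python
from collections import Counter

# Hypothetical preference-survey responses: each participant picked the
# label they felt best described the category's contents.
responses = [
    "Shipping & Delivery", "Shipping & Delivery", "Orders",
    "Shipping & Delivery", "Track My Package", "Orders",
    "Shipping & Delivery", "Shipping & Delivery", "Orders",
    "Shipping & Delivery",
]

counts = Counter(responses)
total = len(responses)

# Report each label's share of the vote, most popular first.
for label, n in counts.most_common():
    print(f"{label}: {n}/{total} ({n / total:.0%})")
```

With 20-30 real responses, a lopsided split like this is usually decisive; if the top two labels land within a few votes of each other, treat it as a tie and test again in context rather than declaring a winner.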
After a card sort for an e-commerce site, the team had a clear cluster of shipping-related content: tracking info, delivery estimates, shipping costs, carrier options, and international shipping policies. Participants had sorted these together consistently, with agreement rates above 75%.
But what to call this group? The card sort produced three contenders from participants: "Shipping & Delivery" (used by 40% of participants), "Orders" (used by 35%), and "Track My Package" (used by 15%). The remaining 10% used variations like "Deliveries" or "My Orders."
A label test with 30 participants revealed something the card sort couldn't: "Track My Package" dramatically outperformed the others for task-based questions about checking delivery status, even though fewer card sort participants had used it as a category name. But it failed for pre-purchase questions like "find out shipping costs" — those participants went to "Shipping & Delivery" instead.
The insight: one label couldn't cover both use cases. The team split the content across "Shipping Info" (pre-purchase) and "Track Orders" (post-purchase), which tested well in a follow-up tree test.
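The analysis that surfaces this kind of split is a per-task breakdown: tally which label participants chose for each task separately, rather than pooling all responses. A sketch with hypothetical click data (the tasks and counts are invented, not the team's actual numbers):

```python
from collections import Counter, defaultdict

# Hypothetical label-test data: each record is (task, label_chosen).
clicks = [
    ("check delivery status", "Track My Package"),
    ("check delivery status", "Track My Package"),
    ("check delivery status", "Shipping & Delivery"),
    ("check delivery status", "Track My Package"),
    ("find out shipping costs", "Shipping & Delivery"),
    ("find out shipping costs", "Shipping & Delivery"),
    ("find out shipping costs", "Orders"),
    ("find out shipping costs", "Shipping & Delivery"),
]

# Tally label choices per task. A label that dominates one task but
# loses another is the signal to split the category.
by_task = defaultdict(Counter)
for task, label in clicks:
    by_task[task][label] += 1

for task, counts in by_task.items():
    winner, n = counts.most_common(1)[0]
    print(f"{task!r}: winner {winner!r} ({n}/{sum(counts.values())})")
```

If the same label wins every task, one label is enough; different winners per task mean the content is serving distinct use cases and the navigation should reflect that.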
Keep your label options genuinely distinct. Testing "Help Center" against "Help & Support" against "Support Center" won't tell you much — they're too similar. Test labels that reflect different mental models: "Help Center" vs. "Fix a Problem" vs. "Contact Us."
Don't test more than 4 options per category. Choice overload kills signal. If you have 6 candidates, narrow to 4 based on frequency in your card sort data, then test.
And test labels in context, not in isolation. A label that makes perfect sense alone might create confusion when placed next to another label in the same nav bar. "Products" and "Shop" might each test well independently but overlap when side by side.
How do you run a label test? The simplest approach is a survey showing 2-4 label options for the same category and asking participants which one they'd click to find specific content. For example, show "Shipping & Delivery," "Orders," and "Track My Package," then ask where they'd go to check on a recent purchase. Run this with 20-30 participants. You can also do 5-second tests — show a navigation screenshot briefly and ask what they'd expect to find under each label.
When should you do label testing relative to card sorting? Label testing works best as a follow-up to card sorting. First, run an open card sort to discover how users group content. Then use the category names participants created as candidate labels and test them against each other. This sequence typically takes 1-2 weeks total and catches labeling problems before you build anything. You can also run label testing independently when you're renaming existing navigation categories.
What makes a good navigation label? Good labels are specific, mutually exclusive, and use the language of your users rather than internal jargon. They should pass the "lunch test" — if you described the label to someone at lunch, they'd know what's behind it without needing your sitemap. Avoid branded terms, abbreviations, and category names that overlap in meaning. Labels under 3 words tend to perform better in navigation because they're scannable.