
Test Methodology


⚡ TLDR

WikiWalls comparison and review pieces are tested on disclosed testbeds by named reviewers. Hardware loaners are disclosed. Comparison pieces are re-tested annually; cornerstones at six and eighteen months. A methodology page lives alongside every cornerstone and Index report.

  • Testbed disclosure: every comparison piece names the hardware, software version, network conditions, and date of test
  • Reviewer credit: every review carries the named reviewer’s byline with sameAs to LinkedIn / ORCID
  • Loaner disclosure: when a brand provides hardware on loan for testing, the loan is disclosed
  • Re-test cycle: comparisons re-tested annually; cornerstones reviewed at six and eighteen months
  • Score scale: 0-10 with per-axis breakdowns, named criteria, verdict band

Testbed disclosure

Every comparison piece names the testbed. The reader sees what we tested on, in what environment, on what date. The disclosure block lives at the top of the piece, before the verdict.

What we publish for each disclosed item:

  • Hardware: make, model, key specs. For AI / SaaS testing: the machine running the test (e.g., Beelink SER8, Ryzen 7 8845HS, 64GB RAM)
  • Software versions: exact version of the tool or service tested. SaaS pricing tier. API version
  • Network conditions: for latency-sensitive testing: connection type (residential fiber, 4G mobile, etc.), geographic location, P50 / P95 baseline latency to probe locations
  • Test date and duration: date the test was conducted. Duration (point-in-time vs 30-day diary vs longer)
  • Sample size: number of API calls, prompts, and test runs. For comparison pieces: the same sample size across all candidates
  • Reviewer: named human with byline and author archive
  • Loaner status: whether the unit was purchased, sent on loan, or provided by a sponsor. Disclosed regardless
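For illustration, the disclosure block can be thought of as a fixed record with one field per disclosed item. A minimal sketch in Python; the schema and field names are ours for illustration, not a published WikiWalls format:

```python
from dataclasses import dataclass

# Illustrative only: WikiWalls does not publish a formal schema for its
# disclosure block. Fields mirror the disclosed items listed above.
@dataclass
class TestbedDisclosure:
    hardware: str                      # make, model, key specs
    software_versions: dict[str, str]  # tool/service -> exact version or tier
    network: str                       # connection type, location, baseline latency
    test_date: str                     # ISO date the test was conducted
    duration: str                      # point-in-time, 30-day diary, or longer
    sample_size: int                   # identical across all candidates in a comparison
    reviewer: str                      # named human with byline
    loaner_status: str                 # "purchased", "loan", or "sponsor-provided"

disclosure = TestbedDisclosure(
    hardware="Beelink SER8, Ryzen 7 8845HS, 64GB RAM",
    software_versions={"example-api": "v2"},
    network="residential fiber, EU-West, P50 12 ms to probe",
    test_date="2025-01-15",
    duration="30-day diary",
    sample_size=500,
    reviewer="Example Reviewer",
    loaner_status="loan",
)
```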

What we measure

Every comparison defines its axes before the test starts. The verdict is the per-axis breakdown, not a single number. We never reduce a multi-axis comparison to one score and call it a winner.

Per-vertical measurement defaults

  1. AI APIs: latency (P50, P95), cost per 1M tokens, accuracy on a named workload (HumanEval+ for code, our internal classification set for ticket triage, etc.), tool-use reliability, streaming uptime over 30 days (see the latency and cost sketch after this list)
  2. AI dev tooling: code-edit acceptance rate on a sampled PR set, agentic-mode completion rate, terminal integration friction, multi-file refactor reliability
  3. Hardware: idle wattage at the wall, sustained-load wattage, fan noise dB, boot time, thermal throttling threshold under sustained load, real measured RAM headroom under workload
  4. eSIM: in-country activation friction, FUP throttle behavior, carrier networks per country, P50 download speed at three locations per country, customer support response time on three tickets
  5. SaaS: deployment time from zero, monthly cost at 100K / 1M / 10M unit volume, integration count, support response time on three tickets, data export portability
  6. Self-hosted infrastructure: deployment time from zero, monthly resource footprint, update friction over 90 days, backup-and-restore test, security audit checklist coverage
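As referenced in the AI APIs item above, two of the recurring measurements are percentile latency and normalized token cost. A minimal sketch of that arithmetic, using Python's standard library; the sample numbers are invented:

```python
import statistics

def latency_percentiles(samples_ms: list[float]) -> tuple[float, float]:
    """Return (P50, P95) for a list of request latencies in milliseconds."""
    p50 = statistics.median(samples_ms)
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile
    p95 = statistics.quantiles(samples_ms, n=100)[94]
    return p50, p95

def cost_per_million_tokens(total_cost_usd: float, total_tokens: int) -> float:
    """Normalize a test run's total spend to cost per 1M tokens."""
    return total_cost_usd / total_tokens * 1_000_000

samples = [38.2, 41.0, 39.5, 44.1, 120.3, 40.2, 39.9, 42.7, 41.5, 43.0]
p50, p95 = latency_percentiles(samples)
print(f"P50 {p50:.1f} ms, P95 {p95:.1f} ms")
print(f"${cost_per_million_tokens(0.42, 350_000):.2f} per 1M tokens")
```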

Score scale

  • Editor’s Pick (9.0 to 10.0): best-in-class on the dominant axes. We recommend with no hedging
  • Strong (8.0 to 8.9): recommend with situational caveats
  • Solid (7.0 to 7.9): workable; not the right pick for most readers
  • Borderline (6.0 to 6.9): skip unless a specific axis justifies it
  • Avoid (below 6.0): we do not recommend
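Because the bands partition the 0-10 scale at clean thresholds, mapping a score to its band is a straight lookup. A minimal sketch; the function is illustrative, not a published WikiWalls API:

```python
def verdict_band(score: float) -> str:
    """Map a 0-10 axis or overall score to its verdict band."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("scores are bounded to the 0-10 scale")
    if score >= 9.0:
        return "Editor's Pick"
    if score >= 8.0:
        return "Strong"
    if score >= 7.0:
        return "Solid"
    if score >= 6.0:
        return "Borderline"
    return "Avoid"

print(verdict_band(8.4))  # -> "Strong"
```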

Re-test cadence

Software changes. Hardware ages. The score we published last year may not be the score the product earns today. Re-test cycles are published on the methodology page so the reader knows what to expect.

  • Comparison piece: annually. A visible “Last reviewed by [Name], [Month Year]” line carries the cycle
  • Single-product review: at six months and eighteen months. Long-term observations added each cycle
  • Cornerstone: at six months and eighteen months, with a full audit of methodology and rankings
  • Index report: annual refresh. Methodology page published alongside
  • Take desk piece: not re-tested (opinion is bounded in time). Updates appear as separate Take pieces with linked context
  • Glossary entry: annually. Definitional content evolves slowly; date-checked once a year
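The cadences above reduce to simple date arithmetic from the publish date. A minimal sketch, assuming re-tests are scheduled in whole months; the mapping and helper names are ours for illustration:

```python
import calendar
from datetime import date

# Cadences in months, per the table above. Take desk pieces are omitted
# because they are not re-tested.
RETEST_MONTHS = {
    "comparison piece": [12],
    "single-product review": [6, 18],
    "cornerstone": [6, 18],
    "index report": [12],
    "glossary entry": [12],
}

def add_months(d: date, months: int) -> date:
    """Shift a date forward by whole months, clamping the day-of-month."""
    month_index = d.month - 1 + months
    year, month = d.year + month_index // 12, month_index % 12 + 1
    return date(year, month, min(d.day, calendar.monthrange(year, month)[1]))

def retest_dates(published: date, fmt: str) -> list[date]:
    """Return the scheduled re-test dates for a piece of the given format."""
    return [add_months(published, m) for m in RETEST_MONTHS[fmt]]

print(retest_dates(date(2025, 1, 15), "cornerstone"))
# -> [datetime.date(2025, 7, 15), datetime.date(2026, 7, 15)]
```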

Disclosure block

When a sponsor relationship exists with a brand named in a comparison, an explicit disclosure block sits at the top of the piece: “Disclosure: [Sponsor] is a [tier] sponsor of WikiWalls. The methodology, testing, and verdict in this piece are editorially independent. Read our editorial standards: /editorial-standards/.”

The methodology stays blind to sponsorship: the reviewer does not know which products in a comparison set are sponsored until after the verdict is locked.

Methodology pages per Index

Every Index report ships with a dedicated methodology page at /the-index/[index-slug]/methodology/. The page covers: data collection method, sample size, geographic distribution, response cleaning, statistical method, confidence intervals where applicable, and the named lead researcher. Reproducibility is the standard.
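Where an Index reports confidence intervals, one standard way to compute them for a survey proportion is the normal approximation. The Index methodology pages do not specify which statistical method they use, so this is an illustrative sketch only:

```python
import math

def proportion_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation confidence interval for a sample proportion.

    A reproducibility aid: anyone with the published sample size can
    recompute the reported interval. z=1.96 corresponds to 95% confidence.
    """
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)

low, high = proportion_ci(412, 1000)
print(f"41.2% (95% CI {low:.1%} to {high:.1%})")  # 38.1% to 44.3%
```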

Last reviewed by WikiWalls editorial.