
Test Methodology


⚡ TLDR

WikiWalls comparison and review pieces are tested on disclosed testbeds by named reviewers. Hardware loaners are disclosed. Comparison pieces are re-tested annually; cornerstones at six and eighteen months. A methodology page lives alongside every cornerstone and Index report.

  • Testbed disclosure: every comparison piece names the hardware, software version, network conditions, and date of test
  • Reviewer credit: every review carries the named reviewer’s byline with sameAs to LinkedIn / ORCID
  • Loaner disclosure: when a brand provides hardware on loan for testing, the loan is disclosed
  • Re-test cycle: comparisons re-tested annually; cornerstones reviewed at six and eighteen months
  • Score scale: 0-10 with per-axis breakdowns, named criteria, verdict band

Testbed disclosure

Every comparison piece names the testbed. The reader sees what we tested on, in what environment, on what date. The disclosure block lives at the top of the piece, before the verdict.

What we publish for each disclosed item:

  • Hardware: make, model, key specs. For AI / SaaS testing: the machine running the test (e.g., Beelink SER8, Ryzen 7 8845HS, 64GB RAM)
  • Software versions: exact version of the tool or service tested. SaaS pricing tier. API version
  • Network conditions: for latency-sensitive testing: connection type (residential fiber, 4G mobile, etc.), geographic location, P50 / P95 baseline latency to probe locations
  • Test date and duration: date the test was conducted. Duration (point-in-time vs 30-day diary vs longer)
  • Sample size: number of API calls, prompts, and test runs. For comparison pieces: the same sample size across all candidates
  • Reviewer: named human with byline and author archive
  • Loaner status: whether the unit was purchased, sent on loan, or provided by a sponsor. Disclosed regardless
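For illustration, the disclosure block can be thought of as a fixed record with one field per disclosed item. A minimal sketch in Python; the schema and field names are ours for illustration, not a published WikiWalls format:

```python
from dataclasses import dataclass

# Illustrative only: WikiWalls does not publish a formal schema for its
# disclosure block. Fields mirror the disclosed items listed above.
@dataclass
class TestbedDisclosure:
    hardware: str                      # make, model, key specs
    software_versions: dict[str, str]  # tool/service -> exact version or tier
    network: str                       # connection type, location, baseline latency
    test_date: str                     # ISO date the test was conducted
    duration: str                      # point-in-time, 30-day diary, or longer
    sample_size: int                   # identical across all candidates in a comparison
    reviewer: str                      # named human with byline
    loaner_status: str                 # "purchased", "loan", or "sponsor-provided"

disclosure = TestbedDisclosure(
    hardware="Beelink SER8, Ryzen 7 8845HS, 64GB RAM",
    software_versions={"example-api": "v2"},
    network="residential fiber, EU-West, P50 12 ms to probe",
    test_date="2025-01-15",
    duration="30-day diary",
    sample_size=500,
    reviewer="Example Reviewer",
    loaner_status="loan",
)
```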

What we measure

Every comparison defines its axes before the test starts. The verdict is the per-axis breakdown, not a single number. We never reduce a multi-axis comparison to one score and call it a winner.

Per-vertical measurement defaults

  1. AI APIs: latency (P50, P95), cost per 1M tokens, accuracy on a named workload (HumanEval+ for code, our internal classification set for ticket triage, etc.), tool-use reliability, streaming uptime over 30 days (see the latency and cost sketch after this list)
  2. AI dev tooling: code-edit acceptance rate on a sampled PR set, agentic-mode completion rate, terminal integration friction, multi-file refactor reliability
  3. Hardware: idle wattage at the wall, sustained-load wattage, fan noise dB, boot time, thermal throttling threshold under sustained load, real measured RAM headroom under workload
  4. eSIM: in-country activation friction, FUP throttle behavior, carrier networks per country, P50 download speed at three locations per country, customer support response time on three tickets
  5. SaaS: deployment time from zero, monthly cost at 100K / 1M / 10M unit volume, integration count, support response time on three tickets, data export portability
  6. Self-hosted infrastructure: deployment time from zero, monthly resource footprint, update friction over 90 days, backup-and-restore test, security audit checklist coverage
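As referenced in the AI APIs item above, two of the recurring measurements are percentile latency and normalized token cost. A minimal sketch of that arithmetic, using Python's standard library; the sample numbers are invented:

```python
import statistics

def latency_percentiles(samples_ms: list[float]) -> tuple[float, float]:
    """Return (P50, P95) for a list of request latencies in milliseconds."""
    p50 = statistics.median(samples_ms)
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile
    p95 = statistics.quantiles(samples_ms, n=100)[94]
    return p50, p95

def cost_per_million_tokens(total_cost_usd: float, total_tokens: int) -> float:
    """Normalize a test run's total spend to cost per 1M tokens."""
    return total_cost_usd / total_tokens * 1_000_000

samples = [38.2, 41.0, 39.5, 44.1, 120.3, 40.2, 39.9, 42.7, 41.5, 43.0]
p50, p95 = latency_percentiles(samples)
print(f"P50 {p50:.1f} ms, P95 {p95:.1f} ms")
print(f"${cost_per_million_tokens(0.42, 350_000):.2f} per 1M tokens")
```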

Score scale

  • Editor’s Pick (9.0 to 10.0): best-in-class on the dominant axes. We recommend with no hedging
  • Strong (8.0 to 8.9): recommend with situational caveats
  • Solid (7.0 to 7.9): workable; not the right pick for most readers
  • Borderline (6.0 to 6.9): skip unless a specific axis justifies it
  • Avoid (below 6.0): we do not recommend
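Because the bands partition the 0-10 scale at clean thresholds, mapping a score to its band is a straight lookup. A minimal sketch; the function is illustrative, not a published WikiWalls API:

```python
def verdict_band(score: float) -> str:
    """Map a 0-10 axis or overall score to its verdict band."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("scores are bounded to the 0-10 scale")
    if score >= 9.0:
        return "Editor's Pick"
    if score >= 8.0:
        return "Strong"
    if score >= 7.0:
        return "Solid"
    if score >= 6.0:
        return "Borderline"
    return "Avoid"

print(verdict_band(8.4))  # -> "Strong"
```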

Re-test cadence

Software changes. Hardware ages. The score we published last year may not be the score the product earns today. Re-test cycles are published on the methodology page so the reader knows what to expect.

  • Comparison piece: annually. A visible “Last reviewed by [Name], [Month Year]” line carries the cycle
  • Single-product review: at six months and eighteen months. Long-term observations added each cycle
  • Cornerstone: at six months and eighteen months, with a full audit of methodology and rankings
  • Index report: annual refresh. Methodology page published alongside
  • Take desk piece: not re-tested (opinion is bounded in time). Updates appear as separate Take pieces with linked context
  • Glossary entry: annually. Definitional content evolves slowly; date-checked once a year
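The cadences above reduce to simple date arithmetic from the publish date. A minimal sketch, assuming re-tests are scheduled in whole months; the mapping and helper names are ours for illustration:

```python
import calendar
from datetime import date

# Cadences in months, per the table above. Take desk pieces are omitted
# because they are not re-tested.
RETEST_MONTHS = {
    "comparison piece": [12],
    "single-product review": [6, 18],
    "cornerstone": [6, 18],
    "index report": [12],
    "glossary entry": [12],
}

def add_months(d: date, months: int) -> date:
    """Shift a date forward by whole months, clamping the day-of-month."""
    month_index = d.month - 1 + months
    year, month = d.year + month_index // 12, month_index % 12 + 1
    return date(year, month, min(d.day, calendar.monthrange(year, month)[1]))

def retest_dates(published: date, fmt: str) -> list[date]:
    """Return the scheduled re-test dates for a piece of the given format."""
    return [add_months(published, m) for m in RETEST_MONTHS[fmt]]

print(retest_dates(date(2025, 1, 15), "cornerstone"))
# -> [datetime.date(2025, 7, 15), datetime.date(2026, 7, 15)]
```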

Disclosure block

When a sponsor relationship exists with a brand named in a comparison, an explicit disclosure block sits at the top of the piece: “Disclosure: [Sponsor] is a [tier] sponsor of WikiWalls. The methodology, testing, and verdict in this piece are editorially independent. Read our editorial standards: /editorial-standards/.”

The methodology stays blind to sponsorship: the reviewer does not know which products in a comparison set are sponsored until after the verdict is locked.

Methodology pages per Index

Every Index report ships with a dedicated methodology page at /the-index/[index-slug]/methodology/. The page covers: data collection method, sample size, geographic distribution, response cleaning, statistical method, confidence intervals where applicable, and the named lead researcher. Reproducibility is the standard.
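Where an Index reports confidence intervals, one standard way to compute them for a survey proportion is the normal approximation. The Index methodology pages do not specify which statistical method they use, so this is an illustrative sketch only:

```python
import math

def proportion_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation confidence interval for a sample proportion.

    A reproducibility aid: anyone with the published sample size can
    recompute the reported interval. z=1.96 corresponds to 95% confidence.
    """
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)

low, high = proportion_ci(412, 1000)
print(f"41.2% (95% CI {low:.1%} to {high:.1%})")  # 38.1% to 44.3%
```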

Last reviewed by WikiWalls editorial.