Testing

~280 test files covering ViewModels, mappers, repos; UI test coverage moderate.

Score: 73
66 findings in this category

Executive summary

Costco Android has a mature, multi-layered test stack: ~170 unit-test files, ~80 instrumentation tests across 30+ modules. The unit-test foundation is JUnit 4 + MockK + Turbine + a custom MainCoroutineRule; UI automation uses Compose UI Test + Espresso + UiAutomator with Karumi Shot for screenshot regression. JaCoCo enforces a 40.81% coverage threshold, which is low for a retail-scale codebase. There are concrete gaps around library version drift (Hilt-testing 2.28-alpha vs Hilt 2.56), absent Macrobenchmark / baseline profiles, and no Firebase Test Lab integration in CI.

1. Tooling inventory

All versions sourced from gradle/libs.versions.toml and the Costco/build.gradle file.

| Layer | Tool | Version | Purpose |
| --- | --- | --- | --- |
| Unit testing | JUnit 4 | 4.13.2 | Primary test runner across all modules |
| Unit testing | JUnit 5 (Jupiter) | 6.0.3 | Available but not adopted broadly |
| Unit testing | MockK | 1.14.9 | Kotlin-first mocking library |
| Unit testing | Mockito + mockito-inline | 5.22.0 / 5.2.0 | Java mocking; coexists with MockK |
| Unit testing | Hamcrest | 3.0 | Assertion matchers |
| Unit testing | kotlin-test | 2.0.0-RC1 | kotlin.test assertions (limited use) |
| Coroutines / Flow | kotlinx-coroutines-test | 1.10.2 | TestDispatcher / runTest |
| Coroutines / Flow | Turbine | 1.2.1 | Flow assertion DSL |
| Coroutines / Flow | androidx.arch.core:core-testing | 2.2.0 | InstantTaskExecutorRule for LiveData |
| UI automation | Espresso | 3.7.0 | XML-View based UI assertions |
| UI automation | Compose UI Test (junit4) | matches Compose 1.10.4 | Composable semantics tree assertions |
| UI automation | androidx.test.ext:junit | 1.3.0 | AndroidJUnitRunner integration |
| UI automation | Test Orchestrator | 1.6.1 | Process isolation per test |
| UI automation | UiAutomator | 2.3.0 | Cross-app / system-dialog interaction |
| UI automation | Hilt-testing | 2.28-alpha | @HiltAndroidTest entry point |
| Screenshots | Karumi Shot (plugin + runner) | 6.1.0 | Reference-image snapshot regression |
| Coverage | JaCoCo | 0.8.8 | Line + branch coverage; 40.81% threshold |

2. Unit testing — what's in place

Test taxonomy

The ~170 unit-test files break down (approximately) as:

Conventions used

| Aspect | Convention observed | Notes |
| --- | --- | --- |
| File naming | XxxTest.kt / XxxTest.java (e.g. AccountViewModelTest.kt) | Consistent |
| Method naming | camelCase (e.g. fetchUserData_returnsSuccess()) | No backtick/spaced names; Given-When-Then not standardized |
| Mocking | MockK with @MockK + MockKAnnotations.init(this, relaxUnitFun = true) | Mockito coexists in legacy Java tests |
| Assertions | JUnit Assert.assertEquals, occasional Hamcrest matchers | Truth/AssertJ not adopted |
| Coroutines | Custom MainCoroutineRule + UnconfinedTestDispatcher + TestScope | Re-implemented in 7 different modules; see findings below |
| LiveData | @get:Rule InstantTaskExecutorRule() | Standard for ViewModel tests |
| Flow | Turbine flow.test { … } | Used inconsistently; some tests collect manually |
| Test fixtures | Hand-written fakes (e.g. FakeDataStore.kt in shared/auth) | Per-module; no shared :testfixtures module |
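
To illustrate the Flow convention listed above, here is a minimal sketch of a Turbine-based assertion. UiState, userState() and the test class are hypothetical names invented for this example; only the Turbine, kotlinx-coroutines-test and JUnit 4 calls are real library APIs.

```kotlin
import app.cash.turbine.test
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.test.runTest
import org.junit.Assert.assertEquals
import org.junit.Test

// Hypothetical UI state, standing in for whatever a real ViewModel exposes.
sealed interface UiState {
    data object Loading : UiState
    data class Success(val name: String) : UiState
}

// Hypothetical producer: emits Loading, then Success, then completes.
fun userState(): Flow<UiState> = flow {
    emit(UiState.Loading)
    emit(UiState.Success("member"))
}

class UserStateFlowTest {
    @Test
    fun userState_emitsLoadingThenSuccess() = runTest {
        userState().test {
            assertEquals(UiState.Loading, awaitItem())
            assertEquals(UiState.Success("member"), awaitItem())
            // Fails if another item or an error arrives instead of completion.
            awaitComplete()
        }
    }
}
```

Compared with collecting into a list by hand, the test block fails on unconsumed events, which is one reason to apply it consistently.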

Concrete file references

3. Automation / UI testing — what's in place

Test types running on device / emulator

| Test type | Framework | Where used |
| --- | --- | --- |
| Compose component test | Compose UI Test (createComposeRule()) | shared/sdui (~40 tests), shared/topbar, shared/navigationheader |
| Espresso XML View test | Espresso onView / onData | Legacy screens in Costco/src/androidTest |
| Page Object Model E2E | Espresso + UiAutomator + Hilt | Costco/src/androidTest/.../costcoUITests/pages/ |
| Hilt instrumentation test | @HiltAndroidTest + HiltAndroidRule | 5+ modules; shared/topbar/.../DefaultNavHeaderTest.kt |
| Screenshot regression | Karumi Shot | feature/account/.../MembershipCardComponentScreenShotTest.kt + others under *ScreenShotTest.kt |

Page Object Model

The androidTest tree under the main app module follows an organized POM-style structure with one file per surface:

UiAutomator handles system dialogs (e.g. permission prompts), which live outside the app process and are out of Espresso's reach; see UiDevice.getInstance() usage in HomePageTest.kt.
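
As a rough sketch of the system-dialog handling described above, a helper along these lines could dismiss a runtime permission prompt. The function name, timeout and button labels are assumptions for illustration (labels vary by Android version and OEM skin); this is not the code in HomePageTest.kt.

```kotlin
import androidx.test.platform.app.InstrumentationRegistry
import androidx.test.uiautomator.By
import androidx.test.uiautomator.UiDevice
import androidx.test.uiautomator.Until
import java.util.regex.Pattern

// Hypothetical helper: taps the "allow" button of a permission dialog if one appears.
// Returns true when a button was found and clicked, false when no prompt showed up.
fun allowPermissionIfPrompted(timeoutMs: Long = 3_000L): Boolean {
    val device = UiDevice.getInstance(InstrumentationRegistry.getInstrumentation())
    // Assumed labels; adjust for the locales and OS versions under test.
    val allowLabel = Pattern.compile("(?i)^(allow|while using the app)$")
    // wait() yields null when no matching node appears within the timeout.
    val button = device.wait(Until.findObject(By.text(allowLabel)), timeoutMs)
    button?.click()
    return button != null
}
```

Returning a Boolean rather than throwing lets a page object call the helper unconditionally, whether or not the permission was already granted.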

Compose UI Test pattern

Component-level tests use createComposeRule() + composeTestRule.setContent { … } to mount the Composable in isolation, then assert on semantics nodes:

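import androidx.compose.ui.test.assertIsDisplayed
import androidx.compose.ui.test.junit4.createComposeRule
import androidx.compose.ui.test.onNodeWithTag
import androidx.test.ext.junit.runners.AndroidJUnit4
import dagger.hilt.android.testing.HiltAndroidRule
import dagger.hilt.android.testing.HiltAndroidTest
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith
// CostcoTheme and DefaultNavHeader imports are project-specific and omitted here.

@RunWith(AndroidJUnit4::class)
@HiltAndroidTest
class DefaultNavHeaderTest {
  // order = 0 runs the Hilt rule first, so injection is ready before Compose starts.
  @get:Rule(order = 0) val hiltRule = HiltAndroidRule(this)
  @get:Rule(order = 1) val composeTestRule = createComposeRule()

  @Test
  fun renders_brand_logo() {
    composeTestRule.setContent { CostcoTheme { DefaultNavHeader(...) } }
    composeTestRule.onNodeWithTag("brand_logo").assertIsDisplayed()
  }
}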

Test runner configuration

The app module declares two runners depending on task:

Files: Costco/build.gradle (lines 69–80)

4. Coverage

JaCoCo is wired in via jacoco.gradle at the repo root (toolVersion 0.8.8), with a coverageCheck task that fails the build if line coverage drops below 40.81%.

| Aspect | Detail |
| --- | --- |
| Coverage tool | JaCoCo 0.8.8 |
| Threshold | 40.81% (line) |
| Reports | HTML + XML, per build variant |
| Excluded from coverage | Activities, Fragments, Compose components, generated DI/Serialization/R/BuildConfig |
| Sonar / SonarCloud | Not detected in CI |
| PR-level coverage diff | Not detected |

5. CI integration

Test execution runs on Azure Pipelines. The pipeline file is azure-pipelines.yml (~50 KB). Highlights:

6. Findings

PASS

Modern Kotlin-first unit-test stack

MockK + Turbine + Coroutines test + InstantTaskExecutorRule reflects current best practice. Repository / Mapper / ViewModel layers are well-tested.

PASS

Page Object Model adopted

E2E tests under costcoUITests/pages/ with TestConstant.kt for data: a maintainable pattern, especially with Hilt and Test Orchestrator.

PASS

Screenshot regression with Karumi Shot

Reference-image testing for design-system surfaces catches visual regressions early. Files match *ScreenShotTest.kt.

HIGH

Hilt-testing version drift (2.28-alpha vs Hilt 2.56)

Production code uses Hilt 2.56 while hilt_testing is pinned at 2.28-alpha. Mismatched versions risk subtle bugs in test-time DI — generated factories may not align with runtime ones, and bug fixes from 2.28 → 2.56 are missed.
Recommendation: Bump hilt-android-testing to match hilt-android (2.56). Run all @HiltAndroidTest suites after the bump.

HIGH

Coverage threshold is too low (40.81%)

For a retail-scale Android app, a 40.81% line-coverage threshold lets large untested surface areas slip through. Because the exclusion list is broad (Activities, Fragments, Compose components), the share of shipped code actually exercised by tests is likely lower than the headline figure.
Recommendation: Stair-step the threshold quarterly toward 60–70% line coverage. Publish per-PR coverage diffs (Codecov or SonarCloud) so reviewers see what changed code is uncovered.

MEDIUM

MainCoroutineRule duplicated across 7 modules

The same TestWatcher-based coroutine rule is re-implemented in feature/discover, feature/account, feature/warehouse, and 4 others. Drift is inevitable; one module already uses StandardTestDispatcher while another uses UnconfinedTestDispatcher.
Recommendation: Create a shared/testfixtures module exposing MainCoroutineRule, FakeXxx repos, and TestData. Have feature modules testImplementation project(":shared:testfixtures").
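
A single shared rule along the lines recommended above might look like the following sketch. The default dispatcher is an assumption (the report notes that modules currently disagree between StandardTestDispatcher and UnconfinedTestDispatcher), so it is left overridable.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.ExperimentalCoroutinesApi
import kotlinx.coroutines.test.TestDispatcher
import kotlinx.coroutines.test.UnconfinedTestDispatcher
import kotlinx.coroutines.test.resetMain
import kotlinx.coroutines.test.setMain
import org.junit.rules.TestWatcher
import org.junit.runner.Description

// Sketch of one shared JUnit 4 rule; the default dispatcher choice is an assumption.
@OptIn(ExperimentalCoroutinesApi::class)
class MainCoroutineRule(
    val testDispatcher: TestDispatcher = UnconfinedTestDispatcher(),
) : TestWatcher() {

    // Route Dispatchers.Main to the test dispatcher before each test.
    override fun starting(description: Description) {
        Dispatchers.setMain(testDispatcher)
    }

    // Restore the real Main dispatcher so state does not leak between tests.
    override fun finished(description: Description) {
        Dispatchers.resetMain()
    }
}
```

A test class would declare it as @get:Rule val mainRule = MainCoroutineRule(), and tests that need explicit scheduling control could pass StandardTestDispatcher() instead of relying on the default.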

MEDIUM

JUnit 4 + JUnit 5 both available

libs.versions.toml declares both JUnit 4.13.2 and JUnit Jupiter 6.0.3. Jupiter's lifecycle annotations (@BeforeEach vs @Before) and its extension model differ from JUnit 4's rules, so helpers such as MainCoroutineRule do not carry over unchanged; mixing the two causes confusion.
Recommendation: Pick one (JUnit 4 since most existing tests use it). Remove Jupiter from the catalog or document a clear migration plan.

MEDIUM

Mockito + MockK both present

Both are wired up; MockK is used for new Kotlin tests, Mockito persists in legacy Java tests. Cross-cutting helpers may target one library only, fragmenting test utilities.
Recommendation: Standardize on MockK for Kotlin. Keep Mockito only as long as legacy Java tests live; add a Detekt rule to forbid new Mockito imports in .kt files.

MEDIUM

Heavy reliance on mocks over fakes

Repositories and remote sources are typically mocked rather than substituted with hand-written fakes. Mocks couple tests to implementation; fakes describe behavior and survive refactors better.
Recommendation: Build a small library of fakes (FakeAccountRepository, FakeBffService) in the proposed :shared:testfixtures module. Reserve MockK for verifying interactions, not stubbing data.
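
To make the mock-versus-fake distinction concrete, here is a minimal, framework-free sketch. Account, AccountRepository and GreetingUseCase are hypothetical; the real repository contracts are not shown in this report and would likely use suspend functions or Flow.

```kotlin
// All names below are hypothetical and exist only for illustration.
data class Account(val id: String, val displayName: String)

interface AccountRepository {
    fun findAccount(id: String): Account?
    fun save(account: Account)
}

// In-memory fake: implements the real contract, needs no mocking framework.
class FakeAccountRepository(seed: List<Account> = emptyList()) : AccountRepository {
    private val accounts = seed.associateBy { it.id }.toMutableMap()

    override fun findAccount(id: String): Account? = accounts[id]

    override fun save(account: Account) {
        accounts[account.id] = account
    }
}

// Hypothetical class under test.
class GreetingUseCase(private val repository: AccountRepository) {
    fun greetingFor(id: String): String =
        repository.findAccount(id)?.let { "Welcome back, ${it.displayName}" } ?: "Welcome"
}

fun main() {
    val repository = FakeAccountRepository(listOf(Account("42", "Sam")))
    val useCase = GreetingUseCase(repository)

    check(useCase.greetingFor("42") == "Welcome back, Sam")
    check(useCase.greetingFor("missing") == "Welcome")

    // State changes go through the same contract production code uses.
    repository.save(Account("7", "Alex"))
    check(useCase.greetingFor("7") == "Welcome back, Alex")
}
```

Because the test never stubs individual calls, renaming or reordering the use case's internal repository calls would not break it, which is the refactor-resilience argument made above.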

MEDIUM

No Macrobenchmark / baseline profile

No :macrobenchmark module and no baseline-profile task. Cold-start, scrolling, and frame-drop regressions go unmeasured release-over-release.
Recommendation: Add a Macrobenchmark module that exercises Home → PDP → Cart. Generate a baseline profile and ship it in the APK to optimize cold-start.
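
As a starting point for the recommendation above, a cold-start benchmark could look roughly like this. The class name and package name are placeholders, the sketch assumes a separate benchmark module with the androidx.benchmark macro-junit4 artifact, and it measures startup only rather than the full Home → PDP → Cart journey.

```kotlin
import androidx.benchmark.macro.StartupMode
import androidx.benchmark.macro.StartupTimingMetric
import androidx.benchmark.macro.junit4.MacrobenchmarkRule
import androidx.test.ext.junit.runners.AndroidJUnit4
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith

@RunWith(AndroidJUnit4::class)
class ColdStartBenchmark {
    @get:Rule
    val benchmarkRule = MacrobenchmarkRule()

    @Test
    fun coldStartup() = benchmarkRule.measureRepeated(
        packageName = "com.example.app", // placeholder, not the real application id
        metrics = listOf(StartupTimingMetric()),
        iterations = 5,
        startupMode = StartupMode.COLD, // process is killed before each iteration
    ) {
        pressHome()
        startActivityAndWait() // launches the default activity and waits for first frame
    }
}
```

Scroll and navigation journeys would add UiAutomator interactions inside the measure block along with a frame-timing metric.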

MEDIUM

No Firebase Test Lab in CI

UI tests run on the local Gradle-managed device only. No matrix run across OEM device variants — Samsung One UI / Xiaomi MIUI / Pixel — where layout / WebView / camera bugs commonly surface.
Recommendation: Wire Firebase Test Lab (or BrowserStack App Automate) into the nightly build with a representative 3–5 device matrix. Gate releases on these results.

LOW

Karumi Shot vs newer alternatives

Karumi Shot 6.1.0 still works, but newer alternatives (Paparazzi, Roborazzi) run without an emulator and are faster on CI. Roborazzi additionally integrates with Compose preview annotations.
Recommendation: Pilot Roborazzi on one feature module; evaluate runtime + flake rate vs Shot. If favorable, migrate over a quarter.

LOW

No Test Retry plugin in Gradle

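UI / instrumentation tests can flake (animation, network). Without retry tracking, flaky tests are re-run by hand and their failure rate goes unrecorded.
Recommendation: Apply org.gradle.test-retry with a hard ceiling (1 retry); track flake rate in a CI dashboard. The plugin acts on Gradle Test tasks, so instrumentation runs would likely need a separate retry mechanism.

LOW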

Test naming inconsistency

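Test method names mix styles (e.g. fetchUserData_returnsSuccess() alongside renders_brand_logo()); there is no enforced Given-When-Then structure and no backtick-spaced names. Reviewers spend time inferring intent.
Recommendation: Adopt backtick-spaced naming for JVM unit tests (e.g. fun `returns Loading then Success when fetch succeeds`()) and document it in a CONTRIBUTING.md test section. Keep identifier-safe names for instrumentation tests, since method names containing spaces are only supported by the Android runtime from API level 30.

INFO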

Mutation testing absent

No Pitest / Stryker mutation testing. Coverage % alone says nothing about assertion quality.
Recommendation: Add Pitest on a single module as a pilot; use mutation score as a complementary quality signal.

7. Recommended target state

| Capability | Today | Target (12 months) |
| --- | --- | --- |
| Unit-test coverage | 40.81% threshold | 65–70% line coverage; PR-diff blocking on changed files |
| Coroutine rule | Duplicated across 7 modules | Single source in :shared:testfixtures |
| Mock vs Fake split | Mock-heavy | Fakes for repos/services; MockK for verifying interactions |
| Hilt-testing version | 2.28-alpha | 2.56 matching production |
| Screenshot tests | Karumi Shot | Roborazzi (or Paparazzi) for Compose; off-emulator |
| Performance testing | None | Macrobenchmark + baseline profile shipped in APK |
| Device matrix | Single GMD | Firebase Test Lab nightly across 3–5 devices |
| Mutation testing | None | Pitest on critical modules (productdetaillanding, account) |
| Flake handling | Manual | Gradle test-retry + flake dashboard |
| Test reporting | JUnit XML in Azure | Azure + Codecov/SonarCloud + PR diff annotations |

Costco Android · Code Review Report · Generated 2026-05-07 · 626 machine-curated findings