A Merit Framework Is Policy Work. The Ratings Are an Instrument Problem.

By Tabitha Weinstein · May 22, 2026 · 4 minute read

A merit-pay framework is policy work; the ratings underneath it are an instrument problem. When a public-sector score-to-percentage system fails in year two, it usually fails at the instrument layer, not in the framework. The framework is the easy part and the instrument is the hard part, yet most engagements solve the wrong one.

The failure pattern

Counties commission a merit redesign. The driver is usually pressure to differentiate pay by performance: high performers should earn more, low performers should earn less, the County should not be giving a flat 1.5 percent to anyone who scores a 3 or higher. Reasonable instinct. Sound policy direction.

A consultant arrives. The consultant delivers an elegant score-to-percentage matrix. Score of 5 gets 2 percent. Score of 4 gets 1.5 percent. Score of 3 gets 1 percent. Below 3, ineligible. Eligibility rules for new hires and leaves of absence get adjusted at the margin. The merit pool target gets set. The framework gets approved by the Board.
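To show how little machinery the matrix itself requires, here is a minimal sketch of the mapping described above. The percentages come from the example; the names MERIT_MATRIX, merit_percent, and raise_amount are hypothetical, and the sketch assumes whole-number scores.

```python
from typing import Optional

# Percentages taken from the example matrix above.
# Scores below 3 are absent from the table and therefore ineligible.
MERIT_MATRIX = {5: 2.0, 4: 1.5, 3: 1.0}


def merit_percent(score: int) -> Optional[float]:
    """Return the merit increase in percent, or None if the score is ineligible."""
    return MERIT_MATRIX.get(score)


def raise_amount(salary: float, score: int) -> float:
    """Return the dollar increase for a salary and score (0.0 if ineligible)."""
    pct = merit_percent(score)
    if pct is None:
        return 0.0
    return round(salary * pct / 100, 2)
```

The whole framework fits in a lookup table. Nothing in it checks whether a 4 from one supervisor means the same thing as a 4 from another.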

Year one rolls out. Departments do what they have always done. Scores come in.

Year two surfaces a pattern. One department has 30 percent of its employees scoring 5. Another department has zero scores above 4. The percentage assignments under the new framework now amplify that scoring variance into pay differentials the County cannot defend. A grievance lands. Then another. By year three, an EEOC inquiry starts asking about score distribution by protected class.

The forensic finding is always the same. The framework was sound. The ratings it sat on top of were never reliable to begin with. The County built a precision pay-differentiation policy on top of an uncalibrated rating instrument.

The instrument is what fails

Three rater patterns produce most year-two failures.

Rater compression. One supervisor scores almost every employee a 3. The supervisor reads "3" as "doing the job" and treats higher scores as exceptional. Under a flat 1.5 percent framework, this is invisible. Under a score-to-percentage framework, it underpays a whole department.

Rater inflation. Another supervisor does the opposite and scores almost every employee a 5. Under the new framework, that department gets 2 percent increases across the board. The differential is illusory: the scores do not distinguish one employee's performance from another's.

Rater drift correlated with protected class. The hardest pattern to surface and the most legally exposed. A rater scores one demographic group consistently lower than another at otherwise equivalent performance levels. Under the new framework, the score gap becomes a pay gap. Title VII applies. State pay equity statutes apply. Some states (including Colorado, California, and Massachusetts) impose pay equity analysis requirements beyond federal law.

None of these patterns are caused by the framework. The framework just amplifies them.
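Compression and inflation can both be screened for with a simple per-rater summary of prior-cycle scores. The sketch below is one rough way to do it, not a validated diagnostic: the thresholds (min_spread, high_mean), the function name flag_raters, and the rater labels are all assumptions chosen for illustration. It does not address drift by protected class, which needs group-level analysis.

```python
from statistics import mean, pstdev


def flag_raters(scores_by_rater, min_spread=0.5, high_mean=4.5):
    """Flag raters whose scores barely vary.

    scores_by_rater maps a rater label to a list of 1-5 scores.
    min_spread and high_mean are illustrative thresholds, not standards.
    """
    flags = {}
    for rater, scores in scores_by_rater.items():
        average = mean(scores)
        spread = pstdev(scores)  # population standard deviation
        if spread < min_spread and average >= high_mean:
            flags[rater] = "possible inflation"
        elif spread < min_spread:
            flags[rater] = "possible compression"
        else:
            flags[rater] = "no flag"
    return flags


# Hypothetical prior-cycle scores for three supervisors.
example = {
    "rater_a": [3, 3, 3, 3, 3, 4],
    "rater_b": [5, 5, 5, 5, 4, 5],
    "rater_c": [2, 3, 4, 5, 3, 4],
}
flags = flag_raters(example)
```

A flag is only a prompt to look closer. A small team of genuinely strong performers would also trip the inflation check.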

What holds up

A merit framework that holds up in year two and year three shares three traits.

The rating instrument was calibrated before the framework launched. Not after. Inter-rater reliability data was collected on prior cycles. Outlier raters and outlier departments were identified. Calibration training was delivered using a real distribution of anonymized scoring scenarios drawn from the County's own data. Raters were not asked to internalize a uniform script; they were asked to anchor on consistent thresholds.
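One common way to quantify inter-rater reliability is Cohen's kappa, which measures how often two raters agree beyond what chance alone would produce. The sketch below assumes two raters have scored the same set of anonymized scenarios; the function name and that setup are assumptions for illustration. This is the unweighted form, which treats a 3-versus-4 disagreement the same as a 3-versus-5 disagreement; a weighted variant is often preferred for ordinal scales.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if not ratings_a or len(ratings_a) != len(ratings_b):
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(ratings_a)
    # Share of items on which the two raters gave the same score.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's own score frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical score for every item
    return (observed - expected) / (1 - expected)
```

A kappa near 1 means the raters apply the thresholds the same way. A kappa near 0 means their agreement is no better than chance, which is the condition the calibration work is meant to fix before launch.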

Eligibility rules were specified and stress-tested before launch. The rules covered employees on a performance improvement plan (PIP), new hires within the cycle, employees on FMLA or military leave, and mid-cycle transfers and promotions. Each scenario was run against the rules in the open. Edge cases the County had not thought of surfaced and got policy decisions on the record.

The adverse impact profile of the proposed score-to-percentage mapping was analyzed before launch, not after. That analysis included a multivariate regression on the baseline scoring data by sex, race, age, and disability where ascertainable, plus a compression diagnostic on long-tenured employees in the bottom-eligible tier. If material adverse impact appears in the baseline, remediation happens before launch. Federal law applies: Title VII covers race and sex, while the ADEA and the ADA cover age and disability. So do state-specific statutes; Colorado, California, Maryland, Massachusetts, and New York all have pay equity laws that operate beyond federal Title VII.
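The regression described above needs a statistics package and the County's own data. As a lighter first-pass screen, and a different, simpler technique than that regression, the sketch below applies the four-fifths rule of thumb from the EEOC's Uniform Guidelines on Employee Selection Procedures to the share of each group receiving a favorable score. The group labels, the definition of "favorable" (a score of 4 or higher), and the function names are assumptions for illustration.

```python
def impact_ratios(records, favorable=lambda score: score >= 4):
    """Return each group's favorable-score rate relative to the highest group's rate.

    records is an iterable of (group_label, score) pairs.
    """
    totals, hits = {}, {}
    for group, score in records:
        totals[group] = totals.get(group, 0) + 1
        hits[group] = hits.get(group, 0) + (1 if favorable(score) else 0)
    rates = {group: hits[group] / totals[group] for group in totals}
    best = max(rates.values())
    if best == 0:
        # No group received a favorable score, so no ratio can be computed.
        return {group: None for group in rates}
    return {group: rates[group] / best for group in rates}


def below_four_fifths(ratios, threshold=0.8):
    """List the groups whose ratio falls under the four-fifths threshold."""
    return sorted(g for g, r in ratios.items() if r is not None and r < threshold)


# Hypothetical baseline data: (group label, score).
baseline = [("X", 5), ("X", 4), ("X", 4), ("X", 3), ("X", 3),
            ("Y", 4), ("Y", 3), ("Y", 3), ("Y", 3), ("Y", 3)]
flagged = below_four_fifths(impact_ratios(baseline))
```

A ratio below 0.8 is a prompt for closer analysis, not a legal conclusion. It does not control for job, tenure, or department the way the regression does, and small groups can produce unstable ratios.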

Four checks before a rollout

If your jurisdiction is contemplating a score-to-percentage merit redesign, four pre-launch checks separate frameworks that hold up from frameworks that get litigated.

Do you have inter-rater reliability data on the existing scoring instrument? If you do not, you do not yet know whether your raters agree with each other.
Has the adverse impact profile of the proposed percentage mapping been analyzed by protected class against at least two prior cycles of scoring data?
Are the eligibility rules for PIP, new-hire, leave-of-absence, and mid-cycle transfers documented and stress-tested against historical scenarios?
Have you decided in the open whether budget control happens through a fixed merit pool that forces distribution, or through individual caps and floors that preserve rater autonomy? Both have equity tradeoffs. The choice should be documented.

The answer to all four should be yes before the framework launches. If any answer is no, the framework will launch on top of an uncalibrated instrument and the failures will surface in year two.

Closing

The score-to-percentage matrix is the visible deliverable. The Board signs the matrix. The press release announces the matrix. The grievances, the EEOC inquiry, and the pay equity audit that come later focus on the instrument underneath.

Pinnacle's approach to merit framework design starts with the instrument and works up. Policy flows from a calibrated rating foundation, not the other way around.
