Why Static Secretary of State Data Feeds Fail Modern KYB Verification in 2026

April 28, 2026
April 27, 2026

The No-Match Ambiguity Problem

A no-match from a KYB vendor is supposed to tell you something definitive. In practice, it tells you one of two very different things: either the business truly is not registered in the state you checked, or the business is registered but has not yet propagated into your vendor's data feed.

What does data staleness actually look like inside a business registry feed?

Industry coverage in 2025 describes some KYB providers operating on datasets refreshed on monthly or even quarterly cycles.[1] Alloy, which competes in the KYB orchestration market and has a direct commercial interest in positioning data-layer competitors as lower-tier, argued in its 2026 guidance that "lower-tier KYB providers use static datasets (snapshots from months ago) to save money, creating regulatory liability."[2] The framing is self-interested but the underlying architectural point is real and independently confirmable: bulk-refresh feeds lag the source. OpenCorporates made a similar architectural point in its 2025 framework, noting that official business data "is scattered across thousands of registries with inconsistent formats or limited access."[3]

Staleness is not a single number. It is a distribution. Some records refresh quickly; others lag significantly. The lag is longest for exactly the updates that risk teams care about most: new filings, status changes, name changes, and address updates.

How big is the window between a state filing and aggregator visibility?

State registries publish filings on their own cadence. Some states push nightly updates; others update weekly or on a batch schedule. Aggregators then ingest those updates on their own cadence, normalize the records, and republish them to API consumers. Each hop adds lag. The cumulative window between a real-world filing and a fresh aggregator record routinely exceeds a week, and in certain states it can exceed a month.

Only 48.8 percent of the world's corporate registers offer publicly available APIs, and many that do make the APIs difficult to access at production volume.[4] That structural constraint is why aggregators build their business on bulk ingest in the first place. The same constraint is why their data is stale.

Which decisions rely on data that isn't actually current?

Underwriting decisions that depend on "time in business" as a proxy for risk, compliance decisions that depend on current entity status, and onboarding decisions that depend on whether a business exists at all are all vulnerable to the lag. The more volume you process, the more mis-decisions compound. Industry-observed false no-match rates on newly-formed entities from static feeds range from roughly 2 to 5 percent depending on vendor and state mix; the rest of this post uses 3 percent as an illustrative midpoint and flags where that number is doing analytical work. That flag is important: any ROI argument built on an unsourced rate collapses under the first examiner question.

How Static Secretary of State Feeds Are Actually Built

Understanding the failure mode requires understanding the architecture. Not all business verification vendors work the same way, and the distinction matters more than buyers typically realize.

What is a static SoS data feed, and how does it differ from live-ping?

Static SoS feeds are built through bulk ingest. A vendor purchases or scrapes periodic data extracts from state registries, normalizes them into a common schema, loads them into its own database, and exposes them via an API. When your system calls that API, it queries the vendor's database, not the state registry. The response is as fresh as the vendor's last refresh, whenever that was.

This architecture has real advantages: queries return in milliseconds, costs scale well, and vendors can enrich records with graph data and additional fields. It has one decisive disadvantage: the data is never live.

The alternative architecture is real-time lookup, sometimes called live ping. Live ping providers query the state registry at the moment you call the API and return whatever the state currently shows. Kyckr, Cobalt Intelligence, Baselayer, and a small number of other vendors operate this way.[5] Middesk, many Alloy-default data integrations, legacy Dun and Bradstreet, and the majority of established KYB databases operate on the bulk-refresh model.

Why do bulk-refresh cadences create a structural lag?

Bulk refresh is expensive. A vendor covering all 50 states has to parse dozens of different file formats, reconcile entity IDs across state-specific schemas, and handle edge cases where states revise historical records. Doing this weekly is hard. Doing it daily is harder. Doing it continuously, per-record, at the moment of each customer query, is a different architecture entirely.

What "daily update" actually means in vendor SLA language

A vendor advertising daily updates is describing its ingest cadence, not its propagation cadence. The state publishes on its cadence. The aggregator ingests on its cadence. The aggregator processes, normalizes, and publishes on its cadence. Every layer adds latency. If any layer misses a cycle, that layer's share of the lag doubles.

The honest way to read vendor SLA language is to ask about propagation, not refresh: when a business files a formation document today, how many days until your customers see it in an API response? Most vendors cannot answer this question with a single number. That ambiguity is the point.
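The gap between refresh cadence and propagation can be made concrete with a small calculation. The sketch below is illustrative only: the `Hop` type, the `propagation_window` function, and every cadence value are assumptions made for the example, not figures from any vendor or state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    """One stage between the state filing and the API response."""
    name: str
    cadence_days: float     # how often this stage runs
    processing_days: float  # time the stage needs once it runs


def propagation_window(hops: list[Hop], missed_cycles: int = 0) -> tuple[float, float]:
    """Return (best_case_days, worst_case_days) for a filing to reach the API.

    Best case: the filing lands just before every stage runs, so only
    processing time accrues. Worst case: it just misses every run and waits a
    full cadence at each stage, plus any missed cycles at the slowest stage.
    """
    best = sum(h.processing_days for h in hops)
    worst = sum(h.cadence_days + h.processing_days for h in hops)
    if hops:
        worst += missed_cycles * max(h.cadence_days for h in hops)
    return best, worst


# Hypothetical cadences, for illustration only.
pipeline = [
    Hop("state bulk publish", cadence_days=7, processing_days=0),
    Hop("aggregator ingest", cadence_days=1, processing_days=0.5),
    Hop("normalize and publish", cadence_days=1, processing_days=0.5),
]

best, worst = propagation_window(pipeline)
print(f"best case: {best} days, worst case: {worst} days")
# best case: 1.0 days, worst case: 10.0 days
```

Under these assumed numbers, a vendor that ingests daily from a weekly state file still shows a ten-day worst case, and one missed state cycle pushes that to seventeen days. That is why a single "daily" figure says little about propagation.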

The Major Business Verification Approaches, Compared Honestly

Here is how the architecturally distinct approaches stack up for alternative lenders and lending infrastructure platforms, with response-time trade-offs surfaced alongside freshness.

| Approach | Data Freshness | Typical Response Time | Best Use Case | Honest Trade-Off |
| --- | --- | --- | --- | --- |
| Static aggregator (Middesk, legacy Dun and Bradstreet) | Days to weeks behind state | Sub-second | Pre-screening high volume at low cost | New filings, name changes, recent status changes all lag |
| Hybrid cache with periodic refresh (OpenCorporates, many Alloy default integrations) | Varies by jurisdiction, often days | Sub-second on cache, minutes on refresh | Moderate-volume verification with global coverage | Coverage-quality trade-off; API rate limits at scale |
| Real-time registry lookup (Cobalt Intelligence, Kyckr, Baselayer) | Live at query time | 10-30s most states; Delaware 15-30s; Oregon can take up to 5 minutes | Final verification, compliance-sensitive decisions, newly-formed entities | Response-time penalty is material; higher per-call cost. Async callback architecture required for the slow-state tail. |
| Data layer behind KYB platforms (Cobalt Intelligence, white-label data sources) | Live, embedded inside platform workflows | Same live-ping latency, surfaced via integrator's UX | Platforms embedding real-time registry data inside an identity or onboarding product | Requires architectural investment by the integrating platform (async webhooks, retry logic) |

Cobalt Intelligence sits in a distinct row because Cobalt is a data layer, not a KYB platform. Many of the vendors above can consume Cobalt or similar upstream live-ping data as input. The honest framing for most lenders: KYB platforms handle identity, adverse media, and workflow; the registry data layer underneath determines whether the whole stack is looking at fresh or stale truth. The response-time penalty on live-ping is the defensible trade-off you accept for certainty, and it is the reason mature shops run it as a fallback inside a cache-first waterfall (covered below) rather than as a blanket primary.

Where Data Lag Does the Most Damage

The pain is not distributed evenly. Certain entity types and lifecycle events concentrate the false-no-match risk. For merchant cash advance (MCA) providers using time-in-business as a proxy for repayment risk, a false no-match on a newly-formed LLC is not just an operational cost; it is a mispriced deal.

Why newly-formed LLCs are the most frequent false no-match

LLCs represent roughly 85 percent of new entity formations in the United States, and Wolters Kluwer's 2025 analysis across 47 states counted nearly 5.5 million new business formations that year.[6] Every new LLC goes through a propagation window before it exists in any aggregator's database. If you are onboarding small and mid-sized businesses, a meaningful share of your applicant pool is invisible to static-feed vendors on any given day.

What happens when a business changes its name or address

Name changes, address changes, and registered-agent updates are the other high-lag events. Static feeds generally do not receive these updates at the same cadence as new formations, because states themselves do not always include them in the bulk files they push to aggregators. Delaware, for example, does not push business name change updates into standard data feeds at all; the only way to detect a Delaware name change is to look at the registry directly.

A business that has legitimately changed its name and is applying for credit under the new name will return a no-match from a static feed that only has the old name. The underwriter either declines a good deal or escalates to manual review. Either way, the vendor's staleness became an operational cost.

Which states are worst at propagating changes into feeds

Delaware is well known inside the industry for the name-change gap. Other states create their own propagation quirks. Oregon is slow across the board. California has high volume and fast formations but inconsistent status-string normalization. New Jersey restricts some status data by statute. Texas is fast and reliable but has an unusual number of entity subtypes. A real-time lookup treats all of these as a single query; a static feed bakes each state's quirks into the refresh pipeline, with compounding consequences.

Alternative lenders who disproportionately serve industries with high name-change activity, franchise operations, contractors, or multi-state fleets feel the lag more than others. Insurance carriers and regional banks underwriting long-duration policies feel it less in the moment but pay for it later when audit trails do not reconcile.

What Real-Time Registry Lookup Actually Changes

A real-time business registry lookup does not just improve freshness. It changes the semantics of the verification result.

True negative vs feed-driven no-match

A true negative is a no-match you can act on. When a real-time lookup queries the state registry and returns no result, you can assert with high confidence that the business is not registered in that state at the moment you asked. A feed-driven no-match is ambiguous: the business might not be registered, or it might be registered but not yet in the feed.

This is not a theoretical distinction. Compliance officers face it every audit cycle. If an examiner asks "why did you decline this business?" and the answer is "our vendor returned no match," the next question is always "how do you know that was not a data-freshness issue?" Real-time lookup makes that question trivially answerable. Static feeds make it awkward.

Live-lookup architecture, honestly disclosed

Live-lookup architecture reaches out to the state website or the state's official API at the moment of each query. The response reflects whatever the state actually shows at that moment. There is no intermediate cache, no propagation window, no refresh cadence.

The cost is response time. Most states return in 10 to 30 seconds. Delaware takes 15 to 30 seconds. Oregon can take up to 5 minutes on live lookups. That is slow by API standards but fast relative to manual verification. The trade-off is simple: if you can tolerate seconds to minutes of response time on a small fraction of your verifications, you can replace ambiguity with certainty on the cases that matter most. Modern live-ping providers expose async callback architectures (the integrator sets up a webhook, receives the response when the state returns) precisely so this latency does not block the end-user experience in a synchronous onboarding flow.
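A minimal sketch of the async callback pattern, using Python's standard `asyncio`. Everything here is a stand-in: `query_state_registry`, `live_lookup`, the `webhook` callback, and the latency values are hypothetical and scaled down so the example runs quickly. A real integration would call the provider's API and receive an HTTP webhook instead.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical per-state latencies in seconds, scaled down for the sketch.
SIMULATED_LATENCY = {"TX": 0.01, "DE": 0.02, "OR": 0.05}


async def query_state_registry(state: str, name: str) -> dict:
    """Stand-in for a live registry query; simulates the state's response time."""
    await asyncio.sleep(SIMULATED_LATENCY.get(state, 0.01))
    return {"state": state, "name": name, "found": True}


async def live_lookup(state: str, name: str,
                      on_result: Callable[[dict], Awaitable[None]],
                      timeout_s: float = 1.0) -> None:
    """Run the lookup and hand the result (or a timeout marker) to a callback."""
    try:
        result = await asyncio.wait_for(query_state_registry(state, name), timeout_s)
    except asyncio.TimeoutError:
        result = {"state": state, "name": name, "found": None, "error": "timeout"}
    await on_result(result)


async def main() -> list[dict]:
    received: list[dict] = []

    async def webhook(result: dict) -> None:
        # In production this would be an HTTP POST to the integrator's endpoint.
        received.append(result)

    # Start all lookups without blocking; the onboarding flow could continue here.
    tasks = [asyncio.create_task(live_lookup(s, "Acme LLC", webhook))
             for s in ("OR", "TX", "DE")]
    await asyncio.gather(*tasks)
    return received


results = asyncio.run(main())
print(sorted(r["state"] for r in results))
```

The design point is that the slow state does not hold up the fast ones: each result is delivered as soon as its registry answers, and a timeout produces an explicit "unknown" rather than a silent no-match.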

Timestamped verification as audit defense

Real-time providers can return a timestamped screenshot of the state registry page at the moment of the query. That artifact, stored alongside the application record, becomes a defensible audit trail. It shows the examiner exactly what the registry said at the moment of decision, independent of what the vendor's database says today. Alt-lender platforms that have moved to real-time lookup explicitly cite audit defense as a primary driver.
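One way to store that artifact is sketched below, assuming the provider returns the registry payload and a screenshot as raw bytes. The function name `build_audit_record`, the field names, and the sample values are all hypothetical, not any provider's actual schema.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_audit_record(application_id: str, state: str,
                       registry_response: dict, screenshot_png: bytes) -> dict:
    """Bundle what the registry showed with when it was observed."""
    return {
        "application_id": application_id,
        "state": state,
        "queried_at_utc": datetime.now(timezone.utc).isoformat(),
        "registry_response": registry_response,
        # A hash lets you show later that the stored screenshot was not altered.
        "screenshot_sha256": hashlib.sha256(screenshot_png).hexdigest(),
    }


record = build_audit_record(
    application_id="app-123",                       # hypothetical identifier
    state="DE",
    registry_response={"status": "Good Standing"},  # hypothetical payload
    screenshot_png=b"\x89PNG placeholder bytes",    # placeholder, not a real image
)
print(json.dumps(record, indent=2))
```

Storing the hash alongside the application record, with the screenshot itself in immutable storage, keeps the evidence independent of whatever the vendor's database says later.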

How to Re-Think Fallback Logic When the Feed Is Stale

The practical architecture for most shops is not pure live-ping or pure cache. It is a waterfall.

The cache-first, real-time-fallback waterfall

The common pattern:

1. Call the cached data source first for sub-second response.

2. If the cached source returns a match and the record is current by your definition (formation date, last-updated field, match confidence), accept and move on.

3. If the cached source returns no match, returns a stale record, or returns a low-confidence match, trigger a real-time lookup against the state registry.

4. Treat the real-time response as authoritative, log both the cached and live responses for auditability, and route the decision accordingly.
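The four steps above can be sketched as a single function. This is a simplified illustration under stated assumptions: the `Record` fields, the 30-day freshness window, the 0.9 confidence floor, and the two stub lookups are hypothetical and would be replaced by your own vendor clients and thresholds.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, Optional


@dataclass
class Record:
    name: str
    status: str
    last_updated: date
    match_confidence: float  # 0.0 to 1.0


Lookup = Callable[[str, str], Optional[Record]]


def verify(name: str, state: str, cached: Lookup, live: Lookup,
           today: date, max_age_days: int = 30,
           min_confidence: float = 0.9) -> dict:
    """Cache-first lookup with a real-time fallback; both responses are kept."""
    cached_rec = cached(name, state)                        # step 1
    is_current = (
        cached_rec is not None
        and today - cached_rec.last_updated <= timedelta(days=max_age_days)
        and cached_rec.match_confidence >= min_confidence
    )
    if is_current:                                          # step 2
        return {"source": "cache", "record": cached_rec,
                "cached": cached_rec, "live": None}
    live_rec = live(name, state)                            # step 3
    return {"source": "live", "record": live_rec,           # step 4
            "cached": cached_rec, "live": live_rec}


def empty_cache(name: str, state: str) -> Optional[Record]:
    return None  # simulates a feed that has not ingested the filing yet


def live_registry(name: str, state: str) -> Optional[Record]:
    return Record(name, "Active", date(2026, 4, 28), 1.0)


outcome = verify("Acme LLC", "TX", empty_cache, live_registry,
                 today=date(2026, 4, 28))
print(outcome["source"])  # live
```

Returning both the cached and live responses in one result makes the audit logging in step 4 a matter of persisting the returned dictionary.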

This pattern captures the cost advantage of caching for the 80 percent of queries that resolve cleanly and the confidence advantage of real-time for the 20 percent that matter most. High-volume lenders like Bitty Advance and B2B platforms like Bectran run verification this way in production, with async architecture handling the slow-state tail rather than blocking the onboarding UX.

When a live lookup should trigger automatically

Trigger live lookup automatically when: the business was formed in the last 12 months, the applicant reports a recent name or address change, the cached record is older than your defined freshness threshold, the deal value is above a threshold where manual review would otherwise be triggered, or the application is in a state known to lag (Delaware name changes, Oregon status updates, California high-volume).

Live lookups should be opt-in for: bulk pre-screening of long-tail applicants, portfolio-wide re-verification sweeps where speed matters more than per-record cost, and any workflow where sub-second response is a hard requirement.
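These trigger rules translate naturally into a small rule function. The sketch below is one possible encoding, not a prescribed policy: the `Application` fields, the 30-day freshness window, the $100,000 deal threshold, and the lag-prone state set are assumptions to be tuned to your own book.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Assumed values for illustration; tune to your own risk policy.
LAG_PRONE_STATES = {"DE", "OR", "CA"}
FRESHNESS_DAYS = 30
DEAL_VALUE_THRESHOLD = 100_000


@dataclass
class Application:
    state: str
    deal_value: float
    formation_date: Optional[date] = None
    reported_recent_change: bool = False
    cached_record_date: Optional[date] = None
    needs_subsecond_response: bool = False


def live_lookup_reasons(app: Application, today: date) -> list[str]:
    """Return every rule that fires; an empty list means stay on the cache."""
    if app.needs_subsecond_response:
        return []  # live lookup is opt-in when sub-second response is required
    reasons = []
    if app.formation_date and (today - app.formation_date).days <= 365:
        reasons.append("formed_in_last_12_months")
    if app.reported_recent_change:
        reasons.append("recent_name_or_address_change")
    if app.cached_record_date and (today - app.cached_record_date).days > FRESHNESS_DAYS:
        reasons.append("cached_record_stale")
    if app.deal_value >= DEAL_VALUE_THRESHOLD:
        reasons.append("deal_value_above_threshold")
    if app.state in LAG_PRONE_STATES:
        reasons.append("lag_prone_state")
    return reasons


app = Application(state="DE", deal_value=40_000, formation_date=date(2026, 1, 15))
print(live_lookup_reasons(app, today=date(2026, 4, 28)))
# ['formed_in_last_12_months', 'lag_prone_state']
```

Returning the list of reasons, rather than a bare boolean, gives the audit log a record of why a live lookup was or was not triggered.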

Exception queues only work on reliable no-match signals

An exception queue only works if the no-match signal is reliable. If your queue is full of ambiguous no-matches from a stale feed, your manual reviewers end up doing the state-registry lookup themselves, by hand. That is the same manual process the vendor was supposed to eliminate. The better design puts the real-time lookup ahead of the queue: run the live lookup first, and escalate to a human only when the live response is still ambiguous (multiple candidate matches, address conflicts, officer name mismatches).

The Vendor Freshness Audit: Seven Questions Procurement Should Be Asking

Before the next contract renewal, every risk team should push these questions through procurement.

1. What is your data refresh cadence per state, not globally? Vendors who cannot give you a per-state number are hiding variance.

2. How do you detect and propagate business name changes in Delaware? If the honest answer is "we do not," you have just identified a known gap.

3. When a business files a new LLC in Texas today, how many days until it appears in your API response? Specific-case questions force specific answers.

4. Can you produce a timestamped verification artifact from the state source, or only from your own database? Audit defense depends on the former.

5. What is your live-ping option, what is the per-state response-time distribution, and what does it cost per call? Most vendors who operate on bulk refresh do not offer one at all.

6. Do you normalize status strings across all 50 states, and what is your coverage for entity subtypes? "Active" in one state is "Good Standing" in another; unnormalized data creates silent underwriting drift.

7. Can I see a reference customer at my volume tier who operates in my industry? Vague references signal a vendor pitching out of segment.

Vendors who answer these questions crisply will win the account. Vendors who deflect, hedge, or pivot to unrelated features are telling you something about their data that you would rather learn before you sign.
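The status-normalization question is easier to evaluate with a picture of what normalization involves. The sketch below is a toy example: the mapping table is hypothetical and deliberately tiny, and real status vocabularies must be confirmed against each state registry.

```python
from typing import Optional

# Hypothetical mapping for illustration; confirm real values per state.
CANONICAL_STATUS = {
    "active": "ACTIVE",
    "good standing": "ACTIVE",
    "in existence": "ACTIVE",
    "dissolved": "INACTIVE",
    "forfeited": "INACTIVE",
}


def normalize_status(raw: str) -> Optional[str]:
    """Map a raw registry status to a canonical value, or None if unknown."""
    key = " ".join(raw.lower().split())  # lowercase and collapse whitespace
    return CANONICAL_STATUS.get(key)


for raw in ("Active", "  Good   Standing ", "Revoked - Tax"):
    print(repr(raw), "->", normalize_status(raw))
```

Returning `None` for an unrecognized string, instead of guessing, routes the record to review rather than letting an unmapped status drift silently into an underwriting decision.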

The Real Cost of Not Fixing the Data Lag Problem

The business case for real-time lookup is not labor math. It is adverse selection and revenue impact.

Consider a lender running 10,000 business verifications per month. At the 3 percent illustrative false-no-match rate discussed earlier, that is 300 legitimate applications per month pushed into manual review or declined outright. Run your own approval rate and average advance size against that number and you get the real picture: at a typical MCA book where roughly half of those would have funded at an average $40,000 advance, you are looking at 150 deals per month, $6 million in monthly originations (roughly $72 million annualized), and whatever revenue that represents in your pricing model mis-routed to competitors whose vendors happened to catch the filing first. Labor hours saved are a rounding error next to that pipeline impact.
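The arithmetic is simple enough to rerun with your own inputs. The figures below are the illustrative ones from the paragraph above (including the unsourced 3 percent rate), not measured values; the variable names are chosen for this sketch.

```python
monthly_verifications = 10_000
false_no_match_rate = 0.03   # illustrative midpoint, not a measured rate
share_that_would_fund = 0.5  # "roughly half"
average_advance = 40_000     # dollars

misrouted_apps = round(monthly_verifications * false_no_match_rate)
lost_deals = round(misrouted_apps * share_that_would_fund)
monthly_originations = lost_deals * average_advance
annual_originations = monthly_originations * 12

print(f"{misrouted_apps} applications, {lost_deals} deals, "
      f"${monthly_originations:,} per month, ${annual_originations:,} per year")
# 300 applications, 150 deals, $6,000,000 per month, $72,000,000 per year
```

Because the result scales linearly with the false-no-match rate, running it at the 2 percent and 5 percent ends of the range shows how much of the conclusion rests on that one assumption.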

The qualitative evidence is consistent with the model. Bectran, a B2B credit platform, described the pre-real-time state bluntly in a public press release: "very outdated and most of the time just straight up wrong data. Reports from our customers looked very bad on us."[7] Joe Salvatore, Chief Risk Officer at Idea Financial, processing between 5,000 and 10,000 applications monthly, has described the pre-integration state publicly as "completely manual still ... the Achilles heel."[8]

The compliance consequences are just as concrete. FinCEN's March 2025 interim final rule revising beneficial ownership reporting requires foreign reporting companies to correct inaccurate information within 30 days of becoming aware of it.[9] A verification system that cannot surface recent changes inside a 30-day window is creating a compliance clock that will eventually be missed.

For a deeper walkthrough of the verification architecture and a side-by-side look at how the SOS API fits into a full lending workflow, see our Business Verification APIs for Lenders guide and our 50-state business entity verification hub.