Executive Summary: Engineering teams at lending companies face the same question every year: should we build our own Secretary of State data pipeline, or subscribe to an API that already covers all 50 states? The answer looks obvious until you start counting the real costs on both sides. This article breaks down the total cost of ownership for both approaches, including the hidden expenses that turn a "simple scraping project" into a six-figure annual commitment, and the real limitations of API providers that sales pages conveniently leave out.
What Does It Actually Cost to Build an In-House SOS Scraper?
The initial build is where most engineering leaders underestimate scope. Connecting to one state's Secretary of State website is straightforward. Connecting to all 50, with their different interfaces, data formats, anti-bot protections, and response structures, is a different project entirely.
How Much Engineering Time Does the Initial Build Require?
A realistic estimate for building a multi-state SOS scraping system starts with understanding the work involved. Each state website has a unique interface, search flow, and data structure. Some states use JavaScript-heavy single-page applications. Others require multi-step form submissions. A few still run on legacy systems with unpredictable response formats.
According to the U.S. Bureau of Labor Statistics, the median annual wage for software developers was $132,270 as of May 2024.[^1] With a 30 to 35 percent benefits and overhead multiplier (health insurance, payroll taxes, equipment, office space), the fully loaded cost for a mid-level software engineer lands at roughly $170,000 to $178,000 annually, or $82 to $86 per hour.[^2] Senior engineers with scraping and infrastructure experience command higher rates; levels.fyi reports median total compensation for senior software engineers at major tech companies ranging from $200,000 to $350,000, though lending companies typically pay closer to the lower end of that range.[^3]
For this analysis, we use a conservative fully loaded rate of $85 per hour for mid-level engineers, and $110 per hour for senior engineers leading the architecture.
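The rate derivation above is simple enough to check directly. This sketch reproduces the fully loaded hourly figures from the BLS median and the 30 to 35 percent overhead multiplier; the 2,080-hour work year (52 weeks at 40 hours) is the standard assumption, not a sourced figure.

```python
# Sanity check on the fully loaded hourly rates cited above.
BLS_MEDIAN = 132_270         # median software developer wage, May 2024 (BLS)
WORK_HOURS_PER_YEAR = 2_080  # 52 weeks x 40 hours, a standard assumption

for multiplier in (1.30, 1.35):
    loaded_annual = BLS_MEDIAN * multiplier
    hourly = loaded_annual / WORK_HOURS_PER_YEAR
    print(f"x{multiplier}: ${loaded_annual:,.0f}/yr -> ${hourly:.0f}/hr")
```

Running this gives roughly $172,000 to $179,000 annually, or $83 to $86 per hour, consistent with the ranges used in the analysis.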
Here is a conservative estimate for a 50-state build:
[TABLE-1]
Note: the per-state development time of 2 to 4 weeks is an estimate based on the complexity variation across state websites. Simple states with standard HTML forms may take less than a week. States with heavy JavaScript rendering, multi-step search flows, or aggressive anti-bot systems can take considerably longer. Your actual timeline depends on how many states you need and how complex their interfaces are.
That is 7 to 9 months of engineering time with a two-person team. During this period, those engineers are not working on your core lending platform, underwriting models, or borrower experience. The opportunity cost of pulling engineers off revenue-generating work is the single largest hidden expense in the build decision.[^4]
What Are the Ongoing Infrastructure Costs?
Once the scrapers are built, they need somewhere to run. A production-grade scraping system requires compute instances, proxy services, storage, monitoring, and alerting infrastructure.
Infrastructure cost estimates below are based on published cloud pricing and vendor rate cards:
• Cloud compute (AWS/GCP). Running headless browsers for 50 states requires multiple instances with sufficient memory for Chrome/Chromium. AWS EC2 m5.xlarge instances (4 vCPU, 16 GB) run $0.192/hour on-demand in us-east-1, or roughly $140/month per instance.[^5] A production setup with 3 to 5 instances for redundancy and parallel processing runs $420 to $700 per month for compute alone. Reserved instances reduce this by 30 to 40 percent.
• Proxy services. State websites detect and block datacenter IP addresses. Residential proxy providers like Bright Data and Oxylabs publish pricing starting at $8 to $15 per gigabyte for residential traffic.[^6] At moderate scraping volume (5,000 to 20,000 lookups per month), expect $500 to $2,000 per month depending on traffic patterns and retry rates.
• Storage and database. Storing results, screenshots, and historical records in S3 or equivalent adds $50 to $200 per month at moderate volume. Database hosting (RDS or similar) adds $100 to $300 per month.
• Monitoring and alerting. Datadog infrastructure monitoring starts at $15 per host per month; APM adds $31 per host per month.[^7] PagerDuty starts at $21 per user per month. Total monitoring and alerting: $150 to $400 per month.
[TABLE-2]
These numbers assume moderate volume (5,000 to 20,000 lookups per month). High-volume lenders processing 50,000 or more lookups per month will land at the upper end or beyond due to increased proxy consumption and compute requirements.
Why Do Maintenance Costs Exceed the Initial Build?
This is where the build-versus-buy math shifts dramatically. Industry data consistently shows that initial development accounts for less than 30 percent of a software system's lifetime cost.[^4] The remaining 70-plus percent is maintenance, and SOS scrapers are particularly maintenance-intensive because the data sources are entirely outside your control.
How Often Do State Websites Change?
State government websites undergo continuous updates. The traditional model of major redesigns every five years has shifted toward iterative improvements, meaning changes can happen at any time without advance notice.[^8] When a state changes its website layout, form structure, CAPTCHA implementation, or data format, your scraper breaks.
For a 50-state system, engineering teams should expect:
• 5 to 10 state websites changing per quarter (layout updates, new CAPTCHA systems, form field modifications)
• 2 to 3 major overhauls per year (full redesigns that require rewriting the scraper from scratch for that state)
• Intermittent outages requiring investigation (is the state down, or did our scraper break?)
Each broken scraper takes 4 to 16 hours to diagnose and fix, depending on the severity of the change. At 20 to 40 breakages per year, that is 80 to 640 engineering hours annually just to keep the system running at its current level. These numbers are estimates based on the frequency of state website changes and the typical complexity of scraper repairs. Your experience will vary based on which states you cover and how resilient your architecture is.
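The annual maintenance burden follows directly from those two estimated ranges. This sketch turns them into hours and dollars using the mid-level rate from earlier in the article; both inputs are estimates you should replace with your own state mix and breakage history.

```python
# Rough annual maintenance burden from the breakage estimates above.
FIX_HOURS = (4, 16)            # hours to diagnose and fix one breakage
BREAKAGES_PER_YEAR = (20, 40)  # broken scrapers per year, 50-state system
MID_RATE = 85                  # fully loaded $/hour (from the rate analysis)

low_hours = FIX_HOURS[0] * BREAKAGES_PER_YEAR[0]    # 80 hours
high_hours = FIX_HOURS[1] * BREAKAGES_PER_YEAR[1]   # 640 hours
print(f"{low_hours}-{high_hours} hours/year, "
      f"${low_hours * MID_RATE:,}-${high_hours * MID_RATE:,} in engineering time")
```

At the $85 mid-level rate, that 80-to-640-hour range translates to roughly $6,800 to $54,400 per year in engineering time before any infrastructure spend.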
What Does Anti-Bot Detection Cost You?
Modern anti-bot systems have become significantly more sophisticated. Websites now analyze browser fingerprints, mouse movement patterns, IP reputation, and request timing to detect automated access.[^9] Google's reCAPTCHA v3 runs silently in the background, scoring visitors without ever showing a puzzle.
Engineering teams working on scraping infrastructure report spending a significant portion of their time working around CAPTCHAs, rate limiting, JavaScript rendering challenges, and IP blocking. For a dedicated scraping engineer, anti-bot countermeasures can consume 15 to 25 percent of their working hours, though the exact percentage depends on which states you target and how aggressively those states deploy anti-bot technology.
The cost of CAPTCHA-solving services, browser automation frameworks, and residential proxy rotation adds up separately from the engineering hours:
• CAPTCHA-solving services. Services like 2Captcha and Anti-Captcha charge $1 to $3 per 1,000 solves. At 50,000 lookups per month with a 30 percent CAPTCHA encounter rate (15,000 solves), that is $15 to $45 per month at list price; retries and multi-step search flows that trigger multiple CAPTCHAs per lookup can push this several times higher.
• Headless browser infrastructure. Running Puppeteer or Playwright at scale requires far more compute than simple HTTP requests, multiplying your cloud compute costs by 2x to 3x.
• Proxy rotation and management. Residential proxies with automatic rotation command premium pricing, often $10 to $15 per gigabyte of traffic.[^6]
What Is the True Total Cost of Ownership Over Three Years?
Combining initial build, infrastructure, and ongoing maintenance produces the full picture. The three-year TCO model below uses mid-range estimates. Where a number is estimated rather than sourced from a specific vendor or government dataset, it is marked accordingly.
[TABLE-3]
The three-year in-house TCO lands between $350,000 and $650,000, depending on volume, team seniority, and how many states you actually need to cover. The wide range reflects legitimate differences in scope: a team building scrapers for 15 high-priority states will spend far less than one targeting all 50.
Budget 15 to 20 percent of the initial development cost annually for maintenance. If the project costs $115,000 to build, expect $17,000 to $23,000 annually in maintenance alone, before infrastructure and engineering time.[^4]
How Does an SOS API Subscription Compare?
An API subscription replaces the entire cost structure above with a predictable monthly expense. Here is what changes when you use a service like Cobalt Intelligence instead of building in-house.
What Costs Disappear with an API?
• Reduced build time. Integration with a standard REST API typically takes 3 to 5 days, not 7 to 9 months.[^10]
• No maintenance engineering for scrapers. The API provider handles all 50 state website changes, anti-bot countermeasures, and data normalization.
• No scraping infrastructure. No proxy servers, no headless browser clusters, no monitoring dashboards for scraper health.
• No CAPTCHA costs. The provider absorbs all anti-bot expenses.
However, you still need to build and maintain your integration layer: error handling, retry logic, response parsing, and whatever business logic sits between the API response and your underwriting system. That is not zero engineering work, but it is days of work rather than months.
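To make that integration layer concrete, here is a minimal sketch of the retry and error-handling logic you still own on the buy side. The endpoint URL, parameter names, and header are hypothetical placeholders, not Cobalt's actual API; the generous timeout reflects the multi-minute live lookups discussed later in this article.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request

API_URL = "https://api.example.com/v1/sos-search"  # placeholder, not a real endpoint

def backoff_delay(attempt):
    """Exponential backoff schedule: 1s, 2s, 4s, ..."""
    return 2.0 ** attempt

def verify_business(name, state, api_key, max_retries=3):
    """Look up a business, retrying transient failures with backoff."""
    params = urllib.parse.urlencode({"searchQuery": name, "state": state})
    last_err = None
    for attempt in range(max_retries):
        req = urllib.request.Request(
            f"{API_URL}?{params}", headers={"x-api-key": api_key}
        )
        try:
            # Live state lookups can take minutes, so the timeout is generous.
            with urllib.request.urlopen(req, timeout=300) as resp:
                return json.load(resp)
        except urllib.error.HTTPError as e:
            # Retry rate limits and server-side errors; fail fast on the rest.
            if e.code in (429, 500, 502, 503) and attempt < max_retries - 1:
                time.sleep(backoff_delay(attempt))
                continue
            last_err = e
            break
        except urllib.error.URLError as e:
            last_err = e
            if attempt < max_retries - 1:
                time.sleep(backoff_delay(attempt))
    raise RuntimeError(f"verification failed after {max_retries} attempts") from last_err
```

Even this simplified version shows where the residual engineering work lives: deciding which errors are retryable, how long to wait, and what your underwriting system does when verification ultimately fails.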
What Does an API Subscription Actually Cost?
Cobalt Intelligence uses a credit-based pricing model. SOS lookups cost 1 credit each. Based on pricing discussed in sales conversations, the tiers work roughly as follows:
[TABLE-4]
Important cost details that are easy to miss:
• Delaware pass-through fees. Delaware charges a state-imposed fee of $15 to $20 per entity status check. Cobalt passes this through at cost. If you run 500 Delaware lookups per month, that is $7,500 to $10,000 per month in state fees alone, on top of your credit costs. This is a state government fee, not a Cobalt surcharge, but it still hits your budget.
• Credits are consumed on "no result" responses. If a live lookup returns no matching entity, the credit is still consumed because "no record found" is itself a verification signal. Cached lookups that return no result are not billed.
• TIN/EIN verification costs 3 credits per lookup. If you bundle SOS with TIN verification (common for lenders), your effective per-verification cost is higher.
For a lender processing 5,000 SOS verifications per month at roughly $0.75 per lookup, the monthly API cost is approximately $3,750, plus any Delaware pass-through fees. That is significantly less than the in-house alternative. The pricing scales more predictably, though it is not perfectly linear because volume discounts require tier commitments.
What Does the Side-by-Side Comparison Look Like?
[TABLE-5]
The cost difference is significant at most volumes. But the comparison is not as clean as the table suggests, because the API side carries its own set of risks and limitations that deserve equal scrutiny.
What Are the Real Limitations of Using an SOS API?
This is the section that most vendor comparison articles skip. If you are making a real build-versus-buy decision, you need to see both options limp.
What Happens When a State Website Goes Down?
Cobalt pulls data directly from official state websites with every live request. This is the source of its freshness advantage over aggregated databases like D&B or Experian. But it also means that when a state website is down for maintenance, experiencing errors, or redesigning its interface, Cobalt cannot return live data for that state either.
In cached mode (`liveData=false`), Cobalt returns previously retrieved data with a monthly refresh cycle. This provides a fallback, but cached data could be up to 30 days stale. If a business dissolved two weeks ago, the cache would still show it as active.
The practical impact: state websites are the bottleneck, not Cobalt. No API provider can give you real-time data from a source that is offline. If your underwriting workflow requires live verification from a specific state and that state's website is down, you are waiting regardless of whether you built in-house or bought an API.
Where Are the Coverage Gaps?
Cobalt's SOS API covers all 50 states plus D.C. for entity verification. That is genuinely comprehensive for the core product. But the broader product suite has meaningful gaps:
• UCC filing data is available in 11 states only. If you need nationwide lien discovery, Cobalt covers a portion of it. You will need supplementary sources for the other 39 states.
• Court records cover 2 jurisdictions: New York State and Miami-Dade County. These are strategically relevant markets for MCA lenders, but if your portfolio is concentrated in Texas or California, this product adds limited value.
• Officer and director data availability varies by state. Some states publish full officer information; others do not. Cobalt returns whatever the state makes available, which means your data completeness is inconsistent across states.
These gaps are not unique to Cobalt. They reflect the fragmented nature of U.S. government data systems. But you need to know about them before signing a contract, because your risk team will eventually ask why UCC data is missing for businesses registered in states outside the supported 11.
What About Vendor Dependency Risk?
Subscribing to any single-vendor API creates dependency risk that engineering leaders should evaluate honestly:
• Price changes. Cobalt's pricing is not published on a public pricing page with locked-in rates. Pricing is negotiated, which means it can be renegotiated. If your verification workflow is deeply integrated with Cobalt's API, switching costs create leverage that works in the vendor's favor at renewal time.
• Terms of service changes. API providers can change rate limits, data retention policies, or feature availability. Your integration may need updates that you did not budget for.
• Business continuity. Cobalt Intelligence is a smaller, specialized provider. That comes with advantages (responsive support, willingness to customize) and risks (less financial cushion than an Experian or LexisNexis). If the company is acquired, pivots, or shuts down, your verification pipeline needs a contingency plan.
• No published SLA for uptime. Enterprise-tier customers can negotiate SLA commitments, but there is no standard published uptime guarantee. Given that state websites are the upstream dependency, any SLA would necessarily be limited by factors outside Cobalt's control.
Mitigation strategies: keep your integration layer abstracted behind an internal interface so you can swap providers. Maintain a list of alternative providers (Middesk, OpenCorporates, direct state API programs where available). Store critical verification data locally so you are not dependent on the API for historical lookups.
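One way to implement that abstraction is an internal interface that your underwriting code depends on, with each vendor wrapped in its own adapter. The class and field names below are illustrative assumptions, and the `CobaltProvider` body stands in for a real API call; the point is the seam, not the payload.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class VerificationResult:
    entity_name: str
    status: str   # normalized internally, e.g. "active" or "dissolved"
    state: str
    raw: dict     # provider-specific payload, stored locally for audit

class SOSProvider(ABC):
    """Internal interface; underwriting code never imports a vendor SDK."""
    @abstractmethod
    def verify(self, name: str, state: str) -> VerificationResult: ...

class CobaltProvider(SOSProvider):
    def verify(self, name, state):
        payload = {"status": "Active"}  # stand-in for a real API response
        return VerificationResult(name, payload["status"].lower(), state, payload)

class StubProvider(SOSProvider):
    """Deterministic fallback used in tests and as a contingency seam."""
    def verify(self, name, state):
        return VerificationResult(name, "unknown", state, {})
```

Swapping providers then means writing one new adapter class, not rewriting every call site in your decisioning pipeline.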
What About Response Time Variability?
Live SOS lookups through Cobalt take 10 to 180 seconds depending on the state. Oregon can take up to 5 minutes. This is not a Cobalt limitation per se; it reflects how long the underlying state systems take to respond. But if your underwriting workflow expects sub-second responses, you need to design around this variability.
Cached mode responses are sub-second, but at the cost of freshness. The recommended approach is a waterfall: check cache first, then fall back to live lookup only when cached data is stale or missing. This adds complexity to your integration that is easy to underestimate.
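The waterfall described above can be sketched in a few lines. The 30-day staleness threshold matches the monthly refresh cycle mentioned earlier; the `cache_lookup` and `live_lookup` functions are placeholders for your own storage layer and API client.

```python
from datetime import datetime, timedelta, timezone

MAX_CACHE_AGE = timedelta(days=30)  # matches the monthly cache refresh cycle

def waterfall_lookup(name, state, cache_lookup, live_lookup):
    """Return cached data when fresh enough, else fall back to a live call.

    cache_lookup(name, state) -> (record, retrieved_at) or None
    live_lookup(name, state)  -> record  (may take 10s to 5min, per the text)
    """
    cached = cache_lookup(name, state)
    if cached is not None:
        record, retrieved_at = cached
        if datetime.now(timezone.utc) - retrieved_at < MAX_CACHE_AGE:
            return record, "cache"
    # Cache miss or stale entry: pay the latency cost of a live lookup.
    return live_lookup(name, state), "live"
```

The subtlety this hides is policy, not code: deciding which decisions can tolerate 30-day-old data and which must block on a live lookup is an underwriting question your team still has to answer.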
When Does Building In-House Actually Make Sense?
The honest answer is: sometimes. Here are the scenarios where the build option is defensible.
You Only Need 1 to 3 States and Have Engineering Capacity
If your lending operation is concentrated in a small number of states, the build math changes substantially. Building and maintaining scrapers for 3 states is a fundamentally different project than building for 50. The initial build drops from 7 to 9 months to 4 to 8 weeks. Maintenance burden drops proportionally. At 3 states, the annual maintenance cost might be $15,000 to $25,000, which is in the same range as an API subscription at moderate volume.
The crossover point: if you need fewer than 5 states, process fewer than 1,000 lookups per month, and have an engineer with web scraping experience who is not fully allocated to other projects, building in-house is a viable option. The API subscription may still be cheaper on paper, but the difference is small enough that other factors (control, customization, institutional knowledge) can tip the decision.
You Need Custom Data Extraction Beyond What Any API Offers
APIs return a standardized set of fields. If your underwriting process requires data points that are available on the state website but not exposed through the API (specific document types, historical filing records, annual report details), you may need custom scraping. This is especially true for specialized compliance requirements where you need to capture exact screenshots of specific pages or extract data from PDF documents attached to state filings.
Before deciding to build, ask the API provider whether they can add the fields you need. Smaller providers like Cobalt are often willing to customize for high-volume customers. But if your requirements are genuinely unique, building gives you full control over what data you extract.
Regulatory Requirements Mandate Full Data Pipeline Control
Some regulated entities are required to demonstrate complete control over their data pipeline for audit purposes. If your compliance team or regulators require that you control every step from data source to decisioning, an API provider adds a third-party dependency that may complicate your audit narrative. This is relatively rare, but it is a legitimate business requirement when it applies.
SOS Data Is Your Core Product
If your business model involves reselling or aggregating SOS data, building in-house is the right call. The API provider is your competitor in this scenario, not your vendor. This applies to a small number of companies, but it is worth stating explicitly.
What Hidden Costs Do Engineering Teams Overlook?
Beyond the line items in the TCO table, several costs consistently catch teams off guard on both sides of the decision.
How Does Compliance Risk Factor Into the Cost?
Web scraping legality varies by jurisdiction. The Ninth Circuit ruled in LinkedIn v. hiQ Labs that scraping publicly available data does not constitute unauthorized access under the Computer Fraud and Abuse Act.[^11] However, individual states may have their own restrictions.
Some states explicitly prohibit automated data collection from their systems. South Carolina Court Administration, for example, categorically bans automated data collection from its records. State unfair competition and UDAAP statutes, adopted in various forms across all 50 states, may create additional exposure.[^12]
The cost of legal review for 50-state compliance is not trivial. Outside counsel rates for technology compliance reviews run $400 to $650 per hour.[^13] A thorough assessment covering all 50 states could cost $15,000 to $30,000, with annual updates required as laws evolve. An API provider absorbs this compliance burden, though you should verify that your provider has actually done this legal work rather than assuming they have.
What Is the On-Call and Incident Response Burden?
Scrapers break at inconvenient times. A state website update at 2 AM means your verification pipeline is down until an engineer wakes up and fixes it. For lenders processing applications around the clock, this creates real business risk.
The on-call burden includes:
• Pager duty rotation for the scraping infrastructure (typically shared among 2 to 3 engineers)
• Incident response time averaging 30 to 90 minutes per breakage
• Post-incident reviews documenting what changed and how the fix was applied
• Engineer burnout from maintaining a system that breaks unpredictably
An API provider has dedicated teams monitoring all 50 states continuously, because their entire revenue depends on it. That said, when a state website breaks at 2 AM, the API provider's scraper is also broken until they fix it. The difference is that fixing it is their full-time job, not a side responsibility for your team.
What About Data Quality and Edge Cases?
State websites return data in wildly different formats. "Active" in one state is "In Good Standing" in another. "Dead" in Ohio means the same thing as "Dissolved" in California. Building a normalization layer that correctly maps status values across all 50 states is a project unto itself.
Edge cases multiply the complexity:
• Delaware charges $15 to $20 per status check (a state-imposed fee, not avoidable by building in-house; you pay this fee whether you scrape directly or use an API)
• Oregon lookups can take up to 5 minutes due to the state's system performance
• New York returns minimal status data requiring additional investigation
• California has multiple suspension types (FTB, SOS, or both) with different implications for lending decisions
Each edge case requires custom logic, testing, and documentation. An API like Cobalt Intelligence has already solved these problems and normalizes all status values into a consistent format with a confidence score for automated decisioning. Building your own normalization layer is doable, but it requires sustained attention as states change their terminology and data structures.
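To give a taste of the normalization problem, the sketch below maps a handful of state-specific status strings onto a small internal vocabulary. The mappings shown are drawn from the examples in the text plus a couple of hypothetical entries; a production table covers every status every state emits and must be re-verified whenever a state changes its terminology.

```python
# Partial, illustrative mapping; a real table covers all 50 states.
STATUS_MAP = {
    ("OH", "Dead"): "dissolved",
    ("CA", "Dissolved"): "dissolved",
    ("CA", "FTB Suspended"): "suspended",   # one of CA's suspension types
    ("DE", "In Good Standing"): "active",   # hypothetical DE wording
    ("NY", "Active"): "active",
}

def normalize_status(state, raw_status):
    """Map a raw SOS status to an internal value; flag unknowns for review."""
    return STATUS_MAP.get((state.upper(), raw_status), "needs_review")

print(normalize_status("OH", "Dead"))       # dissolved
print(normalize_status("TX", "Forfeited"))  # needs_review
```

The `needs_review` default is the important design choice: an unmapped status should route to a human, not silently pass or fail an applicant.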
How Should Engineering Leaders Make This Decision?
The decision framework comes down to three questions.
Is SOS Data Extraction a Core Competency?
If your company sells SOS data or builds differentiated products on top of proprietary scraping, building in-house may be justified. For every other use case, SOS data is infrastructure, not product. Buy infrastructure. Build product.
Can Your Team Absorb 7 to 9 Months of Distraction?
The initial build pulls senior engineers away from your product roadmap. If your lending platform has features waiting to be built, customers waiting for improvements, or technical debt waiting to be addressed, the opportunity cost alone may exceed the three-year API subscription cost.
Research from McKinsey and other management consultancies consistently shows that companies achieve higher returns when engineering resources are focused on strategic digital assets aligned with core business, rather than rebuilding commodity infrastructure.[^14] SOS scraping is not strategic for lenders. It is plumbing.
What Is Your Contingency Plan for Either Option?
If you build in-house, what happens when your scraping engineer leaves? Knowledge of 50 different state website quirks does not transfer easily through documentation alone.
If you buy an API, what happens if the provider raises prices by 50 percent at renewal, or if they get acquired and the new owner changes the product roadmap? Your contingency plan should include an abstraction layer in your codebase that makes provider-switching feasible within weeks, not months.
Neither option is risk-free. The question is which set of risks you are better equipped to manage.
For engineering leaders evaluating this decision, Cobalt Intelligence offers a complete guide to SOS API solutions and a broader overview of business verification APIs for alternative lenders. Both resources provide additional technical context for the build-versus-buy evaluation.
Ready to see the API in action? Cobalt Intelligence provides test mode access so your team can evaluate the integration before committing. Request an API key at cobaltintelligence.com and run your first lookup in minutes, not months.