ERPC Significantly Improves Solana Network Infrastructure. Rust High-Performance Proxy Platform Fully Upgraded, Deployed Across All Regions for Shared RPC, gRPC, and Shredstream. Zero-Downtime Updates Achieved

2026.02.07
ERPC, operated by ELSOUL LABO B.V. (Headquarters: Amsterdam, the Netherlands; CEO: Fumitake Kawasaki) and Validators DAO, has completed a major upgrade to its Solana network infrastructure.
This upgrade has already been applied to all regions and all shared endpoints provided by ERPC (Solana RPC, Geyser gRPC, and Shredstream). We updated, as one integrated system, the infrastructure behaviors that most directly influence real-world outcomes: connection initiation, TLS processing, cache control, HTTP/1.1 and HTTP/2 transport, long-lived connection behavior, and the metrics used for observability and troubleshooting.
While keeping day-to-day responsiveness as the baseline, we also reorganized the underlying network behavior so it is less likely to become skewed or unstable in the scenarios where results tend to degrade: peak-load volatility, instability under sustained operation, and cascades triggered by disconnects and reconnects. As a result, the environment is now better structured to sustain both performance and stability in practical Solana operations.
In addition, we have transitioned to an operational architecture that allows network configuration changes and platform upgrades to be applied with complete zero downtime. There are no changes to pricing, specifications, authentication, or rate limits, and existing ERPC customers receive the benefits of the upgrade without any additional setup or operational changes.

Background

In practical Solana operations, average response time and steady-state latency are critical baseline requirements. At the same time, there are scenarios in which the behavior of the underlying network infrastructure itself determines outcomes: moments of concentrated load, long-lived connections, and phases in which disconnects and reconnects occur.
Shared endpoints in particular must accommodate both bursts of transaction submission within short time windows and always-on connections via WebSocket and gRPC. Under these conditions, infrastructure-level behavior (connection initiation, TLS handshakes, transport behavior, cache handling, and recovery from idle states) feeds directly into user experience and execution outcomes.
Even when average responsiveness is solid, real-world results during spikes or under sustained operation can be decided by other factors. Practical operations therefore require day-to-day usability and continuity in failure-prone scenarios to be achieved at the same time.
ERPC has designed and operated its own Rust high-performance proxy platform as the foundation for Solana communications, maintaining an architecture that applies the same approach across all regions while continuously evolving the platform. This upgrade re-examines operationally observed issues as a unified system—from connection initiation through long-lived operation—and reorganizes the entire network foundation accordingly.

What Changes for ERPC Customers

With this update, ERPC customers will first see stabilized behavior at connection initiation. During connection establishment including TLS, mismatched conditions and unnecessary retries are less likely to occur, making it easier for transactions and streams to enter processing reliably at the start.
Next, we reorganized the infrastructure behaviors that tend to cause volatility during peak load. By combining early filtering of unnecessary connections with coordinated updates to HTTP/1.1 and HTTP/2 transport, timeout consistency, connection pool health, cache behavior under contention, and metrics for observability and troubleshooting, we have strengthened the conditions that keep behavior from skewing even when load concentrates.
For long-lived WebSocket and gRPC streams and always-on monitoring workloads, connection continuity has improved. Both the frequency of disconnect, reconnect, and resync events and the likelihood that those events cascade into outcomes have been reduced, making it easier to build operations on the assumption of sustained runtime.
Improvements to cache control and transport behavior also reduce unnecessary refetches and wasted processing during congestion. Bandwidth and processing headroom are more likely to remain usable and stable, and expanded metrics and observability make it easier to identify root causes and shorten recovery timelines.
In addition, by enabling configuration changes and platform upgrades with zero downtime, we have established operational conditions that make it easier to raise performance, stability, and overall platform quality at high frequency. The ability to keep improving without pausing the platform further strengthens continuity for customers.

Details of the Improvements

This upgrade is not presented as a release driven by specific feature names or version numbers. Instead, it decomposes the scenarios that tend to dominate real-world Solana outcomes into the following layers—connection initiation, TLS, the L4/HTTP boundary, H1/H2 transport, cache, observability, failure behavior, and long-term operational prerequisites—and updates the platform so these layers connect without contradiction.
Below, we explain the incorporated improvements in terms of how they contribute to customer experience and operational outcomes.

Improvements to Connection Initiation and TLS Handling

We expanded the TLS context handled during connection establishment and updated the structure so required state can be retained and applied appropriately. This makes mismatched conditions and unnecessary retries less likely at connection initiation.
We also reorganized TLS handling—including certificate verification and hostname verification—so security requirements can be met while reducing conditions where handshake failures or handling inconsistencies create initiation losses that cascade into outcomes. This is not merely a security enhancement; it contributes to stabilizing behavior from connection start through entry into processing for Solana workloads.
We further strengthened mechanisms that make TLS-adjacent behavior easier to observe and troubleshoot. In scenarios where initiation dominates outcomes, the ability to reproduce issues, identify causes, and reflect fixes quickly becomes the capability that preserves experience quality.
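As a rough illustration of the kind of client-side TLS setup this involves, the sketch below builds an upstream TLS configuration that keeps certificate and hostname verification enabled while retaining session state for resumption. It assumes the rustls and webpki-roots crates; the crate choice, function name, and session-cache size are illustrative, not a description of ERPC's actual implementation.

```rust
use std::sync::Arc;
use rustls::{client::Resumption, ClientConfig, RootCertStore};

/// Build an upstream TLS client config: verification stays on, and session
/// state is retained so repeated connection initiations can resume instead
/// of paying a full handshake each time.
fn upstream_tls_config() -> Arc<ClientConfig> {
    // Trust the standard web PKI roots; certificate and hostname
    // verification remain enabled by default.
    let mut roots = RootCertStore::empty();
    roots.extend(webpki_roots::TLS_SERVER_ROOTS.iter().cloned());

    let mut config = ClientConfig::builder()
        .with_root_certificates(roots)
        .with_no_client_auth();

    // Keep recent sessions in memory so reconnects avoid full handshakes
    // at connection initiation (256 is a placeholder size).
    config.resumption = Resumption::in_memory_sessions(256);

    Arc::new(config)
}
```

Session resumption is one way repeated connection initiations avoid repaying the full handshake cost; the same principle applies whichever TLS stack is actually in use.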

Preserving Headroom via Early Filtering of Unnecessary Connections

We introduced a mechanism to filter TCP connections at an early stage, updating the platform so illegitimate or unnecessary connections are less likely to crowd out legitimate traffic. On shared endpoints, connection requests can spike due to external factors or temporary skews.
Early-stage filtering makes legitimate connections less likely to stall at initiation and improves the likelihood that headroom remains available during peak load. As a result, behavior is less likely to skew even in concentrated-load scenarios, and the conditions for a stable latency distribution are strengthened.
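A minimal sketch of the idea, assuming a tokio accept loop: connections from a peer that already holds too many sockets are closed before any TLS or HTTP work is spent on them. The per-IP limit, map-based bookkeeping, and function names are illustrative assumptions, not ERPC's actual filtering rules.

```rust
use std::{collections::HashMap, net::IpAddr, sync::{Arc, Mutex}};
use tokio::net::TcpListener;

/// Accept loop that refuses connections from peers already holding too many
/// sockets, before any TLS or HTTP processing is spent on them.
async fn accept_loop(listener: TcpListener, per_ip_limit: usize) -> std::io::Result<()> {
    let open: Arc<Mutex<HashMap<IpAddr, usize>>> = Arc::new(Mutex::new(HashMap::new()));

    loop {
        let (stream, peer) = listener.accept().await?;
        let ip = peer.ip();

        // Early filter: drop the socket while it is still cheap to do so.
        {
            let mut counts = open.lock().unwrap();
            let count = counts.entry(ip).or_insert(0);
            if *count >= per_ip_limit {
                drop(stream);
                continue;
            }
            *count += 1;
        }

        let open = Arc::clone(&open);
        tokio::spawn(async move {
            // ... hand `stream` to TLS/HTTP handling here ...
            drop(stream);

            // Release the slot once the connection is finished.
            if let Some(count) = open.lock().unwrap().get_mut(&ip) {
                *count = count.saturating_sub(1);
            }
        });
    }
}
```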

Clarifying the Connection Model by Reorganizing the L4/HTTP Boundary

Network infrastructure does not end at HTTP. Connection establishment and continuity depend on L4 conditions, and volatility at that layer propagates into higher-level protocol experience.
In this update, we abstracted L4 stream handling and reorganized the structure so the connection model can be handled more explicitly. This makes it easier for the platform to sustain consistent behavior across scenarios where connections continue to grow, client implementations vary, and long-lived operation causes state transitions.
Retry behavior was also reorganized to reduce patterns in which short-lived volatility cascades into user experience. Practical stability depends less on eliminating isolated failures and more on preventing failure cascades.
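One common way to keep short-lived volatility from cascading is bounded, jittered retry. The sketch below is a generic illustration of that pattern on tokio, not ERPC's actual retry policy; the base delay, cap, and jitter source are placeholder assumptions.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Retry an async operation with capped exponential backoff and jitter, so a
/// brief upstream hiccup is absorbed instead of turning into a synchronized
/// retry storm that amplifies the original volatility.
async fn retry_with_backoff<T, E, F, Fut>(mut attempt: F, max_tries: u32) -> Result<T, E>
where
    F: FnMut() -> Fut,
    Fut: std::future::Future<Output = Result<T, E>>,
{
    let mut delay = Duration::from_millis(50);
    let mut try_no = 1;
    loop {
        match attempt().await {
            Ok(value) => return Ok(value),
            Err(err) if try_no >= max_tries => return Err(err),
            Err(_) => {
                // Full jitter: sleep somewhere in [0, delay) so retries from
                // many clients do not all land on the same instant.
                let nanos = SystemTime::now()
                    .duration_since(UNIX_EPOCH)
                    .unwrap_or_default()
                    .subsec_nanos() as u64;
                let jitter_ms = nanos % (delay.as_millis() as u64).max(1);
                tokio::time::sleep(Duration::from_millis(jitter_ms)).await;
                delay = (delay * 2).min(Duration::from_secs(2)); // cap the backoff
                try_no += 1;
            }
        }
    }
}
```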

Improvements to HTTP/1.1 and HTTP/2 Transport and Long-Run Behavior

We added measurements that allow transferred data volume to be tracked consistently across HTTP/1.1 and HTTP/2. This makes it easier to identify where stalls or bottlenecks occur in the transport pipeline, improving both troubleshooting and the speed at which fixes can be applied.
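As a simple illustration of what measuring on the same basis can look like, the sketch below keeps shared atomic counters that both the HTTP/1.1 and HTTP/2 paths report into. The struct and field names are hypothetical, not ERPC's actual metric names.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Transfer counters shared by the HTTP/1.1 and HTTP/2 code paths, so moved
/// bytes can be compared on the same basis regardless of transport.
#[derive(Default)]
pub struct TransferCounters {
    h1_bytes: AtomicU64,
    h2_bytes: AtomicU64,
}

impl TransferCounters {
    /// Record bytes written on either transport.
    pub fn record(&self, http2: bool, bytes: u64) {
        let counter = if http2 { &self.h2_bytes } else { &self.h1_bytes };
        counter.fetch_add(bytes, Ordering::Relaxed);
    }

    /// Snapshot for export to whatever metrics backend is in use.
    pub fn snapshot(&self) -> (u64, u64) {
        (
            self.h1_bytes.load(Ordering::Relaxed),
            self.h2_bytes.load(Ordering::Relaxed),
        )
    }
}
```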
We also reorganized HTTP/2 body-write timeout behavior so unnatural stalls and hangs are less likely during concentrated load or long-lived streaming. In long-run operation, what matters is not peak performance in ideal states, but the ability to prevent behavior from collapsing during state transitions.
Idle timeout behavior and connection pool handling have also been reviewed, removing instability factors that tend to accumulate during sustained runtime. On the HTTP/1.1 side, we reorganized safe shutdown behavior for connections that hold incomplete requests, reducing sources of volatility in both resource usage and behavior.
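A minimal sketch of the body-write deadline idea, using tokio's timeout wrapper; the deadline handling shown here is illustrative, and the function name is hypothetical rather than part of the real transport stack.

```rust
use std::io::{Error, ErrorKind};
use std::time::Duration;
use tokio::{io::AsyncWriteExt, time::timeout};

/// Bound each body write so a stalled peer surfaces as an explicit timeout
/// error instead of a stream that hangs indefinitely under load.
async fn write_chunk_with_deadline<W: AsyncWriteExt + Unpin>(
    writer: &mut W,
    chunk: &[u8],
    deadline: Duration,
) -> std::io::Result<()> {
    match timeout(deadline, writer.write_all(chunk)).await {
        Ok(result) => result,
        Err(_elapsed) => Err(Error::new(ErrorKind::TimedOut, "body write exceeded deadline")),
    }
}
```

The same wrapper pattern applies to idle reads: a bounded wait turns a silently dead connection into an event the connection pool can react to.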

Improvements to Cache Control and Operational Quality

We improved the ability to track why an asset is not cached, making cache behavior easier to explain. In practice, what dominates is not whether caching exists, but the conditions under which it applies and the conditions under which entries fall out of it.
We reorganized lock behavior, stale handling, and revalidation patterns so that degraded experience is less likely to cascade when contention occurs under peak load. We also put eviction controls in place for cases where the number of cached assets grows, and refined partial-content behavior (including Range requests), strengthening the conditions that reduce unnecessary refetches and latency under real-world workloads.
These improvements reduce cases in which cache behavior becomes an outlier, making it less likely that customers must design operations around infrastructure-level uncertainty.
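To make the stale-handling idea concrete, the sketch below shows one common decision shape at cache lookup time (fresh, serve stale while revalidating, or revalidate with a single in-flight refresh). The names and windows are illustrative assumptions, not the actual cache policy.

```rust
use std::time::Duration;

/// What to do with a cached entry at lookup time.
enum CacheAction {
    /// Entry is still fresh: serve it directly.
    ServeFresh,
    /// Entry is stale but inside the serve-stale window: serve it now and
    /// refresh in the background, so contention does not stall clients.
    ServeStaleAndRevalidate,
    /// Entry is too old: one request revalidates; others wait on that result
    /// instead of all refetching the same asset at once.
    RevalidateBlocking,
}

fn decide(age: Duration, max_age: Duration, stale_window: Duration) -> CacheAction {
    if age <= max_age {
        CacheAction::ServeFresh
    } else if age <= max_age + stale_window {
        CacheAction::ServeStaleAndRevalidate
    } else {
        CacheAction::RevalidateBlocking
    }
}
```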

Improvements to Failure Behavior, Logging, and Observability

Failure behavior and logging have been reorganized so it is easier to understand what happened when issues occur. Patterns in which downstream errors cascade into cache and transport behavior and worsen experience have been reduced, making it easier to contain the blast radius.
Observability and troubleshooting improvements are not intended to claim “zero incidents,” but to shorten time-to-recovery when incidents occur. This reduces risk in peak-load and sustained-operation scenarios.
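A small sketch of what structured failure context can look like with the tracing crate: each upstream call carries a span with its own fields, so a failure is localized by those fields rather than by correlating raw log lines. The span and field names are hypothetical examples, not ERPC's actual instrumentation.

```rust
use tracing::{error, info, info_span, Instrument};

/// Wrap an upstream call in a span carrying request context, so a downstream
/// failure shows up with its endpoint and request id attached.
async fn forward_request(endpoint: &str, request_id: u64) {
    let span = info_span!("upstream_call", endpoint, request_id);
    async {
        // ... proxy the request to the upstream here (placeholder result) ...
        let result: Result<(), std::io::Error> = Ok(());
        match result {
            Ok(()) => info!("upstream call completed"),
            Err(err) => error!(%err, "upstream call failed"),
        }
    }
    .instrument(span)
    .await;
}
```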

Dependency Updates and Security Fixes as Long-Term Operating Prerequisites

We incorporated dependency updates and security fixes to maintain the prerequisites for long-term platform operation. This includes updates related to the minimum supported Rust version (MSRV) and CI alignment, strengthening the foundation required to continuously evolve the platform.
The ability to keep updating safely is itself a requirement for long-term quality.

Transition to Zero-Downtime Operations

Previously, short downtime could occur during network configuration changes or platform upgrades. With this update, we have transitioned to an architecture where these operations can be applied with complete zero downtime.
Shared endpoints carry always-on connections and a continuous stream of moments where timing matters. Even brief downtime can trigger cascades of disconnects, reconnects, and resyncs, and that cost can propagate into outcomes. Zero-downtime updates reduce the likelihood of these cascades and keep long-lived operations from being fragmented.
At the same time, ERPC now has operational conditions that allow observed issues to be reflected into improvements quickly. Higher iteration frequency enables us to continuously eliminate volatility and edge-case behavior within production operations.
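As a rough sketch of one common zero-downtime pattern (not necessarily the mechanism ERPC uses), the code below stops accepting new connections when a reload is signalled and lets in-flight connections drain, while a replacement process that is already listening on the same port takes over new traffic.

```rust
use tokio::{net::TcpListener, signal, sync::watch};

/// Accept connections until a reload is signalled, then stop accepting and
/// let in-flight connections drain on their own; a replacement process that
/// is already bound to the same port (e.g. via SO_REUSEPORT or a socket
/// handoff) picks up new traffic in the meantime.
async fn serve_until_reload(listener: TcpListener) -> std::io::Result<()> {
    let (reload_tx, reload_rx) = watch::channel(false);

    // Stand-in for the real reload trigger: translate Ctrl-C into the channel.
    tokio::spawn(async move {
        let _ = signal::ctrl_c().await;
        let _ = reload_tx.send(true);
    });

    let mut reload = reload_rx.clone();
    loop {
        tokio::select! {
            accepted = listener.accept() => {
                let (stream, _peer) = accepted?;
                let mut conn_reload = reload_rx.clone();
                tokio::spawn(async move {
                    // ... serve `stream` here; when `conn_reload` flips, finish
                    //     the in-flight work and close instead of lingering ...
                    let _ = conn_reload.changed().await;
                    drop(stream);
                });
            }
            _ = reload.changed() => {
                // Stop accepting; existing connections drain naturally.
                break;
            }
        }
    }
    Ok(())
}
```

Whether the handover happens through SO_REUSEPORT, file-descriptor passing, or a fronting load balancer, the essential property is the same: accepting stops cleanly while established connections are never cut.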

Impact by Service

Solana RPC (HTTP / WebSocket)

Improvements to connection initiation, TLS, cache control, and transport behavior affect both data reads and transaction submission. Day-to-day usability is maintained while the factors that skew outcomes during peak load are reduced, and the conditions for preserving headroom during congestion are strengthened.

Geyser gRPC

Connection continuity has improved for long-lived streaming use. HTTP/2 transport, timeout consistency, connection pool health, and expanded transport measurements work together to reduce the likelihood that reconnect/resync costs propagate into outcomes.

Shredstream (Direct Shreds)

With connection management and initiation improvements designed for continuous delivery, conditions are strengthened so missed data and added latency are less likely under congestion, and stable continuity for detection and follow-up workloads is easier to sustain.

Connecting R&D and Production Operations

The distributed systems foundation that includes ERPC has been recognized as an R&D project under the Dutch government's WBSO program. This establishes a structure in which operationally observed issues can be taken up as research subjects and improved through verification and iteration.
This network foundation update is one such iteration, applied across all regions and reflected in practical performance and stability. Keeping operations and R&D connected is what allows production observations to feed the next update continuously, rather than stopping at one-off improvements.
Within ERPC, actual usage patterns, load variability, and failure-mode behavior are incorporated into repeated verification and improvement cycles that progressively raise the quality of the network foundation. This update was executed within that integrated framework of R&D and production operations.

Information for Customers

This update has already been applied to all regions and all shared endpoints. Existing ERPC customers do not need to change configuration or operations. There are no changes to pricing, specifications, authentication, or rate limits.
Because shared endpoints must sustain both short spikes and long-lived connections simultaneously, conditions have been reorganized so behavior is less likely to skew under those mixed workloads. Even when configuration changes or platform updates occur during operations, they are applied with zero downtime, so customers do not need to plan for connection fragmentation or design around forced resyncs.
For questions about architecture, workload-specific optimization, or operational feedback, please reach out via the Validators DAO official Discord.
By continuously turning production observations and feedback into improvements, ERPC has progressively raised the quality of its foundation. We will continue to accumulate improvements with zero downtime and provide network infrastructure that sustains real-world Solana outcomes.
Validators DAO Official Discord: https://discord.gg/C7ZQSrCkYR
ERPC Official Site: https://erpc.global/en