Production Scaling Guide
This guide covers strategies and best practices for scaling the BookStore application in production environments, including application-level scaling, database optimization, and performance tuning.
Overview
The BookStore application is designed for horizontal scalability with:
- Stateless Services - API and Web services can scale independently
- Event-Sourced Architecture - Marten with PostgreSQL for reliable event storage
- Read Model Projections - Optimized for read-heavy workloads
- Real-time Updates - SignalR for live notifications
Scaling Strategy Overview
graph TB
subgraph "Application Layer"
API1[API Service Instance 1]
API2[API Service Instance 2]
APIN[API Service Instance N]
WEB1[Web Frontend Instance 1]
WEB2[Web Frontend Instance 2]
end
subgraph "Load Balancing"
LB[Load Balancer]
APILB[API Load Balancer]
end
subgraph "Database Layer"
PRIMARY[PostgreSQL Primary<br/>Writes + Reads]
REPLICA1[Read Replica 1]
REPLICA2[Read Replica 2]
POOLER[Connection Pooler<br/>PgBouncer/Pgpool-II]
end
subgraph "Storage Layer"
BLOB[Azure Blob Storage]
end
LB --> WEB1
LB --> WEB2
WEB1 --> APILB
WEB2 --> APILB
APILB --> API1
APILB --> API2
APILB --> APIN
API1 --> POOLER
API2 --> POOLER
APIN --> POOLER
POOLER --> PRIMARY
POOLER --> REPLICA1
POOLER --> REPLICA2
PRIMARY -.Replication.-> REPLICA1
PRIMARY -.Replication.-> REPLICA2
API1 --> BLOB
API2 --> BLOB
APIN --> BLOB
Application Scaling
Azure Container Apps Scaling
Azure Container Apps scales through KEDA (Kubernetes Event-driven Autoscaling), so scale rules can key off HTTP concurrency, CPU, memory, or external event sources.
Configure Autoscaling Rules
Create or update your Container App scaling configuration:
# Scale API service based on HTTP requests
az containerapp update \
--name apiservice \
--resource-group bookstore-rg \
--min-replicas 2 \
--max-replicas 10 \
--scale-rule-name http-scaling \
--scale-rule-type http \
--scale-rule-http-concurrency 50
# Scale based on CPU utilization
az containerapp update \
--name apiservice \
--resource-group bookstore-rg \
--scale-rule-name cpu-scaling \
--scale-rule-type cpu \
--scale-rule-metadata "type=Utilization" "value=70"
Scaling Configuration via YAML
# api-scaling.yaml
properties:
configuration:
activeRevisionsMode: Single
template:
scale:
minReplicas: 2
maxReplicas: 10
rules:
- name: http-rule
http:
metadata:
concurrentRequests: "50"
- name: cpu-rule
custom:
type: cpu
metadata:
type: Utilization
value: "70"
Best Practices for Azure Container Apps
- Set Minimum Replicas ≥ 2 for production APIs to avoid cold starts
- Configure Appropriate Concurrency - Start with 50 concurrent requests per instance
- Use Multiple Scaling Rules - Combine HTTP, CPU, and memory triggers
- Monitor Scaling Events - Use Azure Monitor to track scaling decisions
- Optimize Container Images - Use minimal base images for faster startup
# View scaling events
az monitor activity-log list \
--resource-group bookstore-rg \
--resource-id /subscriptions/<sub-id>/resourceGroups/bookstore-rg/providers/Microsoft.App/containerApps/apiservice \
--start-time 2025-01-01T00:00:00Z
Kubernetes Horizontal Pod Autoscaler (HPA)
For Kubernetes deployments, use HPA to automatically scale pods based on resource metrics.
Install Metrics Server
# Required for HPA to function
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Configure HPA for API Service
# api-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: apiservice-hpa
namespace: bookstore
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: apiservice
minReplicas: 2
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
behavior:
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 50
periodSeconds: 60
scaleUp:
stabilizationWindowSeconds: 0
policies:
- type: Percent
value: 100
periodSeconds: 30
- type: Pods
value: 2
periodSeconds: 30
selectPolicy: Max
Apply the HPA:
kubectl apply -f api-hpa.yaml
# Verify HPA status
kubectl get hpa -n bookstore
kubectl describe hpa apiservice-hpa -n bookstore
Configure HPA for Web Frontend
# web-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: webfrontend-hpa
namespace: bookstore
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: webfrontend
minReplicas: 2
maxReplicas: 8
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 60
behavior:
scaleDown:
stabilizationWindowSeconds: 300
Custom Metrics with Prometheus
For more advanced scaling, use custom metrics from Prometheus. Note that HPA cannot query Prometheus directly; a metrics adapter (such as prometheus-adapter) must expose these metrics through the Kubernetes custom metrics API:
# custom-metrics-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: apiservice-custom-hpa
namespace: bookstore
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: apiservice
minReplicas: 2
maxReplicas: 15
metrics:
- type: Pods
pods:
metric:
name: http_requests_per_second
target:
type: AverageValue
averageValue: "1000"
- type: Pods
pods:
metric:
name: api_response_time_p95
target:
type: AverageValue
averageValue: "500m" # 500ms
Best Practices for Kubernetes HPA
- Define Accurate Resource Requests/Limits in deployment manifests
- Set Stabilization Windows to prevent flapping (rapid scaling up/down)
- Use Multiple Metrics for more intelligent scaling decisions
- Combine with Cluster Autoscaler to scale nodes when needed
- Monitor HPA Behavior and adjust thresholds based on actual usage
# Monitor HPA in real-time
kubectl get hpa -n bookstore --watch
# View HPA events
kubectl describe hpa apiservice-hpa -n bookstore | grep Events -A 20
Resource Requests and Limits
Define appropriate resource requests and limits for optimal scheduling and scaling:
# api-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: apiservice
namespace: bookstore
spec:
replicas: 2
template:
spec:
containers:
- name: apiservice
image: bookstoreacr.azurecr.io/bookstore-api:latest
resources:
requests:
cpu: 500m
memory: 512Mi
limits:
cpu: 1000m
memory: 1Gi
livenessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
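The probes above assume the service exposes a /health endpoint on port 8080. A minimal ASP.NET Core wiring might look like the following sketch (AddNpgSql comes from the AspNetCore.HealthChecks.NpgSql package, an assumption about the health-check library in use):
// Program.cs - minimal /health endpoint backing the liveness and readiness probes
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddHealthChecks()
    .AddNpgSql(builder.Configuration.GetConnectionString("bookstore")!); // verifies DB connectivity
var app = builder.Build();
app.MapHealthChecks("/health"); // path and port mirror the probe configuration above
app.Run();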
Database Scaling
PostgreSQL Read Replicas
For read-heavy workloads, implement read replicas to offload read queries from the primary database.
Azure Database for PostgreSQL
# Create read replica
az postgres flexible-server replica create \
--resource-group bookstore-rg \
--name bookstore-db-replica-1 \
--source-server bookstore-db \
--location eastus2
# Create additional replicas for different regions
az postgres flexible-server replica create \
--resource-group bookstore-rg \
--name bookstore-db-replica-westus \
--source-server bookstore-db \
--location westus2
Configure Application for Read Replicas
Update your connection strings to use read replicas for read operations:
// In Program.cs or configuration
// Primary store: handles all writes and strongly consistent reads
services.AddMarten(options =>
{
    options.Connection(builder.Configuration.GetConnectionString("bookstore"));
});
// Marten has no built-in replica routing, so register a second,
// query-only store pointed at a replica using Marten's separate-store
// support. IReadOnlyBookStore is an application-defined marker interface.
services.AddMartenStore<IReadOnlyBookStore>(options =>
{
    options.Connection(builder.Configuration.GetConnectionString("bookstore-replica-1"));
});
// Marker interface for the replica-backed, query-only store
public interface IReadOnlyBookStore : IDocumentStore { }
Important
Marten does not route queries to read replicas automatically. Direct reads at replicas yourself, either with a separate query-only document store as shown above, or with an Npgsql multi-host connection string (for example, Host=primary,replica1,replica2 with Target Session Attributes=prefer-standby). Replicas lag the primary, so route only reads that tolerate eventual consistency.
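As a usage sketch, read-heavy endpoints can then resolve the replica-backed store directly (the IReadOnlyBookStore interface and the Title filter are illustrative assumptions):
// Query-only endpoint served from the replica-backed store
app.MapGet("/api/books/search", async (string query, IReadOnlyBookStore store) =>
{
    await using var session = store.QuerySession(); // read-only Marten session
    var results = await session.Query<BookSearchProjection>()
        .Where(b => b.Title.Contains(query))        // Title is an assumed projection field
        .ToListAsync();
    return Results.Ok(results);
});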
Monitor Replication Lag
-- Check replication lag on replica
SELECT
CASE
WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() THEN 0
ELSE EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))
END AS replication_lag_seconds;
-- View replication status
SELECT * FROM pg_stat_replication;
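To surface lag from the application side, a hedged sketch of a custom health check that runs the lag query through Npgsql (the 10-second threshold is an arbitrary example):
using Microsoft.Extensions.Diagnostics.HealthChecks;
using Npgsql;

// Reports Degraded when the replica falls too far behind the primary
public class ReplicationLagHealthCheck : IHealthCheck
{
    private readonly string _replicaConnectionString;

    public ReplicationLagHealthCheck(string replicaConnectionString)
        => _replicaConnectionString = replicaConnectionString;

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken ct = default)
    {
        await using var conn = new NpgsqlConnection(_replicaConnectionString);
        await conn.OpenAsync(ct);
        await using var cmd = new NpgsqlCommand(
            "SELECT COALESCE(EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())), 0)",
            conn);
        var lagSeconds = Convert.ToDouble(await cmd.ExecuteScalarAsync(ct));
        return lagSeconds < 10
            ? HealthCheckResult.Healthy($"Replication lag: {lagSeconds:F1}s")
            : HealthCheckResult.Degraded($"Replication lag high: {lagSeconds:F1}s");
    }
}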
Connection Pooling
Implement connection pooling to efficiently manage database connections and improve performance.
PgBouncer Configuration
Deploy PgBouncer as a connection pooler:
# pgbouncer.ini
[databases]
bookstore = host=bookstore-db.postgres.database.azure.com port=5432 dbname=bookstore
bookstore-replica-1 = host=bookstore-db-replica-1.postgres.database.azure.com port=5432 dbname=bookstore
[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
reserve_pool_size = 5
reserve_pool_timeout = 3
max_db_connections = 100
server_idle_timeout = 600
server_lifetime = 3600
Kubernetes Deployment for PgBouncer
# pgbouncer-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: pgbouncer
namespace: bookstore
spec:
replicas: 2
selector:
matchLabels:
app: pgbouncer
template:
metadata:
labels:
app: pgbouncer
spec:
containers:
- name: pgbouncer
image: pgbouncer/pgbouncer:latest
ports:
- containerPort: 6432
volumeMounts:
- name: config
mountPath: /etc/pgbouncer
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 500m
memory: 256Mi
volumes:
- name: config
configMap:
name: pgbouncer-config
---
apiVersion: v1
kind: Service
metadata:
name: pgbouncer
namespace: bookstore
spec:
selector:
app: pgbouncer
ports:
- port: 6432
targetPort: 6432
type: ClusterIP
Update application connection strings to use PgBouncer:
# Instead of connecting directly to PostgreSQL
Host=bookstore-db.postgres.database.azure.com;Port=5432;...
# Connect through PgBouncer
Host=pgbouncer;Port=6432;...
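One caveat with pool_mode = transaction: consecutive statements may run on different server connections, so server-side prepared statements cannot be relied on. With Npgsql this typically means disabling automatic preparation, as in this sketch (host, database, and credential values are assumptions):
// Npgsql connection string for use behind PgBouncer in transaction mode
var connectionString = new NpgsqlConnectionStringBuilder
{
    Host = "pgbouncer",
    Port = 6432,
    Database = "bookstore",
    Username = "bookstore_app",                      // assumed credentials
    Password = builder.Configuration["Db:Password"],
    MaxAutoPrepare = 0                               // disable automatic statement preparation
}.ToString();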
Database Performance Tuning
PostgreSQL Configuration
Optimize PostgreSQL settings for production workloads:
-- Increase shared buffers (25% of available RAM)
ALTER SYSTEM SET shared_buffers = '4GB';
-- Increase effective cache size (50-75% of available RAM)
ALTER SYSTEM SET effective_cache_size = '12GB';
-- Optimize for write-heavy workloads
ALTER SYSTEM SET wal_buffers = '16MB';
ALTER SYSTEM SET checkpoint_completion_target = 0.9;
ALTER SYSTEM SET max_wal_size = '4GB';
-- Optimize query planner
ALTER SYSTEM SET random_page_cost = 1.1; -- For SSD storage
ALTER SYSTEM SET effective_io_concurrency = 200;
-- Increase connection limits (takes effect only after a server restart)
ALTER SYSTEM SET max_connections = 200;
-- Reload configuration. Note: shared_buffers, wal_buffers, and max_connections
-- require a full restart; the other settings above apply on reload.
SELECT pg_reload_conf();
Index Optimization
Ensure critical queries have appropriate indexes:
-- Analyze query performance
EXPLAIN ANALYZE
SELECT * FROM mt_doc_booksearchprojection
WHERE search_text @@ to_tsquery('english', 'architecture');
-- Create a full-text index for common search queries
-- (search_text is a tsvector, so use the default tsvector GIN operator class;
-- gin_trgm_ops applies only to plain text columns)
CREATE INDEX CONCURRENTLY idx_book_search_gin
ON mt_doc_booksearchprojection
USING gin(search_text);
-- Index for filtering by category
CREATE INDEX CONCURRENTLY idx_book_category
ON mt_doc_booksearchprojection (category_id);
-- Composite index for common query patterns
CREATE INDEX CONCURRENTLY idx_book_category_published
ON mt_doc_booksearchprojection (category_id, publication_date DESC);
-- Analyze tables after creating indexes
ANALYZE mt_doc_booksearchprojection;
Vacuum and Maintenance
-- Configure autovacuum for event-sourced tables
ALTER TABLE mt_events SET (
autovacuum_vacuum_scale_factor = 0.05,
autovacuum_analyze_scale_factor = 0.02
);
-- Manual vacuum for large tables
VACUUM ANALYZE mt_events;
VACUUM ANALYZE mt_doc_booksearchprojection;
-- Monitor table bloat
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size,
n_live_tup,
n_dead_tup,
round(n_dead_tup * 100.0 / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_ratio
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC;
Caching Strategies
Application-Level Caching
Implement caching for frequently accessed data:
// Add distributed caching
services.AddStackExchangeRedisCache(options =>
{
options.Configuration = builder.Configuration.GetConnectionString("redis");
options.InstanceName = "BookStore:";
});
// Cache category translations
public class CachedCategoryService
{
    private readonly IDistributedCache _cache;
    private readonly ICategoryRepository _repository;

    public CachedCategoryService(IDistributedCache cache, ICategoryRepository repository)
    {
        _cache = cache;
        _repository = repository;
    }

    public async Task<IEnumerable<CategoryDto>> GetCategoriesAsync(
string language,
CancellationToken ct)
{
var cacheKey = $"categories:{language}";
var cached = await _cache.GetStringAsync(cacheKey, ct);
if (cached != null)
    return JsonSerializer.Deserialize<IEnumerable<CategoryDto>>(cached)!; // cached value is known-good JSON
var categories = await _repository.GetAllAsync(language, ct);
await _cache.SetStringAsync(
cacheKey,
JsonSerializer.Serialize(categories),
new DistributedCacheEntryOptions
{
AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(15)
},
ct);
return categories;
}
}
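Cached entries should also be invalidated when the underlying data changes; otherwise clients may see stale categories for up to the 15-minute TTL. A hedged sketch of a method on the same service (UpdateAsync on the repository and the language list are illustrative):
// Invalidate cached category lists when a category changes
public async Task UpdateCategoryAsync(CategoryDto category, CancellationToken ct)
{
    await _repository.UpdateAsync(category, ct); // assumed repository method

    // Remove the cached list for every supported language
    foreach (var language in new[] { "en", "de", "fr" })
        await _cache.RemoveAsync($"categories:{language}", ct);
}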
HTTP Caching with ETags
The BookStore application already implements ETags for optimistic concurrency. Leverage this for HTTP caching:
// In API endpoints
app.MapGet("/api/books/{id}", async (
Guid id,
IDocumentSession session,
HttpContext context) =>
{
var book = await session.LoadAsync<BookSearchProjection>(id);
if (book == null) return Results.NotFound();
var etag = $"\"{book.Version}\"";
// Check If-None-Match header
if (context.Request.Headers.IfNoneMatch == etag)
return Results.StatusCode(304); // Not Modified
context.Response.Headers.ETag = etag;
context.Response.Headers.CacheControl = "private, max-age=60";
return Results.Ok(book);
});
CDN for Static Assets
Use Azure CDN or Cloudflare for static assets:
# Create Azure CDN profile
az cdn profile create \
--resource-group bookstore-rg \
--name bookstore-cdn \
--sku Standard_Microsoft
# Create CDN endpoint
az cdn endpoint create \
--resource-group bookstore-rg \
--profile-name bookstore-cdn \
--name bookstore-static \
--origin bookstore-storage.blob.core.windows.net \
--origin-host-header bookstore-storage.blob.core.windows.net
SignalR Scaling
For real-time notifications across multiple instances, use Azure SignalR Service or Redis backplane.
Azure SignalR Service
# Create Azure SignalR Service
az signalr create \
--name bookstore-signalr \
--resource-group bookstore-rg \
--sku Standard_S1 \
--unit-count 2 \
--service-mode Default
Update your application:
// In Program.cs
services.AddSignalR()
.AddAzureSignalR(builder.Configuration.GetConnectionString("AzureSignalR"));
Redis Backplane (Kubernetes)
For Kubernetes deployments:
services.AddSignalR()
.AddStackExchangeRedis(builder.Configuration.GetConnectionString("redis"), options =>
{
options.Configuration.ChannelPrefix = "BookStore.SignalR";
});
Load Testing
Perform load testing to validate scaling configuration and identify bottlenecks.
Using Azure Load Testing
# Create load testing resource
az load create \
--name bookstore-loadtest \
--resource-group bookstore-rg \
--location eastus2
Using k6 for Load Testing
// load-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';
export const options = {
stages: [
{ duration: '2m', target: 100 }, // Ramp up to 100 users
{ duration: '5m', target: 100 }, // Stay at 100 users
{ duration: '2m', target: 200 }, // Ramp up to 200 users
{ duration: '5m', target: 200 }, // Stay at 200 users
{ duration: '2m', target: 0 }, // Ramp down
],
thresholds: {
http_req_duration: ['p(95)<500'], // 95% of requests under 500ms
http_req_failed: ['rate<0.01'], // Error rate under 1%
},
};
export default function () {
// Test book search
const searchRes = http.get('https://bookstore-api.azurecontainerapps.io/api/books/search?query=architecture');
check(searchRes, {
'search status is 200': (r) => r.status === 200,
'search response time < 500ms': (r) => r.timings.duration < 500,
});
sleep(1);
// Test book details
const detailRes = http.get('https://bookstore-api.azurecontainerapps.io/api/books/123e4567-e89b-12d3-a456-426614174000');
check(detailRes, {
'detail status is 200': (r) => r.status === 200,
});
sleep(1);
}
Run the test:
k6 run load-test.js
Monitoring and Observability
Key Metrics to Monitor
Application Metrics
- Request Rate - Requests per second
- Response Time - P50, P95, P99 latencies
- Error Rate - 4xx and 5xx responses
- Active Connections - Current connections to services
- Pod/Instance Count - Current replica count
Database Metrics
- Connection Count - Active database connections
- Query Performance - Slow query log
- Replication Lag - Delay between primary and replicas
- Cache Hit Ratio - PostgreSQL buffer cache efficiency
- Transaction Rate - Transactions per second
Resource Metrics
- CPU Utilization - Per instance/pod
- Memory Usage - Working set and GC pressure
- Network I/O - Throughput and latency
- Disk I/O - IOPS and throughput
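On the application side, these metrics can be published with System.Diagnostics.Metrics, which both OpenTelemetry and Prometheus exporters understand. A minimal sketch (meter and instrument names are illustrative):
using System.Diagnostics.Metrics;

// Application-level request rate and latency instruments
public static class BookStoreMetrics
{
    private static readonly Meter Meter = new("BookStore.Api", "1.0");

    public static readonly Counter<long> Requests =
        Meter.CreateCounter<long>("bookstore_requests_total", description: "Total HTTP requests");

    public static readonly Histogram<double> RequestDuration =
        Meter.CreateHistogram<double>("bookstore_request_duration_ms", unit: "ms");
}

// Usage inside middleware or an endpoint filter:
// BookStoreMetrics.Requests.Add(1);
// BookStoreMetrics.RequestDuration.Record(elapsed.TotalMilliseconds);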
Azure Monitor Queries
// Average response time by endpoint
requests
| where timestamp > ago(1h)
| summarize avg(duration), percentile(duration, 95) by name
| order by avg_duration desc
// Error rate over time
requests
| where timestamp > ago(24h)
| summarize
total = count(),
errors = countif(success == false)
by bin(timestamp, 5m)
| extend error_rate = errors * 100.0 / total
| render timechart
// Scaling events
AzureActivity
| where OperationNameValue contains "scale"
| where ResourceGroup == "bookstore-rg"
| project TimeGenerated, OperationNameValue, ActivityStatusValue, Properties
Prometheus Metrics (Kubernetes)
# servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: bookstore-metrics
namespace: bookstore
spec:
selector:
matchLabels:
app: apiservice
endpoints:
- port: metrics
interval: 30s
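The ServiceMonitor above assumes the pods expose a scrapeable metrics port. With the OpenTelemetry Prometheus exporter (OpenTelemetry.Exporter.Prometheus.AspNetCore, an assumption about the exporter in use), the endpoint can be wired roughly like this:
// Expose /metrics for Prometheus scraping via OpenTelemetry
builder.Services.AddOpenTelemetry()
    .WithMetrics(metrics => metrics
        .AddAspNetCoreInstrumentation()   // built-in HTTP server metrics
        .AddMeter("BookStore.Api")        // the custom meter defined earlier
        .AddPrometheusExporter());

var app = builder.Build();
app.MapPrometheusScrapingEndpoint();      // serves /metrics by default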
Cost Optimization
Right-Sizing Instances
- Monitor actual resource usage over 2-4 weeks
- Adjust resource requests/limits based on P95 usage
- Use burstable instances for variable workloads (Azure B-series, AWS T-series)
- Scale to zero for non-production environments
Reserved Capacity
For predictable workloads, use reserved instances:
# Azure Reserved Instances (1-3 year commitment)
# Note: a purchase also requires flags such as --reserved-resource-type,
# --billing-scope-id, --display-name, and --applied-scope-type (omitted here)
az reservations reservation-order purchase \
--reservation-order-id <order-id> \
--sku Standard_D2s_v3 \
--location eastus2 \
--quantity 2 \
--term P1Y
Spot Instances (Kubernetes)
Use spot instances for non-critical workloads:
# Add a spot node pool to an AKS cluster (pods must tolerate the spot taint)
az aks nodepool add \
--resource-group bookstore-rg \
--cluster-name bookstore-aks \
--name spotpool \
--priority Spot \
--eviction-policy Delete \
--spot-max-price -1 \
--labels workload=batch \
--node-taints "kubernetes.azure.com/scalesetpriority=spot:NoSchedule"
# --spot-max-price -1 caps the price at the regular on-demand rate
Scaling Checklist
Pre-Production
- [ ] Define resource requests and limits for all services
- [ ] Configure HPA or Azure Container Apps scaling rules
- [ ] Set up database read replicas
- [ ] Implement connection pooling (PgBouncer)
- [ ] Configure distributed caching (Redis)
- [ ] Set up SignalR backplane for multi-instance deployments
- [ ] Create load testing scenarios
- [ ] Define monitoring dashboards and alerts
Production Launch
- [ ] Start with conservative scaling settings (min replicas ≥ 2)
- [ ] Monitor scaling behavior for first 48 hours
- [ ] Perform load testing during off-peak hours
- [ ] Validate database replication lag
- [ ] Test failover scenarios
- [ ] Review and optimize slow queries
Ongoing Optimization
- [ ] Review scaling metrics weekly
- [ ] Adjust scaling thresholds based on actual traffic patterns
- [ ] Optimize database indexes based on query patterns
- [ ] Monitor cost vs. performance trade-offs
- [ ] Conduct quarterly load testing
- [ ] Update resource allocations based on growth
Troubleshooting Scaling Issues
Pods Not Scaling
# Check HPA status
kubectl describe hpa apiservice-hpa -n bookstore
# Verify metrics server
kubectl top nodes
kubectl top pods -n bookstore
# Check HPA events
kubectl get events -n bookstore --field-selector involvedObject.name=apiservice-hpa
Database Connection Pool Exhaustion
-- Check current connections
SELECT count(*) FROM pg_stat_activity;
-- Identify long-running queries
SELECT pid, now() - query_start AS duration, query
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY duration DESC
LIMIT 10;
-- Terminate idle connections
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle'
AND now() - state_change > interval '10 minutes';
High Replication Lag
-- Check replication lag
SELECT
client_addr,
state,
sent_lsn,
write_lsn,
flush_lsn,
replay_lsn,
sync_state,
pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replication_lag_bytes
FROM pg_stat_replication;
If lag is high:
- Check network connectivity between primary and replica
- Verify replica has sufficient resources (CPU, memory, disk I/O)
- Consider increasing max_wal_senders and wal_keep_size
- Review the hot_standby_feedback setting