In SQL Server, losing the primary is very dangerous for data, because the primary is usually the only entry point that users can write to. When the primary is not available, the service cannot accept writes. From the client's side the service may still be readable, but it is effectively dead. In a distributed system, a high-availability feature tries to achieve zero downtime, so detecting an unavailable resource effectively is required.
The lease pattern is used to control resource usage: unused resources should be released periodically.
Windows Server Failover Cluster (WSFC) needs to know when the primary server is unresponsive or unavailable so that it can perform a failover: release the dead primary and assign a secondary server as the new primary. The lease pattern is introduced to do this detection periodically.
The lease pattern is a tool for managing resources. For any resource there is a resource provider and a resource user. The provider is called the grantor, and the user is called the holder. A lease is negotiated between grantor and holder for a fixed duration. If the lease is not renewed within that interval, it times out and the resource is released.
First, we need to determine which resources the lease is associated with. In MSSQL, each SQL Server is a resource. HadrRes is a resource type, and an availability group is a resource of the HadrRes type. WSFC is the grantor, and SQL Server is the holder.
Lease creation policy. A lease is created by the lease grantor.
- one user per lease
- specify a duration
- the grantor maintains the mapping between lease and resource
Lease renewal policy
Lease expiration policy
- how to release the resource from the lease when the lease times out
- the grantor should remove the mapping
- prepare all cleanup methods
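The three policies above can be sketched in a few lines of Python. This is only an illustration of the general pattern, not WSFC code: `LeaseGrantor`, `Lease`, the injected `clock`, and the `on_release` cleanup callback are all hypothetical names chosen for this example.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Lease:
    lease_id: int
    holder: str
    resource: str
    expires_at: float


class LeaseGrantor:
    """Hypothetical grantor: creates, renews, and expires leases."""

    def __init__(self, clock: Callable[[], float], on_release: Callable[[str], None]):
        self._clock = clock            # injected so the sketch is testable
        self._on_release = on_release  # cleanup hook run when a lease expires
        self._ids = itertools.count(1)
        self._by_resource: Dict[str, Lease] = {}  # lease <-> resource mapping

    def grant(self, holder: str, resource: str, duration: float) -> Lease:
        """Creation policy: one holder per resource, with an explicit duration."""
        self.expire_stale()
        if resource in self._by_resource:
            raise RuntimeError(f"{resource} is already leased")
        lease = Lease(next(self._ids), holder, resource, self._clock() + duration)
        self._by_resource[resource] = lease
        return lease

    def renew(self, lease: Lease, duration: float) -> bool:
        """Renewal policy: only the current, unexpired lease can be extended."""
        current = self._by_resource.get(lease.resource)
        if current is None or current.lease_id != lease.lease_id:
            return False
        if self._clock() >= current.expires_at:
            return False
        current.expires_at = self._clock() + duration
        return True

    def expire_stale(self) -> List[str]:
        """Expiration policy: remove the mapping, then run cleanup."""
        now = self._clock()
        expired = [r for r, l in self._by_resource.items() if now >= l.expires_at]
        for resource in expired:
            del self._by_resource[resource]
            self._on_release(resource)
        return expired


# Usage with a fake clock so the timing is deterministic.
now = [0.0]
released: List[str] = []
grantor = LeaseGrantor(clock=lambda: now[0], on_release=released.append)
lease = grantor.grant("sql-node-1", "AG01", duration=20.0)
now[0] = 10.0
grantor.renew(lease, 20.0)   # renewed in time; the lease now ends at t=30
now[0] = 31.0
grantor.expire_stale()       # the lease has timed out; "AG01" is released
```

Injecting the clock keeps the example free of real sleeps. A real grantor would also need locking if holders renew from other threads.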
This is my understanding of how WSFC uses the lease pattern when we create an availability group in WSFC.
- WSFC registers the HadrRes type so that it understands the AG interface.
- WSFC creates an availability group resource, called AG01.
- While bringing AG01 online, WSFC prepares and enrolls the lease as follows. HadrRes uses AG01's property, the primary, to create an event handler. HadrRes binds this event handler to a watchdog callback. WSFC registers the event.
- WSFC keeps the mapping between the event and the primary (the resource).
- In the watchdog callback, WSFC keeps checking the lease heartbeat. If WSFC does not receive a heartbeat response within an interval, the lease times out. WSFC terminates the lease and releases the AG resource.
- Based on the timeout policy, WSFC should release the primary and designate a secondary as the new primary. At this point, the AG01 resource should be taken offline and brought back online. A new lease should be generated in this process.
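To make the heartbeat and failover steps concrete, here is a toy simulation in Python. It does not use any WSFC or SQL Server API. The class `AvailabilityGroupSketch`, its methods, and the 20-second timeout are assumptions made up for this illustration, and promoting the first secondary in the list is a simplification.

```python
from typing import Callable, List


class AvailabilityGroupSketch:
    """Toy model of a cluster watching the primary's lease heartbeat."""

    def __init__(self, name: str, primary: str, secondaries: List[str],
                 lease_timeout: float, clock: Callable[[], float]):
        self.name = name
        self.primary = primary
        self.secondaries = list(secondaries)
        self.lease_timeout = lease_timeout
        self._clock = clock
        self._last_heartbeat = clock()
        self.lease_generation = 1  # bumped each time a new lease is created

    def heartbeat(self, replica: str) -> None:
        """Only a heartbeat from the current primary renews the lease."""
        if replica == self.primary:
            self._last_heartbeat = self._clock()

    def watchdog_check(self) -> bool:
        """Watchdog callback: fail over if the lease timed out.

        Returns True when a failover happened.
        """
        if self._clock() - self._last_heartbeat < self.lease_timeout:
            return False
        if not self.secondaries:
            raise RuntimeError("no secondary available to promote")
        # Release the dead primary and promote a secondary.
        self.primary = self.secondaries.pop(0)
        # Coming back online creates a fresh lease for the new primary.
        self._last_heartbeat = self._clock()
        self.lease_generation += 1
        return True


# Usage with a fake clock.
now = [0.0]
ag = AvailabilityGroupSketch("AG01", "node1", ["node2", "node3"],
                             lease_timeout=20.0, clock=lambda: now[0])
now[0] = 10.0
ag.heartbeat("node1")   # the primary is alive, so the lease is renewed
ag.watchdog_check()     # no failover yet
now[0] = 35.0           # 25 seconds without a heartbeat
ag.watchdog_check()     # the lease timed out; node2 becomes the primary
```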
- Resource management simplicity
- Efficient resource usage. Usage is time-based, so we can release a dead resource at every time interval.
- Resource update simplicity. When you keep the binding between user and resource, the user can get the updated resource transparently.
- Enhanced system reliability. Hey, we know a resource is dead within a single time interval.
- Additional overhead. Each user needs a lease to use the same resource, but we can use a lease pool to limit how many leases are generated at the same time, and to avoid creating and releasing leases too frequently.
- Additional application logic. Well, in Windows cluster it is another 10K lines of code.
- Timer watchdog. Not every operating system has a watchdog, but we can use an event-based callback to trigger the lease check instead.
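The event-based alternative in the last point can be sketched as follows: instead of a timer firing periodically, the lease is checked whenever an event (here, an incoming request) arrives. `LazyLease` and `on_request` are hypothetical names for this sketch, and it assumes requests arrive often enough that an expired lease is noticed promptly.

```python
import time
from typing import Callable


class LazyLease:
    """Lease whose expiry is detected on access, not by a timer."""

    def __init__(self, duration: float, clock: Callable[[], float] = time.monotonic):
        self._duration = duration
        self._clock = clock
        self._expires_at = clock() + duration
        self.released = False

    def check(self) -> bool:
        """Run on each event; releases the lease once it is found expired."""
        if not self.released and self._clock() >= self._expires_at:
            self.released = True
        return not self.released

    def renew(self) -> bool:
        """Extend the lease, but only if it is still valid."""
        if not self.check():
            return False
        self._expires_at = self._clock() + self._duration
        return True


def on_request(lease: LazyLease, payload: str) -> str:
    """Event callback: every request triggers the lease check first."""
    if not lease.check():
        return "rejected: lease expired"
    return f"accepted: {payload}"


# Usage with a fake clock.
now = [0.0]
lease = LazyLease(duration=5.0, clock=lambda: now[0])
first = on_request(lease, "w1")    # lease is valid at t=0
now[0] = 4.0
lease.renew()                      # lease now ends at t=9
now[0] = 9.0
second = on_request(lease, "w2")   # lease has expired at t=9
```

The trade-off is that an idle system never notices the expiry, so this approach suits resources that see steady traffic.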