Keeping calls up when cluster switches to backup node

Features, capabilities, and information about HAAst
crmuser
Posts: 175
Joined: Sun Nov 27, 2016 3:41 pm

Post by crmuser » Mon Mar 27, 2017 3:25 pm

We are designing a high availability solution for a customer that needs to keep calls up in the event of a cluster failover situation.

We have considered another product (which I believe you reference here: http://telium.io/pages/forums/viewtopic.php?f=6&t=142 ) and concluded that it is not suitable. We need proper detection of node degradation, route degradation, call bridge failure, etc., and their solution doesn't do any of that (its health monitoring is trivial). We also need synchronization of databases, files, etc. between nodes, and again their solution is very limited (its synchronization is trivial, if you can even call it synchronization). So we want to work with HAAst and are trying to come up with a solution.

Our environment is straightforward:
  • All calls originate (or terminate) at the trunk (SIP endpoint on the left)
  • All phones terminate (or originate) calls (SIP endpoint on the right)
  • The HAAst cluster sits in the middle (both cluster nodes are located in the same data center)
SIP Endpoint <---------> Cluster <----------> SIP Endpoint


We read your post (http://www.telium.io/pages/forums/viewt ... p?f=6&t=83) about involving the ITSP and we agree with the benefits of doing so. We're just wondering if there is another way to keep calls up. We also understand and agree with your recommendation not to introduce a single point of failure in front of the cluster, but let's suppose we accept those risks. Is there a solution?
Account for questions transferred from CRM system
Telium Support
Posts: 235
Joined: Sun Nov 27, 2016 3:27 pm

Re: Keeping calls up when cluster switches to backup node

Post by Telium Support » Mon Mar 27, 2017 3:27 pm

If you are willing to accept the risks of placing new single points of failure in your call path, then yes, you have options. The key to this solution is to ensure directmedia (RTP flowing directly between endpoints, rather than through Asterisk). It's also quite likely that your endpoints will expect the SIP channel to remain responsive as well (or they may drop the call).

Establishing directmedia involves:
  • Ensuring the media anchor points are accessible to one another without NAT.
  • Ensuring Asterisk is configured to use re-invites/directmedia
  • Ensuring your Asterisk dialplan does not force Asterisk to remain in the RTP stream
  • Ensuring your endpoints do not require transcoding (performed by Asterisk)
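As a rough illustration of the Asterisk side of the list above, here is a minimal sketch. It assumes chan_sip (the thread does not say which channel driver is in use, and PJSIP uses different option names). The peer names, the host address, and the context names are hypothetical placeholders, and authentication settings are omitted for brevity.

```ini
; --- sip.conf (assumes chan_sip; peer names and addresses are placeholders) ---
[itsp-trunk]            ; hypothetical trunk peer
type=peer
host=192.0.2.10         ; documentation-range address, replace with your trunk
context=from-trunk
directmedia=yes         ; let Asterisk re-INVITE so RTP flows endpoint to endpoint
nat=no                  ; only appropriate when there is no NAT between media endpoints
disallow=all
allow=ulaw              ; same single codec on both legs, so no transcoding is needed

[phone-101]             ; hypothetical phone
type=friend
host=dynamic
context=from-phones
directmedia=yes
nat=no
disallow=all
allow=ulaw

; --- extensions.conf ---
[from-trunk]
; Plain Dial() with no option flags. Features that need Asterisk to see the
; audio or DTMF (e.g. the t/T transfer flags, or call recording) keep
; Asterisk in the RTP path and defeat directmedia.
exten => _X.,1,Dial(SIP/phone-101)
```

One way to check the result is to capture traffic on the Asterisk host (for example with tcpdump) during a test call and confirm that no RTP arrives there once the call is bridged.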
Optional: ensuring the SIP endpoints continue to see active SIP connections involves:
  • Placing a B2BUA (or gateway/proxy) between endpoints and the cluster - this device must place itself into the SIP stream
  • Configuring the B2BUA to allow the interior leg of the SIP call to drop, while keeping the outer leg of the SIP call active
  • Configuring the B2BUA to use UDP for SIP (at least for the cluster-facing leg); this is not always required
For example (this shows two B2BUAs for clarity, but you can adjust to fit your needs):
[Image: Overview-640px.jpg]

There are open source B2BUA products which might be modifiable to do what you want (e.g. the SIPpy project, available at: https://github.com/sippy/b2bua). Keep in mind that you are creating a free version of the commercial solution we do not recommend. If this is a critical call center, you may be better off developing a proper B2BUA from scratch to do what you want, including moving calls through the new active HAAst node, etc., but that is a large undertaking.

Until the IETF establishes a standard for seizing midpoints of a SIP/RTP call there is no way to do so (call salvage / call survival) without compromising the resilience of your HA solution. In other words, it might feel good to say you have a call salvage mechanism in place, but putting a single point of failure in front of your cluster is of questionable value.
crmuser
Posts: 175
Joined: Sun Nov 27, 2016 3:41 pm

Re: Keeping calls up when cluster switches to backup node

Post by crmuser » Thu Mar 21, 2019 3:49 pm

Is it possible to implement the above without the B2BUA on either side? If there's no SIP traffic won't the RTP channel stay up?
Account for questions transferred from CRM system
Telium Support
Posts: 235
Joined: Sun Nov 27, 2016 3:27 pm

Re: Keeping calls up when cluster switches to backup node

Post by Telium Support » Thu Mar 21, 2019 3:53 pm

This may be possible, but it depends on how your endpoints behave:
  • If the endpoints use TCP for the SIP connection, they may detect the cluster failover (as the TCP connection closes, possibly with a FIN)
  • If the endpoints generate SIP traffic (e.g. re-registering), the lack of a response or an out-of-sequence response may cause the endpoint to terminate RTP connections
Telium does not provide assistance for creating this type of configuration - but we know clients have made this work.