SIP Protocol Overview

SIP as an Internet Application Protocol:
In the UC world, SIP is often treated as the equivalent of the H.323 protocol, so many UC engineers believe that SIP is a multimedia communication protocol from the telecom world. That is a misconception. SIP is natively an Internet application protocol, which shows in the following three respects.

  1. SIP relies on other Internet services and protocols such as DNS, TCP, UDP, and IP.
  2. The SIP message structure is largely inherited from HTTP and SMTP. A SIP message breaks down into three parts: the start-line (request-line or status-line), the header fields, and the payload (body), the same as in HTTP. The format of the SIP request-line and status-line also follows HTTP, and the SIP status code classes are modeled on those of HTTP. SIP also borrows the From/To/Subject headers from SMTP and the Content-Type/Content-Length headers from HTTP.
  3. Apart from the multimedia communication that is prevalent in UC, SIP also supports a wide variety of Internet applications by means of the Content-Type header.
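
The three-part structure described in point 2 can be illustrated with a minimal sketch. The function name parse_sip_message and the sample message (addresses, tags, Call-ID) are made up for illustration, and the sketch ignores folded header lines and compact header names, so it is not a complete parser.

```python
def parse_sip_message(raw: str):
    """Split a SIP message into start-line, header fields, and body."""
    # An empty line (CRLF CRLF) separates the headers from the body, as in HTTP.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = []
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return start_line, headers, body


# Hypothetical INVITE used only as sample input.
message = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP pc1.example.org;branch=z9hG4bK776asdhds\r\n"
    "From: Alice <sip:alice@example.org>;tag=1928301774\r\n"
    "To: Bob <sip:bob@example.com>\r\n"
    "Call-ID: a84b4c76e66710\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Content-Type: application/sdp\r\n"
    "Content-Length: 5\r\n"
    "\r\n"
    "v=0\r\n"
)
start_line, headers, body = parse_sip_message(message)
print(start_line)
print(dict(headers)["Content-Type"])
```

The Content-Type value is what tells the receiver how to interpret the body, which is why SIP can carry payloads other than SDP.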

SIP Terminology:
A detailed review of the SIP message structure is shown below:
SIP Message Structure

SIP Dialog/Session: a signaling concept that is set up on an endpoint-to-endpoint basis. A SIP dialog/session is identified by the combination of the To tag, the From tag, and the Call-ID; these dialog identifiers are carried in subsequent SIP requests to identify the established session or dialog.

  • Dialog Stateful:
    A proxy is dialog stateful when it inserts a Record-Route header into the first SIP request to ensure that all remaining messages traverse that proxy; this applies to each proxy in the signaling path between the UAs.
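
As a rough sketch of how a dialog is identified, the snippet below derives a dialog ID from the Call-ID and the two tags. The names DialogId, tag_of, and dialog_id and the sample header values are assumptions for illustration only. Each endpoint sees the same three values, but which tag counts as "local" depends on its role.

```python
import re
from typing import NamedTuple


class DialogId(NamedTuple):
    call_id: str
    local_tag: str
    remote_tag: str


def tag_of(header_value: str) -> str:
    """Return the tag parameter of a From/To header value, or '' if absent."""
    match = re.search(r";\s*tag=([^;\s>]+)", header_value)
    return match.group(1) if match else ""


def dialog_id(headers: dict, role: str) -> DialogId:
    from_tag, to_tag = tag_of(headers["From"]), tag_of(headers["To"])
    if role == "uac":
        # The caller generated the From tag, so that is its local tag.
        return DialogId(headers["Call-ID"], from_tag, to_tag)
    # The callee generated the To tag.
    return DialogId(headers["Call-ID"], to_tag, from_tag)


# Hypothetical headers of a 200 OK to an INVITE.
ok_200 = {
    "From": "Alice <sip:alice@example.org>;tag=1928301774",
    "To": "Bob <sip:bob@example.com>;tag=a6c85cf",
    "Call-ID": "a84b4c76e66710",
}
caller_view = dialog_id(ok_200, "uac")
callee_view = dialog_id(ok_200, "uas")
print(caller_view)
print(callee_view)
```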

SIP Transaction: a signaling concept that is set up on a hop-by-hop basis. A transaction is identified by the branch value in the Via header (the branch value changes for every request-response round trip). For instance, the INVITE, the ACK to a 2xx response, and the BYE are regarded as different transactions. A transaction is not terminated until a final response (2xx through 6xx) to the request is received. Within a single transaction, the direction indicated by the From and To header fields never changes, but the direction may differ from one transaction to another.

  • Transaction Stateless:
    The proxy server forwards all messages and responses without maintaining any state.
  • Transaction Stateful:
    A proxy server that receives a SIP request retains the state of that transaction until it receives a final response (meaning a 2xx, 3xx, 4xx, 5xx, or 6xx response) for that transaction. A transaction-stateful proxy has no knowledge of a later session update request (UPDATE), transfer request (REFER), or termination request (BYE).
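
To make the transaction identifier concrete, here is a small sketch that builds a matching key from the branch parameter of the topmost Via header plus the CSeq method. The function name transaction_key and the sample values are hypothetical. A real implementation following RFC 3261 also compares the sent-by value and treats the ACK to a non-2xx response as part of the INVITE transaction.

```python
import re


def transaction_key(top_via: str, cseq: str):
    """Return (branch, method) for the topmost Via value and the CSeq value."""
    match = re.search(r";\s*branch=([^;\s,]+)", top_via)
    branch = match.group(1) if match else None
    method = cseq.split()[1]
    return (branch, method)


# Hypothetical INVITE, its response, and a later BYE in the same dialog.
invite = transaction_key(
    "SIP/2.0/UDP pc1.example.org;branch=z9hG4bK776asdhds", "314159 INVITE")
response = transaction_key(
    "SIP/2.0/UDP pc1.example.org;branch=z9hG4bK776asdhds;received=192.0.2.1",
    "314159 INVITE")
bye = transaction_key(
    "SIP/2.0/UDP pc1.example.org;branch=z9hG4bKnashds10", "314160 BYE")

print(invite == response)  # the response belongs to the INVITE transaction
print(invite == bye)       # the BYE is a separate transaction
```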

Basic SIP Call Flow & Message Processing:

SIP Basic Call Flow

Creating SIP 180 Ringing Message:

  • Copy the header fields that a response must echo (Via, To, From, Call-ID, CSeq) from the original INVITE message.
  • Insert the SIP 180 status line (the reason phrase in the status line can be customized).
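  • Add received=x.x.x.x (the source IP address from which the request actually arrived) to the topmost Via header field when that address differs from the host given in the Via, for example when the Via carries a host name. (NOTE: the element that receives the request records the IP address of its previous hop in this parameter.)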
  • Add the To tag and the Contact header field of the called party. (NOTE: even though the Contact information is added in the 180 Ringing, endpoint-to-endpoint signaling only begins with the next transaction. The signaling path never changes within a single transaction.)
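
The steps above can be sketched as follows. This is only an illustration under stated assumptions: build_180_ringing, its parameters, and all sample values are made up, and the sketch copies just the header fields that RFC 3261 requires a response to echo (Via, From, To, Call-ID, CSeq).

```python
COPIED = ("Via", "From", "To", "Call-ID", "CSeq")


def build_180_ringing(invite_headers, to_tag, contact, received_ip=None):
    """Build a 180 Ringing from the (name, value) header list of an INVITE."""
    lines = ["SIP/2.0 180 Ringing"]
    first_via = True
    for name, value in invite_headers:
        if name not in COPIED:
            continue
        if name == "Via" and first_via:
            first_via = False
            if received_ip:
                # Record where the request really came from on the top Via.
                value += f";received={received_ip}"
        if name == "To" and "tag=" not in value:
            value += f";tag={to_tag}"
        lines.append(f"{name}: {value}")
    lines.append(f"Contact: {contact}")
    lines.append("Content-Length: 0")
    return "\r\n".join(lines) + "\r\n\r\n"


# Hypothetical INVITE headers.
invite_headers = [
    ("Via", "SIP/2.0/UDP pc1.example.org;branch=z9hG4bK776asdhds"),
    ("Max-Forwards", "70"),
    ("From", "Alice <sip:alice@example.org>;tag=1928301774"),
    ("To", "Bob <sip:bob@example.com>"),
    ("Call-ID", "a84b4c76e66710"),
    ("CSeq", "314159 INVITE"),
]
ringing = build_180_ringing(
    invite_headers, "a6c85cf", "<sip:bob@192.0.2.4>", received_ip="192.0.2.1")
print(ringing)
```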

Creating SIP 200 OK Message:

  • The message content is practically the same as the SIP 180 Ringing message, except for the two points below.
  • Insert the SIP 200 status line.
  • Insert the SDP body.

Creating SIP ACK message:

  • Insert the ACK request-line (the ACK to a 2xx response is a new transaction).
  • Use a new branch=xxxx in the Via header (the branch identifies a unique transaction and is meaningful only within a single request-response exchange).
  • Use the same CSeq number as the initial INVITE that this ACK acknowledges, with the method set to ACK.

Creating SIP BYE message:

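  • When the called party ends the call, swap the From and To header fields relative to the previous INVITE message, because the BYE is a new transaction whose direction is opposite to that of the transaction started by the INVITE. The tags travel with their header fields, so the tag values and the Call-ID are the same as in the former INVITE dialog.
  • Use a different CSeq: each side numbers its own requests, and the number increases with every new request that side sends in the dialog.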
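
A minimal sketch of building a BYE from stored dialog state, assuming the called party hangs up. The Dialog class, build_bye, and every sample value are hypothetical. From the sender's point of view, "local" always goes into From and "remote" into To, which is what produces the swap relative to the INVITE.

```python
from dataclasses import dataclass


@dataclass
class Dialog:
    call_id: str
    local_uri: str
    local_tag: str
    remote_uri: str
    remote_tag: str
    remote_target: str  # URI taken from the peer's Contact header
    local_cseq: int     # last CSeq number this side used in the dialog


def build_bye(dialog: Dialog, via: str) -> str:
    dialog.local_cseq += 1  # every new request from this side gets a higher CSeq
    lines = [
        f"BYE {dialog.remote_target} SIP/2.0",
        f"Via: {via}",  # new branch, because this is a new transaction
        "Max-Forwards: 70",
        f"From: <{dialog.local_uri}>;tag={dialog.local_tag}",
        f"To: <{dialog.remote_uri}>;tag={dialog.remote_tag}",
        f"Call-ID: {dialog.call_id}",
        f"CSeq: {dialog.local_cseq} BYE",
        "Content-Length: 0",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"


# Bob (the callee of the original INVITE) hangs up.
bob_side = Dialog(
    call_id="a84b4c76e66710",
    local_uri="sip:bob@example.com", local_tag="a6c85cf",
    remote_uri="sip:alice@example.org", remote_tag="1928301774",
    remote_target="sip:alice@pc1.example.org", local_cseq=230,
)
bye = build_bye(bob_side, "SIP/2.0/UDP 192.0.2.4;branch=z9hG4bKnashds10")
print(bye)
```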

SIP Proxy Server Processing:

  • When receiving a SIP request such as a new INVITE, the proxy server routes it based on the SIP URI in the request-line, inserts itself into the Via header field with a transaction identifier (branch=), and sends it to the next hop.
  • When receiving a SIP response, the proxy server checks that it is listed in the topmost Via entry and that the transaction identifier (branch=) matches, then removes that Via entry and forwards the rest of the message to the address in the next Via entry.
  • Beyond this, a proxy server changes very little in a message: mainly the request-line and the Via header field, plus Max-Forwards and, optionally, Record-Route.
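
The Via handling described above behaves like a stack: the proxy pushes its own entry onto requests and pops it from responses. The sketch below assumes UDP transport and hypothetical names (forward_request, forward_response) and host names. It ignores the received and rport parameters that a real proxy would prefer when choosing where to send the response.

```python
import secrets

MAGIC_COOKIE = "z9hG4bK"  # prefix that RFC 3261 requires at the start of a branch


def forward_request(via_stack, proxy_host):
    """Push this proxy's Via entry on top and return (new_stack, branch)."""
    branch = MAGIC_COOKIE + secrets.token_hex(8)
    return [f"SIP/2.0/UDP {proxy_host};branch={branch}"] + via_stack, branch


def forward_response(via_stack, proxy_host, known_branches):
    """Pop this proxy's Via entry and return (remaining_stack, next_hop)."""
    top = via_stack[0]
    sent_by = top.split()[1].split(";")[0]
    branch = top.partition("branch=")[2].split(";")[0]
    if sent_by != proxy_host or branch not in known_branches:
        raise ValueError("response is not for a transaction of this proxy")
    rest = via_stack[1:]
    next_hop = rest[0].split()[1].split(";")[0] if rest else None
    return rest, next_hop


uac_via = ["SIP/2.0/UDP pc1.example.org;branch=z9hG4bK776asdhds"]
request_vias, branch = forward_request(uac_via, "proxy.example.com")
# The UAS copies the Via stack unchanged into its response.
response_vias, next_hop = forward_response(
    request_vias, "proxy.example.com", {branch})
print(next_hop)
```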

Benefits of Contact Header Field:

  • Once the caller and callee know each other's Contact information, subsequent SIP transactions can be carried on an endpoint-to-endpoint basis.
  • Subsequent SIP transactions can still be forced to travel through a proxy server when that proxy server inserts a Record-Route header field (dialog-stateful proxy).

Commonly used SIP messages:

SIP 180 Ringing: a provisional response indicating that the SIP INVITE request has been received and alerting is taking place. It can be received with or without SDP media information, and the Cisco IOS reaction to it depends on the configuration.

  • SIP 180 Ringing with SDP: early-media cut-through, enabled by default.
  • SIP 180 Ringing without SDP: Cisco IOS generates ringback tone locally and streams it to the calling party.
  • SIP 180 Ringing with SDP: IOS can be forced to generate ringback tone locally by issuing “disable-early-media 180” under “sip-ua” configuration mode.

SIP—Enhanced 180 Provisional Response Handling.

SIP 183 Session Progress: indicates the progress of the session. Unlike the 100 Trying, 180 Ringing, 181, and 182 responses, a 183 is an end-to-end response with SDP media information included, and it establishes a dialog or an early-media channel (it must contain a To tag and a Contact). It is normally used by the called gateway to play ringback tone, a pre-recorded announcement, music, and so on to the calling party before the call is answered. Whenever a SIP 183 Session Progress is received, Cisco IOS always cuts through the early-media channel to stream whatever the called gateway sends to the calling party.


SIP UPDATE Method:

  • In an established session, a re-INVITE is used to update session parameters. However, neither party in a pending session (INVITE sent but no final response received yet) may re-INVITE; instead, the UPDATE method is used (UPDATE is mainly intended for use while the initial INVITE transaction is still pending).
  • Possible uses of UPDATE include muting or placing on hold pending media streams, performing QoS, or other end-to-end attribute negotiation prior to session establishment.


SIP CANCEL Method:

  • The CANCEL method is used to terminate pending INVITEs or call attempts. It can be generated by either user agents or proxy servers, provided that a provisional (1xx) response has been received but no final response has been received. A UA uses the method to cancel a pending call attempt that it initiated earlier. A forking proxy can use the method to cancel pending parallel branches after a successful response has arrived on one of the branches. CANCEL is a hop-by-hop request and receives a response generated by the next stateful element.
  • The branch ID of a CANCEL matches that of the INVITE it is canceling. During call forking, the forking proxy sends INVITE messages with different branch values (transaction IDs) to different user devices and maintains the state of those transactions itself. Thus a forking proxy must be transaction stateful. A CANCEL only has meaning for an INVITE, since only an INVITE may take several seconds (or minutes) to complete. All other SIP requests complete immediately (that is, a UAS must immediately generate a final response).
  • A UA confirms the cancellation with a 200 OK response to the CANCEL and replies to the INVITE with a 487 Request Terminated response.
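
To show what "the branch ID matches the INVITE" means in practice, here is a hedged sketch that derives a CANCEL from a pending INVITE. The function name build_cancel and the sample values are assumptions. Following RFC 3261, the CANCEL reuses the Request-URI, Call-ID, To, From, the CSeq number, and the topmost Via (including its branch) of the INVITE, and only the method changes.

```python
def build_cancel(invite_request_line: str, invite_headers: list) -> str:
    """Build a CANCEL for a pending INVITE given its request-line and headers."""
    method, uri, version = invite_request_line.split()
    if method != "INVITE":
        raise ValueError("only an INVITE can be cancelled")
    first = {}
    for name, value in invite_headers:
        first.setdefault(name, value)  # keeps only the topmost Via
    cseq_number = first["CSeq"].split()[0]
    lines = [
        f"CANCEL {uri} {version}",
        f"Via: {first['Via']}",  # same branch as the INVITE
        "Max-Forwards: 70",
        f"From: {first['From']}",
        f"To: {first['To']}",    # no To tag: copied as sent in the INVITE
        f"Call-ID: {first['Call-ID']}",
        f"CSeq: {cseq_number} CANCEL",
        "Content-Length: 0",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"


# Hypothetical pending INVITE.
invite_headers = [
    ("Via", "SIP/2.0/UDP pc1.example.org;branch=z9hG4bK776asdhds"),
    ("From", "Alice <sip:alice@example.org>;tag=1928301774"),
    ("To", "Bob <sip:bob@example.com>"),
    ("Call-ID", "a84b4c76e66710"),
    ("CSeq", "314159 INVITE"),
]
cancel = build_cancel("INVITE sip:bob@example.com SIP/2.0", invite_headers)
print(cancel)
```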

SIP ACK Method:

  • The ACK method is used to acknowledge final responses to INVITE requests. Final responses to all other requests are never acknowledged. Final responses are defined as 2xx, 3xx, 4xx, 5xx, or 6xx class responses. The CSeq number is never incremented for an ACK, but the CSeq method is changed to ACK.
  • An ACK may contain an application/sdp message body. This is permitted if the initial INVITE did not contain an SDP message body. If the INVITE contained an SDP offer message body, the ACK may not contain an SDP message body. The ACK may not be used to modify a media description that has already been sent in the initial INVITE; a re-INVITE or UPDATE must be used for this purpose. SDP in an ACK is used in some interworking scenarios with other protocols where the media characteristics may not be known when the initial INVITE is generated and sent.
  • Difference between end-to-end ACK and hop-by-hop ACK:
    1. For 2xx responses, the ACK is end-to-end, but for all other final responses it is done on a hop-by-hop basis when stateful proxies are involved. As a result, a proxy will generate an ACK for a 3xx, 4xx, 5xx, or 6xx response to an INVITE, as well as forwarding the response.
    2. The end-to-end nature of ACKs to 2xx responses allows a message body to be transported. An ACK generated in a hop-by-hop acknowledgment will contain just a single Via header with the address of the proxy server generating the ACK.
    3. A hop-by-hop ACK reuses the same branch ID as the INVITE since it is considered part of the same transaction. An end-to-end ACK uses a different branch ID as it is considered a new transaction.
    4. NOTE that an end-to-end ACK is not necessarily transported directly between the user agents without traversing SIP proxies. An end-to-end ACK still traverses the proxies that stayed in the signaling path (dialog-stateful proxies that inserted Record-Route). A stateful proxy receiving an ACK message must determine whether or not the ACK should be forwarded downstream to another proxy or user agent. That is, is the ACK a hop-by-hop ACK or an end-to-end ACK? This is done by comparing the branch ID of the ACK with the branch IDs of its pending transactions. If there is no exact match, the ACK is proxied toward the UAS. Otherwise, the ACK is for this hop and is not forwarded by the proxy.
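
The decision a stateful proxy makes for an incoming ACK can be sketched as a simple branch lookup. The function name handle_ack and the sample branch values are hypothetical, and a real proxy would match on more than the branch alone.

```python
def handle_ack(ack_branch: str, pending_invite_branches: set) -> str:
    """Decide what a stateful proxy does with an incoming ACK."""
    if ack_branch in pending_invite_branches:
        # Same branch as an INVITE this proxy answered with a non-2xx final
        # response: the ACK is hop-by-hop and ends here.
        return "absorb"
    # Unknown branch: the ACK belongs to a new transaction (ACK to a 2xx),
    # so it is proxied toward the UAS.
    return "forward"


pending = {"z9hG4bK776asdhds"}  # INVITE transactions awaiting their ACK
hop_by_hop = handle_ack("z9hG4bK776asdhds", pending)
end_to_end = handle_ack("z9hG4bKnashds8", pending)
print(hop_by_hop, end_to_end)
```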



SIP PRACK Method:

  • The PRACK method is used to acknowledge receipt of reliably transported provisional responses (1xx). The reliability of 2xx, 3xx, 4xx, 5xx, and 6xx responses to INVITEs is achieved using the ACK method. The PRACK method applies to all provisional responses except the 100 Trying response, which is never reliably transported.
  • A PRACK is generated by a UAC when a provisional response has been received containing an RSeq reliable sequence number and a Require: 100rel header. The PRACK echoes the number in the RSeq and the CSeq of the response in a RAck header. When no PRACK is received from the UAC before a timer expires (an “X” is used to represent a lost message), the response is retransmitted. The receipt of the PRACK confirms the delivery of the response and stops all further retransmissions. The 200 OK response to the PRACK stops retransmissions of the PRACK request.
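
As an illustration of how the RAck header is put together, the sketch below checks whether a provisional response asks for reliable delivery and builds the RAck value from RSeq and CSeq. The function names and sample headers are made up. The check assumes, per RFC 3262, that a reliable provisional response carries Require: 100rel together with an RSeq header.

```python
def needs_prack(status_code: int, headers: dict) -> bool:
    """True if this provisional response must be acknowledged with a PRACK."""
    required = [t.strip().lower() for t in headers.get("Require", "").split(",")]
    # 100 Trying is never sent reliably, so only 101-199 qualify.
    return 100 < status_code < 200 and "100rel" in required and "RSeq" in headers


def build_rack(headers: dict) -> str:
    """RAck = RSeq number, then the CSeq number and method of the response."""
    cseq_number, method = headers["CSeq"].split()
    return f"RAck: {int(headers['RSeq'])} {int(cseq_number)} {method}"


# Hypothetical headers of a reliable 183 Session Progress.
progress_headers = {
    "Require": "100rel",
    "RSeq": "776656",
    "CSeq": "1 INVITE",
}
if needs_prack(183, progress_headers):
    rack = build_rack(progress_headers)
    print(rack)
```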

SIP BYE Request:

  • A session is considered established if an INVITE has received a success class response (2xx) or an ACK has been sent. The BYE method is used to terminate an established media session, while pending INVITEs or call attempts are terminated using the CANCEL method.
  • BYE is an end-to-end method. It is sent only by user agents participating in the session, never by proxies or other third parties.


SIP OPTIONS Method:

  • The OPTIONS method is used to query a user agent or server about its capabilities and to discover its current availability. The response to the request lists the capabilities of the user agent or server. A proxy never generates an OPTIONS request. A user agent or server responds to the request as it would to an INVITE. A success class (2xx) response can contain Allow, Accept, Accept-Encoding, Accept-Language, and Supported headers indicating its capabilities. Feature tags (such as audio, video, and isfocus) should be included with the Contact header field.
  • An OPTIONS request may not contain a message body.

SIP REFER Request:


  • The REFER method is used by a user agent to request another user agent to access a URI or URL resource. The resource is identified by a URI or URL in the required Refer-To header field. Note that the URI or URL can be any type of URI: sip, sips, http, pres, and so forth. When the URI is a sip or sips URI, the REFER is probably being used to implement a call transfer service.
  • A REFER request can be sent either inside or outside an existing dialog.
  • A UAC sends a REFER to a UAS. The UAS, after performing authentication and authorization, decides to accept the REFER and responds with a 202 Accepted response. Note that this response is sent immediately without waiting for the triggered request to complete. This is important because REFER uses the non-INVITE method state machine, which requires an immediate final response, unlike an INVITE, which may take several seconds (or even minutes) to complete.
  • The UAS then sends the triggered request, which for a call transfer is an INVITE to the Refer-To URI. Accepting the REFER also creates an implicit subscription through which the UAS reports the outcome of the triggered request to the UAC. In the transfer example, the INVITE is successful since it receives a 200 OK response. This successful outcome is communicated back to the UAC using a NOTIFY method. The message body of the NOTIFY contains a partial copy of the final response to the triggered request. In this case, it contains the start-line SIP/2.0 200 OK. This part of a SIP message is described in the Content-Type header field as message/sipfrag. Note that the implicit subscription can be cancelled by including the Refer-Sub: false header field in the REFER. If the 2xx response to the REFER also contains the Refer-Sub: false header field, no NOTIFYs will be sent.
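
A small sketch of how the party that sent the REFER might interpret the message/sipfrag body of a NOTIFY. The function name refer_outcome and the three outcome labels are assumptions for illustration. The sketch only reads the status line carried in the body.

```python
def refer_outcome(content_type: str, body: str) -> str:
    """Classify the status line carried in a message/sipfrag NOTIFY body."""
    if not content_type.lower().startswith("message/sipfrag"):
        raise ValueError("expected a message/sipfrag body")
    version, code, _reason = body.splitlines()[0].split(" ", 2)
    status = int(code)
    if status < 200:
        return "pending"   # triggered request still in progress
    if status < 300:
        return "success"   # for example, the transfer target answered
    return "failure"


trying = refer_outcome("message/sipfrag", "SIP/2.0 100 Trying\r\n")
answered = refer_outcome("message/sipfrag;version=2.0", "SIP/2.0 200 OK\r\n")
busy = refer_outcome("message/sipfrag", "SIP/2.0 486 Busy Here\r\n")
print(trying, answered, busy)
```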

To be continued.

SIP implementation of Cisco UC product portfolio:

Understanding SIP – CUCM 9.1.1 System Guide.

How Cisco handles mid-SIP-call media change:

  • INVITE or re-INVITE message with an inactive SDP offer (a=inactive attribute) to break the media stream in an established SIP session before invoking any other supplementary service; CUCM subsequently sends an INVITE without an SDP offer to insert MOH or resume the media stream, and expects a sendrecv SDP offer in the 200 OK response. (Because third-party devices often provide an inactive SDP offer in the 200 OK instead of a sendrecv SDP offer, the media path can remain in an inactive state, which causes calls to drop.
    CUCM allows you to configure a parameter for an early-offer SIP trunk so that CUCM suppresses the sending of inactive or sendonly SDP in mid-call INVITEs. When this parameter is enabled, CUCM connects the SIP trunk device directly to the MOH or annunciator device without breaking the existing media stream during call hold or other feature invocation. Similarly, CUCM connects the SIP trunk device directly to a line-side device during call resume without breaking the MOH or annunciator stream. By preventing the far-end media stream from being set to inactive, CUCM should always be able to resume the media path.)
  • INVITE or re-INVITE message with delayed SDP offer to directly link existing media stream in an established SIP session to another destination.
  • UPDATE message for media change in a non-established SIP session (INVITE message sent without final response received).
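
For readers unfamiliar with the direction attributes mentioned above, the following sketch reads and rewrites the direction attribute of a single-stream SDP body. The function names and the sample SDP are assumptions, and this is not how CUCM itself is implemented. It only shows what changing a stream from sendrecv to inactive looks like on the wire.

```python
DIRECTIONS = ("sendrecv", "sendonly", "recvonly", "inactive")


def direction_of(sdp: str) -> str:
    """Return the direction attribute, defaulting to sendrecv when absent."""
    for line in sdp.splitlines():
        if line.startswith("a=") and line[2:] in DIRECTIONS:
            return line[2:]
    return "sendrecv"


def set_direction(sdp: str, direction: str) -> str:
    """Replace any direction attribute; assumes a single media section."""
    if direction not in DIRECTIONS:
        raise ValueError(f"unknown direction: {direction}")
    lines = [
        line for line in sdp.splitlines()
        if not (line.startswith("a=") and line[2:] in DIRECTIONS)
    ]
    lines.append(f"a={direction}")  # lands under the last (only) m= section
    return "\r\n".join(lines) + "\r\n"


active = (
    "v=0\r\n"
    "m=audio 24580 RTP/AVP 8\r\n"
    "a=rtpmap:8 PCMA/8000\r\n"
    "a=sendrecv\r\n"
)
held = set_direction(active, "inactive")
print(direction_of(active), "->", direction_of(held))
```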

How Cisco handles Call Hold / Resume:

CUCM supports call hold and retrieve that a SIP device initiates or that a CUCM device initiates.

Call Hold

  • To put a call on hold, CUCM uses the mid-SIP-call media change process described above.
  • To retrieve a held call, CUCM sends a re-INVITE message with updated Remote-Party-ID information that reflects the currently connected party (but without an SDP offer) to the SIP proxy.

How Cisco handles SIP Call Transfer:

Call Transfer

Call Transfer 2

  • CUCM supports SIP-initiated call transfer, and it accepts REFER requests or INVITE messages that include a Replaces header.
  • Configure UPDATE and PRACK on the SIP trunk to provide ringback in blind transfer cases when the consult call leg on an early-offer SIP trunk provides in-band ringback or announcements. If the trunk is not enabled for PRACK or if the far-end device does not support UPDATE, the transferee does not receive a ringback tone.

How Cisco handles SIP Call Forward:

CUCM supports call forward that a SIP device initiates or that a CUCM device initiates. CUCM processes call-forwarding redirection requests that come from SIP devices. For call forwarding that CUCM itself initiates, the system uses no SIP redirection messages. CUCM handles the redirection internally and then conveys the connected party information to the originating SIP endpoint through the Remote-Party-ID header.

Concerns about Media Termination Point (MTP) devices:

  • You can configure CUCM SIP devices (lines and trunks) to always use an MTP. If the configuration parameters are set to not use an MTP (the default), CUCM attempts to allocate an MTP dynamically when the DTMF methods for the call are not compatible. For Cisco IP phones running SIP, CUCM still inserts MTPs dynamically as needed when the MTP Required check box is not checked, so only generic third-party SIP IP phones use the MTP Required check box.
  • For SIP devices (phones or trunks) with the MTP Required check box enabled, only the G.711 codec is allowed during media negotiation. For the region that hosts such SIP devices, make sure that the maximum bandwidth allowed per call in the Region Relationship configuration is at least what G.711 needs.

What is send-only, recv-only, send-recv, inactive SDP offer.

Disabling sending INVITE with inactive SDP for Early-Offer SIP Trunk in SIP Profile config to fix MOH issue.

CCM Advanced Parameters-> set “FULL Duplex Stream” to True, fix MOH issue on SIP Trunk. (This Duplex Stream setting is intended for firewalls and NAT).

SDP Information Elements:


  • SDP Protocol Version Number
    v=0; as the current SDP version is 0.
  • Origin (owner/creator) and session identifier
    o=username session-id version network-type address-type address
    E.g; o=CiscoSystemsCCM-SIP 2000 1 IN IP4
    username: host-id or login-id of originator.
    session-id: an NTP timestamp or a random number used to ensure uniqueness.
    version: a numeric field that is increased for each change to the session.
    network-type=IN: IN for internet;
    address-type=IP4: or IP6.
    address: an IPv4 address in dotted-decimal form, an IPv6 address, or a fully qualified host name. This address is the SIP proxy address (such as CUCM) for mid-call signaling purposes.
  • Session Name
    s=xxx; a name for the session, can contain any non-zero number of characters.
    E.g; s=SIP Call
  • Times at which the session starts and stops
    t=start-time stop-time; The times are specified using NTP timestamps.
    stop-time=0; indicates that the session goes on indefinitely.
    start-time=0 & stop-time=0; means permanent session.
  • Media Announcements
    m=media port transport format-list; it contains a list of the media session types that are supported.
    E.g; m=audio 24580 RTP/AVP 8 101
    media: could be audio | video | text | application | message | image | control.
    port: receiving port number to be used for this media session.
    transport: contains the transport protocol or the RTP profile used, normally RTP/AVP (RTP Audio Video Profile).
    format-list: more information about the media, usually contains the media payload types defined in RTP audio video profiles. More than one media payload type can be listed here, allowing multiple alternative codecs for the media session.
  • Attributes
    a=xxx; provides more detailed attributes about preceding media session.
    E.g; a=rtpmap:8 PCMA/8000 | a=rtpmap:101 telephone-event/8000
  • Connection Information for media Connection
    c=network-type address-type connection-address
    E.g; c=IN IP4
    network-type: IN for internet.
    address-type: IP4 or IP6.
    connection-address: the IP address or FQDN of the media endpoint, that is, where the party that wrote this SDP expects to receive the media packets (and normally sends them from); it can be unicast or multicast.
    If multicast, then connection-address=base-multicast-address/ttl/number-of-addresses.
    ttl: time-to-live value.
    number-of-addresses: indicates how many contiguous multicast addresses are included starting with the base-multicast-address.
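
The information elements above can be pulled apart with a few lines of code. The sketch below is a simplified illustration: parse_sdp is a made-up name, the 192.0.2.x addresses in the sample are placeholders chosen for this example (they are not taken from a real trace), and the parser assumes a plain port number rather than the port/count form.

```python
def parse_sdp(sdp: str):
    """Split an SDP body into session-level fields and media sections."""
    session = {}
    media = []
    for line in sdp.splitlines():
        if len(line) < 2 or line[1] != "=":
            continue  # skip anything that is not a type=value line
        key, value = line[0], line[2:]
        if key == "m":
            kind, port, transport, *formats = value.split()
            media.append({
                "media": kind,
                "port": int(port),
                "transport": transport,
                "formats": formats,
                "attributes": [],
            })
        elif key == "a" and media:
            # Attributes after an m= line describe that media section.
            media[-1]["attributes"].append(value)
        else:
            session.setdefault(key, []).append(value)
    return session, media


sample = (
    "v=0\r\n"
    "o=CiscoSystemsCCM-SIP 2000 1 IN IP4 192.0.2.10\r\n"
    "s=SIP Call\r\n"
    "c=IN IP4 192.0.2.20\r\n"
    "t=0 0\r\n"
    "m=audio 24580 RTP/AVP 8 101\r\n"
    "a=rtpmap:8 PCMA/8000\r\n"
    "a=rtpmap:101 telephone-event/8000\r\n"
)
session, media = parse_sdp(sample)
print(session["s"][0])
print(media[0]["port"], media[0]["formats"])
```

Payload type 8 maps to PCMA (G.711 A-law) and 101 is a dynamic payload type, which is why the a=rtpmap lines are needed to say what 101 carries.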

Wikipedia-Session Description Protocol
Wikipedia-RTP Audio Video Profiles
RFC3551-Payload Type Definitions