Full-TCP SACK Bug Fixes

June 3, 2005
Symptom: Large bursts of old packets sent
Problem: In SACK, h_seqno_ holds the value of the next sequence number send when filling a hole. After a SACK recovery and then a timeout, the sender starts sending packets from the previous recovery instead of the packets that were lost due to timeout. Note, this is not just a problem with timeouts. h_seqno_ does not get updated on new ACKs.
Example: ns tcp-full-bugs.tcl sack-burst-2 (Results)
Fix: Add a check to ack_action() to ensure that h_seqno is at least as large as highest_ack_. This check is already present in pack_action().

SackFullTcpAgent::ack_action()

*** 2781,2786 ****
--- 2849,2861 ----
                  sq_.cleartonxt();
          }
  	dupacks_ = 0;
+ 
+ 	/* 
+ 	 * Update h_seqno_ on new ACK (same as for partial ACKS)
+ 	 * -M. Weigle 6/3/05
+ 	 */
+ 	if (h_seqno_ < highest_ack_)
+ 		h_seqno_ = highest_ack_;
  }
  
  //

September 13, 2004
Symptom: SACK flows experience a timeout instead of a recovery.
Problem: curseq_ is not set until the application performs a close. Its default value is a negative number, so in the fix given below, the send_allowed method will always return false, since seq will always be greater than curseq_.
Dependency: This problem will not occur if the extra FINS are sent fix has not been applied.
Fix: Don't check curseq_ unless not in established state. This fix replaces the "Extra FINs are sent" fix in the patch dated Sep 2004.

SackFullTcpAgent::send_allowed()

*** 2869,2874 ****
--- 2946,2958 ----
  		return FALSE;
  	}
  
+ 	/*
+ 	 * If not in ESTABLISHED, don't send anything we don't have
+ 	 *   -M. Weigle 7/18/02
+ 	 */
+ 	if (state_ != TCPS_ESTABLISHED && seq > curseq_)
+ 		return FALSE;
+ 
  	// don't overshoot cwnd_
  	int cwin = int(cwnd_) * maxseg_;
  	return (pipe_ < cwin);

July 18, 2002
Symptom: Extra FINs are sent.
Problem: Whenever a duplicate ACK past the retransmission threshold (usually 3) is received, the sender will send more data. If a FIN has already been sent, the sender will retransmit the FIN for each duplicate ACK received.
Dependency: If the Reno fix for when the passive FIN is dropped has not been applied, there will be "recv'd pkt in CLOSED state" error messages since the extra FINs will be ACKed.
Example: ns tcp-full-bugs.tcl sack-send-past-fin (Results)
Fix: Don't allow the sender to send packets that have sequence numbers past the max produced by the application (unless it's a FIN).

Important: This fix has a problem. See the fix above (dated Sept 13, 2004) for the "real" fix.

SackFullTcpAgent::send_allowed()

  if (seq >= topawin) {
      return FALSE;
  }

+ // don't send anything we don't have  7/18/02 -MCW 
+ if (seq > curseq_)
+     return FALSE;
+
  // don't overshoot cwnd_
  int cwin = int(cwnd_) * maxseg_;

June 19, 2002
Symptom: Using SACK and ECN, the sender never retransmits a dropped packet.
Problem: The dropped packet is discovered via 3 duplicate ACKs. These ACKs are received in the same window of ACKs that report a congestion occurance (ECN Echo bit set).
Dependency: This fix depends on first fixing the SACK partial ACK bug.
Example: ns tcp-full-bugs.tcl sack-ecn-drop-mark (Results)
Fix: For a lost packet indicated by 3 duplicate ACKs in the same window as an ECN notification, do the same thing as if there was no ECN notification, except for decreasing the congestion window and ssthresh.

SackFullTcpAgent::dupack_action()

*** 2718,2729 ****
          }           
  
          if (ecn_ && last_cwnd_action_ == CWND_ACTION_ECN) {
  		last_cwnd_action_ = CWND_ACTION_DUPACK;
!                 cancel_rtx_timer();
!                 rtt_active_ = FALSE;
! 		send_much(1, REASON_DUPACK, maxburst_);
!                 return; 
!         }               
     
          if (bug_fix_) {
                  /*                              
--- 2786,2805 ----
          }           
  
          if (ecn_ && last_cwnd_action_ == CWND_ACTION_ECN) {
+ 		/* 
+ 		 * Received ECN notification and 3 DUPACKs in same 
+ 		 * window. Don't cut cwnd again, but retransmit lost
+ 		 * packet.   -M. Weigle  6/19/02
+ 		 */
  		last_cwnd_action_ = CWND_ACTION_DUPACK;
! 		cancel_rtx_timer();
! 		rtt_active_ = FALSE;
! 		int amt = fast_retransmit(highest_ack_);
! 		pipectrl_ = TRUE;
! 		h_seqno_ = highest_ack_ + amt;
! 		send_much(0, REASON_DUPACK, maxburst_);
! 		return; 
! 	}
     
          if (bug_fix_) {
                  /*

June 17, 2002
Symptom: Large bursts of old packets sent
Problem: In SACK, h_seqno_ holds the value of the next sequence number send when filling a hole. When reusing a TCP agent, h_seqno_ doesn't get reset. The sender thinks that there are many packets in the hole and sends them all at the same time.
Dependency: This problem was exposed after fixing the Reno FIN bug. Normally, h_seqno_ is updated as each new packet is sent. If the previous value of h_seqno_ is not (max_packet_size * x + 1), then this update won't occur. This condition exists when the last data packet sent is not full-sized. Then, the seqno of the FIN will not be (max_packet_size * x + 1).
This also depends on first fixing the SACK partial ACK bug.
Example: ns tcp-full-bugs.tcl sack-burst (Results)
Fix: Fix the typo in SackTcpAgent::reset()
(See email sent to ns-users list)

SackFullTcpAgent::reset()

*** 2683,2689 ****
  SackFullTcpAgent::reset()
  {
  	sq_.clear();			// no SACK blocks
! 	sack_min_ = h_seqno_ -1;	// no left edge of SACK blocks
  	FullTcpAgent::reset();
  }
  
--- 2750,2757 ----
  SackFullTcpAgent::reset()
  {
  	sq_.clear();			// no SACK blocks
! 	/* Fixed typo.  -M. Weigle 6/17/02 */
! 	sack_min_ = h_seqno_ = -1;	// no left edge of SACK blocks
  	FullTcpAgent::reset();
  }

May 21, 2002
Symptom: "warning: received illegal SACK block" error message
Problem: A packet is dropped and the FIN is received. The FIN appears in a SACK block. FINs have 0 data, so it appears that the SACK block contains 0 bytes.
Example: ns tcp-full-bugs.tcl sack-illegal-sack-block (Results)
Fix: Exempt FINs from the check.

SackFullTcpAgent::process_sack()

*** 2842,2848 ****
  
  	int slen = tcph->sa_length(), i;
  	for (i = 0; i < slen; ++i) {
! 		if (tcph->sa_left(i) >= tcph->sa_right(i)) {
  			fprintf(stderr,
  			    "%f: FullTcpAgent(%s) warning: received illegal SACK block [%d,%d]\n",
  				now(), name(), tcph->sa_left(i), tcph->sa_right(i));
--- 2917,2925 ----
  
  	int slen = tcph->sa_length(), i;
  	for (i = 0; i < slen; ++i) {
! 		/* Added check for FIN   -M. Weigle 5/21/02 */
! 		if ((tcph->flags() & TH_FIN == 0) && 
! 		    tcph->sa_left(i) >= tcph->sa_right(i)) {
  			fprintf(stderr,
  			    "%f: FullTcpAgent(%s) warning: received illegal SACK block [%d,%d]\n",
  				now(), name(), tcph->sa_left(i), tcph->sa_right(i));

May 2, 2002
Problem: After a retransmission timeout, SACK will only retransmit one packet per RTT. The correct behavior is to enter slow-start and recover the lost packets according to the congestion window in slow-start. The first ACK received after a timeout causes NewReno and SACK to think they are in fast recovery when they should be in slow-start.
Example: ns tcp-full-bugs.tcl sack-dropwin (Results)
Fix: Don't indicate that the ACK is a partial ACK if not in Fast Recovery. This fix also fixes the NewReno behavior.
(See email sent to ns-users mailing list)

FullTcpAgent::pack()

*** 764,770 ****
  FullTcpAgent::pack(Packet *pkt)
  {
  	hdr_tcp *tcph = hdr_tcp::access(pkt);
! 	return (tcph->ackno() >= highest_ack_ &&
  		tcph->ackno() < recover_);
  }
  
--- 777,784 ----
  FullTcpAgent::pack(Packet *pkt)
  {
  	hdr_tcp *tcph = hdr_tcp::access(pkt);
! 	/* Added check for fast recovery.  -M. Weigle 5/2/02 */
! 	return (fastrecov_ && tcph->ackno() >= highest_ack_ &&
  		tcph->ackno() < recover_);
  }

Michele C. Weigle