libabigail
Classes | Namespaces | Functions
abg-diff-utils.h File Reference

This file declares types and operations implementing the "O(ND) Difference Algorithm" (aka diff2) from Eugene W. Myers, to compute the difference between two sequences. More...

#include <cassert>
#include <cstdlib>
#include <memory>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>
#include "abg-fwd.h"
Include dependency graph for abg-diff-utils.h:
This graph shows which files directly or indirectly include this file:

Go to the source code of this file.

Classes

class  d_path_vec
 The array containing the furthest D-path end-points, for each value of K. MAX_D is the maximum value of the D-Path. That is, M+N if M is the size of the first input string, and N is the size of the second. More...
 
struct  deep_ptr_eq_functor
 An equality functor to deeply compare pointers. More...
 
struct  default_eq_functor
 The default equality functor used by the core diffing algorithms. More...
 
class  deletion
 The abstraction of the deletion of one element of a sequence A. More...
 
class  edit_script
 The abstraction of an edit script for transforming a sequence A into a sequence B. More...
 
class  insertion
 The abstration of an insertion of elements of a sequence B into a sequence A. This is used to represent the edit script for transforming a sequence A into a sequence B. More...
 
class  point
 A class representing a vertex in an edit graph, as explained in the paper. A vertex is a basically a pair of coordinates (abscissa and ordinate). More...
 
class  snake
 The abstraction of the Snake concept, from the paper. More...
 

Namespaces

 abigail
 Toplevel namespace for libabigail.
 
 abigail::diff_utils
 Libabigail's core diffing algorithms.
 

Functions

template<typename RandomAccessOutputIterator , typename EqualityFunctor >
void compute_diff (RandomAccessOutputIterator a_base, RandomAccessOutputIterator a_begin, RandomAccessOutputIterator a_end, RandomAccessOutputIterator b_base, RandomAccessOutputIterator b_begin, RandomAccessOutputIterator b_end, edit_script &ses)
 Compute the longest common subsequence of two (sub-regions of) sequences as well as the shortest edit script from transforming the first (sub-region of) sequence into the second (sub-region of) sequence. More...
 
template<typename RandomAccessOutputIterator , typename EqualityFunctor >
void compute_diff (RandomAccessOutputIterator a_base, RandomAccessOutputIterator a_begin, RandomAccessOutputIterator a_end, RandomAccessOutputIterator b_base, RandomAccessOutputIterator b_begin, RandomAccessOutputIterator b_end, vector< point > &lcs, edit_script &ses)
 Compute the longest common subsequence of two (sub-regions of) sequences as well as the shortest edit script from transforming the first (sub-region of) sequence into the second (sub-region of) sequence. More...
 
template<typename RandomAccessOutputIterator , typename EqualityFunctor >
void compute_diff (RandomAccessOutputIterator a_base, RandomAccessOutputIterator a_begin, RandomAccessOutputIterator a_end, RandomAccessOutputIterator b_base, RandomAccessOutputIterator b_begin, RandomAccessOutputIterator b_end, vector< point > &lcs, edit_script &ses, int &ses_len)
 Compute the longest common subsequence of two (sub-regions of) sequences as well as the shortest edit script from transforming the first (sub-region of) sequence into the second (sub-region of) sequence. More...
 
template<typename RandomAccessOutputIterator , typename EqualityFunctor >
void compute_diff (RandomAccessOutputIterator a_begin, RandomAccessOutputIterator a_end, RandomAccessOutputIterator b_begin, RandomAccessOutputIterator b_end, edit_script &ses)
 Compute the longest common subsequence of two (sub-regions of) sequences as well as the shortest edit script from transforming the first (sub-region of) sequence into the second (sub-region of) sequence. More...
 
template<typename RandomAccessOutputIterator >
void compute_diff (RandomAccessOutputIterator a_begin, RandomAccessOutputIterator a_end, RandomAccessOutputIterator b_begin, RandomAccessOutputIterator b_end, edit_script &ses)
 Compute the longest common subsequence of two (sub-regions of) sequences as well as the shortest edit script from transforming the first (sub-region of) sequence into the second (sub-region of) sequence. More...
 
template<typename RandomAccessOutputIterator , typename EqualityFunctor >
void compute_diff (RandomAccessOutputIterator a_begin, RandomAccessOutputIterator a_end, RandomAccessOutputIterator b_begin, RandomAccessOutputIterator b_end, vector< point > &lcs, edit_script &ses)
 Compute the longest common subsequence of two (sub-regions of) sequences as well as the shortest edit script from transforming the first (sub-region of) sequence into the second (sub-region of) sequence. More...
 
template<typename RandomAccessOutputIterator >
void compute_diff (RandomAccessOutputIterator a_begin, RandomAccessOutputIterator a_end, RandomAccessOutputIterator b_begin, RandomAccessOutputIterator b_end, vector< point > &lcs, edit_script &ses)
 Compute the longest common subsequence of two (sub-regions of) sequences as well as the shortest edit script from transforming the first (sub-region of) sequence into the second (sub-region of) sequence. More...
 
template<typename RandomAccessOutputIterator , typename EqualityFunctor >
void compute_diff (RandomAccessOutputIterator a_begin, RandomAccessOutputIterator a_end, RandomAccessOutputIterator b_begin, RandomAccessOutputIterator b_end, vector< point > &lcs, edit_script &ses, int &ses_len)
 Compute the longest common subsequence of two (sub-regions of) sequences as well as the shortest edit script from transforming the first (sub-region of) sequence into the second (sub-region of) sequence. More...
 
void compute_lcs (const char *str1, const char *str2, int &ses_len, string &lcs)
 Compute the longest common subsequence of two strings and return the length of the shortest edit script for transforming the first string into the second. More...
 
bool compute_middle_snake (const char *str1, const char *str2, snake &s, int &ses_len)
 Returns the middle snake of two strings, as well as the length of their shortest editing script. More...
 
template<typename RandomAccessOutputIterator , typename EqualityFunctor >
bool compute_middle_snake (RandomAccessOutputIterator a_begin, RandomAccessOutputIterator a_end, RandomAccessOutputIterator b_begin, RandomAccessOutputIterator b_end, snake &snak, int &ses_len)
 Returns the middle snake of two sequences A and B, as well as the length of their shortest editing script. More...
 
void compute_ses (const char *str1, const char *str2, edit_script &ses)
 Compute the shortest edit script for transforming a string into another. More...
 
template<typename RandomAccessOutputIterator >
void display_edit_script (const edit_script &es, const RandomAccessOutputIterator str1_base, const RandomAccessOutputIterator str2_base, ostream &out)
 Display an edit script on standard output. More...
 
template<typename RandomAccessOutputIterator , typename EqualityFunctor >
bool end_of_fr_d_path_in_k (int k, int d, RandomAccessOutputIterator a_begin, RandomAccessOutputIterator a_end, RandomAccessOutputIterator b_start, RandomAccessOutputIterator b_end, d_path_vec &v, snake &snak)
 Find the end of the furthest reaching d-path on diagonal k, for two sequences. In the paper This is referred to as "the basic algorithm". More...
 
template<typename RandomAccessOutputIterator , typename EqualityFunctor >
bool end_of_frr_d_path_in_k_plus_delta (int k, int d, RandomAccessOutputIterator a_begin, RandomAccessOutputIterator a_end, RandomAccessOutputIterator b_begin, RandomAccessOutputIterator b_end, d_path_vec &v, snake &snak)
 Find the end of the furthest reaching reverse d-path on diagonal k + delta. Delta is abs(M - N), with M being the size of a and N being the size of b. This is the "basic algorithm", run backward. That is, starting from the point (M,N) of the edit graph. More...
 
bool ends_of_furthest_d_paths_overlap (const point &forward_d_path_end, const point &reverse_d_path_end)
 
template<typename RandomAccessOutputIterator >
bool is_match_point (RandomAccessOutputIterator a_begin, RandomAccessOutputIterator a_end, RandomAccessOutputIterator b_begin, RandomAccessOutputIterator b_end, const point &point)
 Tests if a given point is a match point in an edit graph. More...
 
bool point_is_valid_in_graph (point &p, unsigned a_size, unsigned b_size)
 
template<typename RandomAccessOutputIterator >
void print_snake (RandomAccessOutputIterator a_begin, RandomAccessOutputIterator b_begin, const snake &s, ostream &out)
 This prints the middle snake of two strings. More...
 
int ses_len (const char *str1, const char *str2, bool reverse)
 Compute the length of the shortest edit script for two strings. This is done using the "Greedy LCS/SES" of figure 2 in the paper. It can walk the edit graph either foward (when reverse is false) or backward starting from the end (when reverse is true). More...
 
template<typename RandomAccessOutputIterator , typename EqualityFunctor >
int ses_len (RandomAccessOutputIterator a_begin, RandomAccessOutputIterator a_end, RandomAccessOutputIterator b_begin, RandomAccessOutputIterator b_end, d_path_vec &v, bool reverse)
 Compute the length of the shortest edit script for two sequences a and b. This is done using the "Greedy LCS/SES" of figure 2 in the paper. It can walk the edit graph either foward (when reverse is false) or backward starting from the end (when reverse is true). More...
 
template<typename RandomAccessOutputIterator >
int ses_len (RandomAccessOutputIterator a_begin, RandomAccessOutputIterator a_end, RandomAccessOutputIterator b_begin, RandomAccessOutputIterator b_end, d_path_vec &v, bool reverse)
 Compute the length of the shortest edit script for two sequences a and b. This is done using the "Greedy LCS/SES" of figure 2 in the paper. It can walk the edit graph either foward (when reverse is false) or backward starting from the end (when reverse is true). More...
 
bool snake_end_points (const snake &s, point &x, point &u)
 Get the end points of the snake, as intended by the paper. More...
 

Detailed Description

This file declares types and operations implementing the "O(ND) Difference Algorithm" (aka diff2) from Eugene W. Myers, to compute the difference between two sequences.

To understand what is going on here, one must read the paper at http://www.xmailserver.org/diff2.pdf. Throughout this file, that paper is referred to as "the paper".

The implementations goes as far as calculating the shortest edit script (the set of insertions and deletions) for transforming a sequence into another. The main entry point for that is the compute_diff() function.

Definition in file abg-diff-utils.h.