A geometric method for contour extraction of Drosophila embryos

Background High resolution images of Drosophila embryos in their developmental stages contain rich spatial and temporal information of gene expression. Automatic extraction of the contour of an embryo of interest in an embryonic image is a critical step of a computational system used to discover gene-gene interaction on Drosophila. Results We propose a geometric method for contour extraction of Drosophila embryos. The key of the proposed geometric method is k-dominant point extraction that is a generalization of 3-dominant point extraction proposed in our previous work. Based on k-dominant point extraction, we can approximate a connected component of edge pixels by a polygon that can be either convex or concave. The test on BDGP data shows that the proposed method outputforms two existing methods designed for contour extraction of Drosophila embryos. Conclusions The main advantage of the proposed geometric method in the context of contour extraction of Drosophila embryos is its ability of segmenting embryos touching each other. The proposed geometric method can also be applied to applications relevant to contour extraction.


Background
High resolution images of Drosophila embryos in their developmental stages contain rich spatial and temporal information of gene expression. They have become a valuable instrument for micro-biologists to discover genegene interaction [1]. Automatic extraction of the contour of an interest embryo in an image is a critical step of a computational system for the discovery of gene-gene interaction on Drosophila [2].
In general, Drosophila embryonic images contain substantial amount of variations [3,4]: i) imaging conditions, such as contrasts, scale, orientation, and neighboring embryos, ii) gene expression patterns, and iii) developmental stages. Most existing methods were developed upon low-level image features, such as edge pixels or pixels with a high deviation of grayvalues in a local window *Correspondence: qi.li@wku.edu 1 Western Kentucky University, 1906 College Blvd., 42101 Bowling Green, KY, USA Full list of author information is available at the end of the article [3][4][5][6][7][8][9][10]. Peng and Myers [5] proposed a method that uses the standard deviation of the local windows of a pixel to classify the pixel as a foreground or background pixel. Their method applies a 8-neighbor-connectivity regiongrowing method to extract the contour of an embryo. Pan et al. [6] applied a variant of Marquardt-Levenberg algorithm to estimate an optimal affine transformation to register localized embryos into an ellipsoidal region. Puniyani et al. [7] proposed an edge detection based method that assumes a number of heuristic constraints, including object size, convexity, shape features (e.g., ratio of the major over minor axis of an object), and the percentage of overlapping regions. Frise et al. [8] developed the method of Peng and Myers [5] by adding three morphological operations on a binary image: i) removal of isolated pixels, ii) dilation, and iii) majority processing. Futhermore, Frise et al. [8] proposed a heuristic algorithm to separate the embryo of interest from multiple touching embryos, with the assumption that the center of the embryo of interest is the image center. Mace et al. [9] proposed an eigen-embryo method to extract the contour of embryos, where a particle swarm optimizer was used to reduce the computational cost of searching optimal eigen parameters. Li and Kambhamettu [3] proposed a quadratic curve model to initialize the contour of the embryo of interest based on edge pixels, and applied an active contour model to refine embryo contours. Bessinger et al. [10] proposed criteria to select the optimal connected component of edge pixels in the scale space of an input image. Li [4] proposed algorithms to detect and restore deficiencies and faults of primal sketch tokens that occur when a targeting object is surrounded by a complex background.
In this paper, we propose a geometric method for contour extraction of Drosophila embryos, and the key of the proposed method is k-dominant point extraction. In the context of rectangular shape detection [11], we have proposed 3-dominant point extraction that can be used to analyze the geometric structure of licence plates. Note that the contour of a license plate is relatively simplefirst of all, it is piecewise linear; second, it is convex. We propose to generalize 3-dominant point extraction to k-dominant point extraction in order to adapt to the complex geometric structure of the contour of a Drosophila embryo. The complexity of these geometric structures mainly lies in the following two aspects: i) the contour of a Drosophila embryo may be concave and ii) the contour of two Drosophila embryos that touch to each other may be concave (see Fig. 1). The proposed method is able to segment embryos touching to each other. Note that many methods on contour extraction, including an active contour model, can not segment two objects that touch to each other. The proposed geometric method can also be applied to other tasks relevant to contour extraction.

3-dominant point extraction
Given a set of points, such as a connected component of edge pixels, Li et al. [12] proposed a recursive method to extract three dominant points v 1 , v 2 and v 3 . The first two points (v 1 and v 2 ) maximize the Euclidean distance of an arbitrary pair of points in C, and the third point v 3 maximizes the sum of distances between a p ∈ C and v i , i = 1, 2, i.e., Given a point and a line segment v 1 v 2 , we call p − v 1 + p − v 2 a point-line-segment distance to distinguish the common point-line distance that is defined by the distance between p and its vertical intersection with a line passing v 1 and v 2 .
Based on v 1 , v 2 , and v 3 , the piecewise linearity of C is then verified, i.e., where each point p ∈ C falls on either the line segment v 1 v 3 or v 2 v 3 . If yes, the recursion is a b Fig. 1 Two circumstances for the introduction of k-dominant points. a the contour of a Drosophila embryo forms a concave shape, and b the contour of two Drosophila embryos that touch to each other forms a concave shape stop. Otherwise, C is partitioned into two subsets, and the above 3-dominant point extraction method is then applied to the two subsets recursively.

K-dominant point extraction: basic concepts
In this section, we first generalize the approach for locating the 3rd dominant point, given two dominant points, from a set of unordered points P to a formula for locating the i-th dominant point, given i number of dominant points, from P. Then, we propose a simple method to insert a new dominant point into a sequence of geometrically-ordered dominant points so that dominant points are ordered geometrically. Last, we proposed a solution to address the challenge of concave polygons.
The basic idea of k-dominant point extraction is to iteratively insert a new dominant point into a set of k − 1 dominant points that have been found, under certain geometric constraint. For the convenience of illustration, we now introduce several basic concepts. First of all, the k − 1 dominant points are expected to be a geometric sequence that is consistent with a given set of 2D points.
Denote r(1), . . . r(k) is a permutation of 1, . . . , k. A geometric sequence of k − 1 dominant points is denoted as S k−1 = v r (1) , v r (2) , . . . , v r(k−1) . A valid geometric sequence is expected to be a polyline, i.e., v r(1) v r (2) is the first line segment passing a subset of points, and v r(2) v r (3) is the second line segment passing a subset of points, etc. The initial geometric sequence contains two dominant points, ideally representing a line segment (also called 1-piece polyline).
A closed tag is introduced with respect to a consecutive pair of dominant points (v r(i) , v r(i+1) ) in a geometric sequence with the motivation of speeding up the insertion of a new dominant point. A consecutive pair with (v r(i) , v r(i+1) ) a closed tag indicates that a new dominant point is not allowed to be inserted between v r(i) and v r(i+1) in the associated geometric sequence.
Insertability is introduced with respect to a new dominant point in order to tell whether the new dominant point is allowed to insert to a geometric sequence of dominant points. Insertability of a point is essentially introduced as a condition to stop the "global" search of dominant points. Imagine that we have a set of points forming a rectangle. After we find out four dominant points associated with the four vertices of a rectangle, the insertability of the fifth dominant point is expected to be NO in order to avoid inserting a non-vertex point into the sequence. Given a geometric sequence where is a parameter to tolerate the distortion of a straight line. is not a sensitive parameter, and it can be set from 0.01 to 0.05. In this paper, we fix it to be 0.02. Thus, we sometime simply call a point v k S-insertable, or just insertable.

Initialization
Similar to 3-dominant point extraction, k-dominant point extraction starts from locating two points v 1 and v 2 , given a point set, such that their Euclidean distance is a maximal distance among distances of all pairs of points in P, i.e., The closed tags for v 1 and v 2 are both initialized as 0 (i.e., false).

Searching a new dominant point
Given a set of points P and k − 1 dominant points v 1 , . . . , v k−1 , k > 2, we propose the following formula to search the k-th dominant point: For k = 3, we have an initial geometry sequence S 2 = v 1 , v 2 . Based on Eq. 2, we can test the insertability of the new dominant point v 3 . For k > 3, we can assume that a geometry sequence S k−1 has been iteratively built, as described in the following section.
It is worth noting that the time complexity of searching a new dominant point depends on the point set P, i.e., (|P|). However, testing the insertability of the new dominant point depends on the geometry sequence S only, i.e., in the cost of (|S|). With closed tags, the computational cost can be further reduced.

Upon an insertable dominant point: growing S k−1
Given an S-insertable dominant point, we will insert the point into an open "slot" of the geometric sequence S. Given a sequence S k−1 of k − 1 dominant points with a geometric order, i.e., ) of S offers a space for an insertable v k to insert, as follows: where indicates a possible space where v k may be inserted. Here the notation S k−1,i represents an abstract geometric sequence that contains a placeholder between the pair For convenience, we introduce r(k) = r(1) and augment the sequence v r (1) , v r (2) , We propose two criteria to measure the confidence of a sequence of dominant points S k = v r (1) , v r (2) , . . . , v r(k) . The first one is: where · denotes the cardinality of a set. The second criterion is based on overall length of the polyline formed by the sequence S k , i.e., The second criterion can be applied if C forms a simple curve (no self intersection). It is also easy to see that the computational cost of the second type of confidence is much lower than the first type.
By maximizing the confidence of each derived sequence, we can decide the optimal insertion of a new dominant point in order to maintain the geometric order. Figure 2 illustrates an example of assigning a geometric order of k dominant points. Given the first dominant points v 1 and v 2 . Without lose of generality, we start from the geometric sequence v 1 , v 2 . After v 3 is computed by Eq. 4, there are two possible options to insert v 3 : i) v 1 , , v 2 and v 1 , v 2 , . The first option brings us the sequence v 1 , v 3 , v 2 , where closed tag assignment Based on the confidence measure, these two options are both optimal. Without lose of generality, we choose the first option for following illustration. Consecutively, the proposed method grows the geometric sequence as follows:

Upon a non-insertable dominant point: reduce P
If a new dominant point v i computed by Eq. 4 is noninsertable, we will stop growing the geometric sequence and start to reduce the input point set P. The basic idea of reducing P is to remove all points that lie in one of closed line segments such that we obtain a subset of points that have a simpler topology. A point remained in P must be associated with a certain open pair of dominant points. So the above method can be recursively applied to each subset of points, and in turn each output (a sequence of dominant points from a subset of points can be correctly inserted into a higher level output). Simply speaking, a non-insertable dominant point activates a divide-andconquer strategy that can handle the concavity of a data set shaped by a concave polygon. Figure 3 illustrates the basic idea on how to reduce P when a new dominant point is tested to be non-insertable. After reduction, there are two subsets of points since there are two open pairs of dominant points in the geometric sequence. K-dominant point extraction method is then recursively applied to these two point sets, respectively.
Algorithm 1 summarizes the procedure of testing the closedness of a pair of dominant points (v, w), given a point set P. Note that if closedness is true, P will be updated by removing all points near the line segment vw. The time complexity of the algorithm is dominated by Step 5 (sorting) that is O(|P| log(|P|)). Algorithm 2 Step 13 (closedness) that is O(|P| log(|P|), in addition to the recursive step, i.e., Step 16. In many cases, the concavity is not very complicated and the depth of recursion won't be over 2. Therefore, the overall time complexity of the Algorithm is O(|P| 2 (1 + log(|P|) 2 )) with an assumption that the number of vertices of a polygon is a constant and points in P can be uniformly projected to S. (v, w) a pair of dominant points P a point set output: closedTag true/false P updated P compute v i 8.

Limitation of max-sum-distance measure
if v i is insertable for S 9.
break; // stop growing sequence 13. for each consecutive pair start to reduce the given point set P. However, it turns out that P is not reducible. This scenario shows a limitation of using max-sum-distance in computing k-dominant points. Note that this scenario is a representative formuation of two objects who have smooth boundaries and are touching to each other. Figure 4b shows a scenario where the fourth dominant point is non-insertible and the point set is reducible with respect to v i , i = 1, . . . , 3. The fourth dominant point is expected to be a point on the line segment v 2 v 3 . Precisely speaking, the fourth dominant point is the point nearest  Figure 4c shows a scenario where the fourth dominant point is insertible. Therefore, the algorithm will continue to locate the fifth dominant point. Reducibility is thus not applicable in this scenario.
The rationale of introducing a min-sum-distance can also be found by a Fermat point and its generalization called a geometric median. Given three points in a plane, the Fermat point is the point in the plane that minimizes the sum of distances from itself to the three points. The Fermat point can be computed analytically, as illustrated in Fig. 5. Specifically, we first construct an equilateral triangle (in dash lines) for each edge of the given triangle (in solid lines), and then connect the outmost vertex of each equilateral triangle to a given vertex outside the equilateral triangle by a red line. The Fermat point must be the intersection of three red lines. Given m points in a plane, the geometric median is the point in the plane that minimizes the sum of distances from itself to the m points.
A subtle difference between computing a geometric median and computing a dominant point under the minsum-distance scheme is that the search space in the former problem is a continous and infinite plane, while the search space in the latter problem is a discrete and finite point set. For convenience, we call the problem of computing a dominant point under a min-sum-distance scheme discrete geometric median.

Max/min-sum-distance measure
The three scenarios illustrated in Fig. 4 show that a max-sum-distance scheme is not able to handle various structures of a point set to extract dominant points. A min-sum-distance scheme is expected to be integrated with a max-sum-distance scheme adaptively. In other words, we have to compare both schemes, a max-sumdistance and a min-sum-distance, and select a better scheme under some criterion to extract the next dominant To solve the max/min-sum-distance problem, we propose a balance-oriented criterion to select an optimal measure between max-sum-distance and min-sumdistance measure as follows: balance the distance from a new dominant point to its two neighboring dominant points in the new geometric sequence. Specifically, given a point set P and a geometric sequence of dominant points S k−1 = v 1 , . . . , v k−1 , we have two candidates of a new dominant point: i) v max k according to the max-sumdistance measure, and ii) v min k according to the min-sumdistance measure. For each candidate, v max k or v min k , we first compute the optimal location to insert by maximizing the confidence of a derived sequence, i.e., We then compute a distance ratio with respect to v max k , v min k and their neighboring dominant points as follows: Note that both r max and r min range from 0 to 1. A larger value indicates more balanced distance from a candidate to its optimal neighbors. So, if r max ≥ r min , the candidate v max k is selected. Otherwise, v min k is selected. It is intuitive that dominant points selected by a balanceoriented criterion have a better description of the global structure of a point set, and thus they can provide more reliable estimation of geometric properties of a point set, such as curvature. In contrast, if a dominant point has an imbalanced distance ratio to its two neighbors, the estimation of a geometric property at this dominant point will be very sensitive to the localiation error of itself and its neighbors.
It is easy to see that the balanced-oriented criterion on distance measure can address the above-mentioned three scenarios very well. It is also not difficult to show that the max/min-sum-distance measure can be consistently selected if a point set forms a convex polygonal shape. Figure 6 shows a scenario where a point set contains an inflection point v 4 . Since an inflection point is neither a local maximum or a local minimum, the max/minsum-distance measure fails to extract such a point as a dominant point at an "early" round. (It is possible that an inflection point can be eventually extracted after a few more rounds.) More specifically, three dominant points v 1 , v 2 and v 3 are first extracted without any problem. Based on max/min-sum-distance measure, there are two candidates for the 4th dominant point: i) v 4 according to max-sum-distance measure, and ii) v 4 according to min-sum-distance measure. However, the inflection point v 4 has a more balanced distance ratio to its neighboring dominant points v 1 and v 2 than v 4 and v 4 . Thus, a inflection point, intuitively, can have a better description of the global structure of a point set, say, a sequence of dominant points that can response to inflection points can cover a larger number of points in a given point set than an equallength sequence of dominant points that cann't response to inflection points.

Max/min/median-sum-distance measure
We now generalize the max/min-sum-distance measure to a max/min/median-sum-distance measure. We follow a balance-oriented criterion to select an optimal measure among three measures: i) max-sum-distance, ii) min-sumdistance measure, and iii) median-sum-distance as fol-  Fig. 8 The longest subsequence output by Algorithm 3. To enhance the intuition, the subsequence is drawn over the corresponding subset of input points that are, however, not involved in Algorithm 3

Fig. 9
A comparison between Li's method [4] (the first row) and the proposed method (the second row) We then compute a distance ratio with respect to v max k , v min k , v mdn k , and their neighboring dominant points as follows: Note that r max , r min and r mdn range from 0 to 1. The largest value indicates the most balanced distance from a candidate to its optimal neighbors. For example, if r max is the largest value, the candidate v max k will be selected.

Sequence subdivision
We propose a sequence subdivision method for the segmentation of touching embryos. The basic idea of our method is based on the detection of nonsmooth dominant points from a given geometric sequence. To build up an intuition, let us start from Fig. 7 that shows a comparison between a scenario of a concave-shape embryo and a scenario of touching embryos. Given a domi- − v i are two directional vectors centered at v i . The cross angle between these two vectors that can be computed by can be used to tell whether a dominant point is smooth or not. For simplicity, we call the cross angle of the two directional vectors centered at v i the angle of v i . Figure 7a shows a scenario where the two hollow dots represent the two intersections between the contours of two touching Drosophila embryos, and their angles are acute. Figure 7b shows a scenario where the contour of a Drosophila is a bean shape that contains two inflection points. The angles of two inflection points are obtuse, i.e., similar to the cross angle of other dominant points given a smooth contour. Figure 8 shows the longest subsequence output by the Algorithm 3 being applied to the geometric sequence presented in Fig. 1b.
We first give a visual evaluation on the proposed method. Figure 9 shows a comparison between Li's method [4] and the proposed method on BDGP images of touching Drosophila embryos. We can observe that Li's method fails in all three cases, while the proposed method works well. Figure 10 shows more positive results of the proposed method. closedTag(v i ) = 0; // tag is set to open 4. k = 1; S k = // init 1st subsequence 5. i = 1; 6. while i ≤ n 7. insert v i into S k until closedTag(v i )=0; 8. k++; S k = // init next subsequence 9. i++; Next, we present a quantitative evaluation of different combinations of the parts of the proposed method in terms of detection rates. Given an image, the result is defined as a successful detection if the output by Algorithm 2 has larger than 90% overlapping region with the ground truth. The dataset contains 2000 images of BDGP Drosophila embryos. Table 1 shows the quantitative results, including the comparison of two existing methods on contour extraction of Drosophila embryos: i) Li and Kambhamettu's method [3] that consists an initialization based on a quadratic curve model, and a refinement based on an active contour model; ii) Li's method [4] that can detect and restore deficiencies and faults of primal sketch tokens occurring when a targeting object is surrounded by a complex background.

Discussion
As we mentioned in the introduction, the proposed framework can be applied to other applications of contour extraction. The main contribution of the proposed framework is k-dominant point extraction based on a specific distance measure. There is a trade off among the three types of distance measures: max-sum, max/min-sum, and max/min/median-sum. The last one is the most sophisticate one, i.e., it can deal with concave contours, and touching scenarios of targeting objects, while the first one is the most efficient one in computation. Therefore, maxsum distance may be used in some circumstance, e.g., contours are convex. K-dominant point extraction may also be applied to more general applications beyond contour extraction, such as data clustering. In other words, the data that k-dominant point extraction is applied can have arbitrary dimensions rather than two.

Conclusions
We have proposed a geometric method for contour extraction of Drosophila embryos. Experiment results show the effectiveness of the proposed method, typically in segmenting two touching embryos in an image. The results also show the superiority of the proposed method over two previous methods. The proposed method advances the theory of control point detection by generalizing 3-dominant points to k-dominant points. The generalization includes strategies to deal with concave shapes, thus the proposed method can be applied to a wide range of applications relevant to contour extraction.