A Study on Content-Based Image Retrieval
Department of Computer Science and Information Engineering
Keywords: content-based image retrieval (CBIR), local edge map
Considering the above requirements, this dissertation proposes three content-based image query systems. The first addresses object edges: we use a fast edge-point detection algorithm to detect all possible edge points in an image, and propose a new object representation, the Mountain Climbing Sequence (MCS). This representation achieves invariance to the translation, rotation, and scaling problems mentioned above. Moreover, since current experience with edge detection cannot guarantee the recovery of an object's complete contour, we also attempt, under existing contour-feature representations, to cope with incompletely extracted contours; even when an object is partially occluded, the proposed system still obtains good matching results. The second addresses the relationships between neighboring edge points. In real images it is difficult to precisely detect the complete set of an object's edge points, so we compute the relationships among the detected edge points and build an orientation-distance histogram to represent the image; this representation overcomes the scaling and rotation problems faced during querying.
Because of recent advances in computer technology and the revolution in the way information is processed, interest in automatic information retrieval from huge databases has grown rapidly. In particular, content-based image retrieval (CBIR) has become a hot research topic, and improving the technology for content-based querying systems is therefore increasingly challenging.
The work of content-based image retrieval includes feature selection, object representation, and matching. A good image representation should meet several requirements, including invariance to translation, rotation, scaling, and reversion, as well as tolerance of deformation in query images.
In this dissertation, we propose three robust and efficient methods for CBIR. First, an efficient and robust shape-based image retrieval system is proposed. We introduce a shape representation, the Mountain Climbing Sequence (MCS), which makes retrieval invariant to translation, rotation, scaling, and reversion. Second, a new feature called the orientation-distance histogram is introduced, and a CBIR system based on this feature is proposed. This system transforms a given image from the RGB color model to the HSV color model, detects edge points using the H component, and then evaluates the orientation-distance histogram of the detected edge points to form a feature vector. With normalization of the feature vectors and the use of MCS sequences, the system is invariant to scaling and rotation. Last, we collect a variety of features for representing real images, including the orientation-distance histogram, local edge map, wavelet coefficients, color information, and intensity information, forming feature vectors of dimension up to one thousand. Over such a large set of features, we use an improved version of the AdaBoost algorithm to select the features most relevant to the user's query, thereby achieving effective retrieval results efficiently.
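As a rough illustration of the second method's pipeline, the following sketch converts an RGB image to HSV, marks edge points on the H channel, and accumulates an orientation-distance histogram over pairs of edge points. All function names, the edge threshold, and the bin counts are illustrative assumptions, not the dissertation's actual implementation.

```python
# Hedged sketch of the orientation-distance histogram pipeline.
# Thresholds and bin counts are assumptions, not the thesis values.
import colorsys
import numpy as np

def rgb_to_h(img):
    """Return the H (hue) channel of an RGB image, values in [0, 1)."""
    h = np.empty(img.shape[:2])
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            r, g, b = img[i, j] / 255.0
            h[i, j] = colorsys.rgb_to_hsv(r, g, b)[0]
    return h

def detect_edges(h, thresh=0.1):
    """Mark a pixel as an edge point when its hue differs strongly from
    a 4-neighbor (a crude stand-in for the thesis's edge detector)."""
    dx = np.abs(np.diff(h, axis=0))
    dy = np.abs(np.diff(h, axis=1))
    edges = np.zeros_like(h, dtype=bool)
    edges[:-1, :] |= dx > thresh
    edges[1:, :] |= dx > thresh
    edges[:, :-1] |= dy > thresh
    edges[:, 1:] |= dy > thresh
    return edges

def orientation_distance_histogram(edges, n_orient=8, n_dist=4):
    """Bin the orientation and distance of the vector joining each pair
    of edge points; normalizing the histogram to sum to 1 gives the
    scale robustness described in the abstract."""
    ys, xs = np.nonzero(edges)
    pts = np.stack([xs, ys], axis=1).astype(float)
    hist = np.zeros((n_orient, n_dist))
    max_d = np.hypot(*edges.shape)  # image diagonal bounds all distances
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            v = pts[j] - pts[i]
            ang = np.arctan2(v[1], v[0]) % np.pi  # undirected orientation
            d = np.linalg.norm(v)
            oi = min(int(ang / np.pi * n_orient), n_orient - 1)
            di = min(int(d / max_d * n_dist), n_dist - 1)
            hist[oi, di] += 1
    total = hist.sum()
    return hist / total if total else hist
```

The pairwise loop is quadratic in the number of edge points, so a practical system would subsample edge points or restrict pairs to a neighborhood; the sketch keeps the naive form for clarity.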
List of Figures
CHAPTER 1 Introduction
1.1 Motivation
1.2 Related Work
1.3 Organization of the Dissertation
CHAPTER 2 Shape-Based Study
2.1 Shape Representation
2.1.1 Polar Representation and Distance Sequences
2.1.2 Objects with Concavity/Convexity
2.1.3 Invariance to Transformations
2.2 Tackling Request of Invariance
2.2.1 Rotation Invariance
2.2.2 Scaling Invariance
2.2.3 Reversion Invariance
2.3 Shape Matching
2.4 Experimental Results and Comparison
2.4.1 Experiment 1—Tested on Sebastian et al. shape database
2.4.2 Experiment 2—Tested on Mokhtarian et al. shape database
CHAPTER 3 Edge-Based Study
3.1 Prompt Edge Detection
3.2 Feature Extraction
3.3 Feature Normalization
3.3.1 Scaling Invariance
3.3.2 Rotation Invariance
3.4 Similarity Measure
3.5 Experimental Results
CHAPTER 4 Combination of Various Features
4.1 Feature Description
4.2 Retrieval Method
4.3 Experimental Results
4.3.1 Performance Based on Each Type of Feature
4.3.2 Performance Based on the Combination of All Features
4.3.3 Modified Version versus the Conventional AdaBoost
CHAPTER 5 Conclusion and Future Work
Publication List
Appendix A. Database of Shape Images
Appendix B. Database of Real Images
List of Figures
Figure 2-1. Distance d and angle θ of a contour point (x, y) relative to the centroid (x̄, ȳ).
Figure 2-2. Graph of the polar equation for the contour shown in Figure 2-1.
Figure 2-3. The nearest intersection points are extracted from each of objects A and B.
Figure 2-4. The farthest intersection points are extracted from object B.
Figure 2-5. Each of the objects in (a) and (b) has two maximal distances from the centroid.
Figure 2-6. Test images provided by Sebastian et al.
Figure 3-1. (a) The original image. (b) The edge points detected based on the V component. (c) The edge points detected based on the H component.
Figure 3-2. The 8 neighbors of the center point.
Figure 3-3. (a) Points of 1-distance, (b) points of 3-distance.
Figure 3-4. (a) An original image, (b) the orientation-distance histogram of (a).
Figure 3-5. (a) The 50%-scaled-down version of 3-3(a), (b) the orientation-distance histogram of 3-5(a), (c) the normalized version of 3-4(b), (d) the normalized version of 3-5(b).
Figure 3-6. (a) A rotated (by 90 degrees) version of the image in 3-4(a), (b) the orientation-distance histogram of 3-6(a), (c) the shifted and normalized result of 3-4(b) with s = 3, (d) the shifted and normalized result of 3-6(b) with s = 12.
Figure 3-7. Examples of query results of (a) the method of F. Mahmoudi et al., and (b) our method.
Figure 3-8. The Pr-Re curves of (a) the method of F. Mahmoudi et al., and (b) our method.
Figure 4-1. (a) A 3×3 window W3x3, (b) a window wLEP of binomial multipliers.
Figure 4-2. (a) Gray values of pixel (x, y) and its 8 neighbors, (b) neighborhood of (x, y) on the edge map, (c) window Tx,y, (d) the convolution value of Tx,y with wLEP is 145.
Figure 4-3. The HSV color space.
Figure 4-4. The operation of the modified AdaBoost. (a) The table g with the classification error rates err, (b) the table gs with error rates es for the best b features evaluated and selected in Steps 3 and 4, (c) the table G evaluated in Step 5, (d) the normalized weights and weighted errors evaluated in Step 5, (e) the final features (or weak classifiers) with their original indices selected in Step 6.
Figure 4-5. Various types of testing images.
Figure 4-6. The precision rate and recall rate based on the OD feature.
Figure 4-7. The precision rate and recall rate based on the LEP feature.
Figure 4-8. The precision rate and recall rate based on the wavelet energy feature.
Figure 4-9. The precision rate and recall rate based on the color-based feature.
Figure 4-10. The precision rate and recall rate based on the intensity feature.
Figure 4-11. The precision rate and recall rate based on the combination of all features.
Figure 4-12. The average precision rate and recall rate based on different features.
Figure 4-13. The precision and recall rates for various values of T.
Figure 4-14. Comparison of precision and recall rates of the proposed modified version and the conventional AdaBoost.
References
1. J. Shanbehzadeh, A. M. E. Moghadam, and F. Mahmoudi, "Image indexing and retrieval techniques: past, present and next", Proceedings of the SPIE: Storage and Retrieval for Multimedia Database, Vol. 3972, San Jose, California, USA, 2000, pp. 461-470.
2. D. Zhang and G. Lu, "Review of shape representation and description techniques", Pattern Recognition 37, 2004, pp. 1-19.
3. T. Wang, Y. Rui, and J. G. Sun, "Constraint based region matching for image retrieval", International Journal of Computer Vision, 2003, pp. 37-45.
4. J. Peng, "Multi-class relevance feedback content-based image retrieval", Computer Vision and Image Understanding 90, 2003, pp. 42-67.
5. QBIC(TM) -- IBM's Query By Image Content, http://wwwqbic.almaden.ibm.com/.
6. J. H. Chang, K. C. Fan, and Y. L. Chang, "Multi-modal gray-level histogram modeling and decomposition", Image and Vision Computing 20, 2002, pp. 203-216.
7. R. Brunelli and O. Mich, "Histograms analysis for image retrieval", Pattern Recognition 34, 2001, pp. 1625-1637.
8. H. Wei and D. Y. Y. Yun, "Illumination-invariant image indexing using directional gradient angular histogram", Proceedings of the IASTED International Conference on Computer Graphics and Imaging, Honolulu, Hawaii, USA, 2001, pp. 13-16.
9. Y. Chen, X. Zhou, and T. Huang, "One-class SVM for learning in image retrieval", Proceedings of the IEEE International Conference on Image Processing, Thessaloniki, Greece, 2001, pp. 815-818.
10. F. Mahmoudi, J. Shanbehzadeh, and A. M. Eftekhari-Moghadam, "Image retrieval based on shape similarity by edge orientation autocorrelogram", Pattern Recognition 36, 2003, pp. 1725-1736.
11. T. Bernier and J. A. Landry, "A new method for representing and matching shapes of natural objects", Pattern Recognition 36, 2003, pp. 1711-1723.
12. J. Zhang, X. Zhang, H. Krim, and G. G. Walter, "Object representation and recognition in shape spaces", Pattern Recognition 36, 2003, pp. 1143-1154.
13. H. Nishida, "Structural feature indexing for retrieval of partially visible shapes", Pattern Recognition 35, 2002, pp. 55-67.
14. A. Bonnassie, F. Peyrin, and D. Attali, "A new method for analyzing local shape in three-dimensional images based on medial axis transformation", IEEE Transactions on Systems, Man, and Cybernetics-Part B, Cybernetics 33, 2003, pp. 700-705.
15. P. Yushkevich, P. Fletcher, S. Joshi, A. Thall, and S. M. Pizer, "Continuous medial representations for geometric object modeling in 2D and 3D", Image and Vision Computing 21, 2003, pp. 17-27.
16. C. D. Ruberto, "Recognition of shapes by attributed skeletal graphs", Pattern Recognition 37, 2004, pp. 21-23.
17. M. Stricker and M. Orengo, "Similarity of color images", SPIE Storage and Retrieval for Image and Video Databases, 1995, pp. 381-392.
18. M. Swain and D. Ballard, "Color indexing", International Journal of Computer Vision, Vol. 7, No. 1, 1991, pp. 11-32.
19. G. Lu and J. Phillips, "Using perceptually weighted histograms for colour-based image retrieval", Proceedings of the IEEE International Conference on Signal Processing, 1998, pp. 1150-1153.
20. R. L. Kashyap and A. Khotanzad, "A model-based method for rotation invariant texture classification", IEEE Transactions on Pattern Analysis and Machine Intelligence 8, 1986, pp. 472-481.
21. M. Leung and A. M. Peterson, "Scale and rotation invariant texture classification", Proceedings of the International Conference on Acoustics, Speech and Signal Processing, 1991, pp. 461-465.
22. M. S. Choi and W. Y. Kim, "A novel two stage template matching method for rotation and illumination invariance", Pattern Recognition 35, 2002, pp. 119-129.
23. H. Araujo and J. M. Dias, "An introduction to the log-polar mapping", Proceedings of Cybernetic Vision, Second Workshop, 1996, pp. 139-144.
24. D. G. Kendall, "Shape manifolds, Procrustean metrics, and complex projective spaces", Bulletin of the London Mathematical Society 16, 1984, pp. 81-121.
25. T. B. Sebastian, P. N. Klein, and B. B. Kimia, "Recognition of shapes by editing shock graphs", Proceedings of the 8th IEEE International Conference on Computer Vision, ICCV 1, 2001, pp. 755-762.
26. F. Mokhtarian, S. Abbasi, and J. Kittler, "Robust and efficient shape indexing through curvature scale space", Proceedings of the 6th British Machine Vision Conference, BMVC '96, Edinburgh, UK, 1996, pp. 53-62.
27. H. J. Lin, Y. T. Kao, S. H. Yen, and C. J. Wang, "A study of shape-based image retrieval", Proceedings of the 6th International Workshop on Multimedia Network Systems and Applications, 2004, pp. 118-123.
28. H. J. Lin and Y. T. Kao, "A prompt contour detection method", International Conference on Distributed Multimedia Systems, 2001.
29. R. C. Gonzalez and R. E. Woods, Digital Image Processing.
30. F. Stein and G. Medioni, "Structural indexing: efficient 2-D object recognition", IEEE Transactions on Pattern Analysis and Machine Intelligence 14 (12), 1992, pp. 1198-1204.
31. K. K. Yu and N. C. Hu, "Two-dimensional gray-level object recognition using shape-specific points", Journal of the Chinese Institute of Engineers 24(2), 2001, pp. 245-252.
32. Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting", Journal of Computer and System Sciences 55(1), 1997, pp. 119-139.