EIGEN CONJUGATION FOR SHOT BOUNDARY DETECTION

MANJUNATH S.1*, GURU D.S.2
1Department of Studies in Computer Science, Manasagangotri, University of Mysore, Mysore 570 006, Karnataka, India
2Department of Studies in Computer Science, Manasagangotri, University of Mysore, Mysore 570 006, Karnataka, India
* Corresponding Author : manju_uom@yahoo.co.in

Received : 06-11-2011     Accepted : 09-12-2011     Published : 12-12-2011
Volume : 3     Issue : 4       Pages : 241 - 244
Int J Mach Intell 3.4 (2011):241-244
DOI : http://dx.doi.org/10.9735/0975-2927.3.4.241-244

Conflict of Interest : None declared

Copyright : © 2011, MANJUNATH S. and GURU D.S., Published by Bioinfo Publications. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution and reproduction in any medium, provided the original author and source are credited.

Abstract

In this paper, we address the problem of shot boundary detection, which is an essential pre-processing step in video analysis applications. We present a novel model for detecting shot boundaries based on eigen conjugation of adjacent frames. We also present a method of classifying the detected shot boundaries using a template matching technique. Experimentation is carried out on two different categories of videos, namely entertainment videos and sports videos.

Keywords

Shot boundary detection, Eigen conjugation, Dynamic time warping, Shot boundary classification.

Introduction

A video is a three dimensional signal in which the first two dimensions represent the spatial coordinates of a frame and the third dimension represents time. To analyze and understand video content it is necessary to detect the shots of a video, as they form the basic units of video structure. Shots are sets of frames with visually similar content captured within an interval of time. Shots are combined using different transition techniques such as cut, dissolve, wipe, fade-in, fade-out, etc. to form a scene. Identification of a shot involves detection of shot boundaries, i.e., the frames or transitions that join two shots. Among all transitions, detection of cuts has been tackled with reasonable success, but detection of gradual transitions is still treated as a difficult problem [1].
Shot boundaries are detected by analyzing the continuity or discontinuity of visual information in the three dimensional signal. The continuity or discontinuity is obtained by measuring the similarity or dissimilarity between the visual information of frames, and it is represented in the form of a one dimensional signal. Whenever there is a significant change in this one dimensional signal, a shot boundary is declared.
The challenging issue in shot boundary detection is the selection of an appropriate frame representation method that can represent the three dimensional signal by an equivalent one dimensional signal, such that changes in the video signal are replicated in the lower order one dimensional signal. Frame representation methods can be classified into the following categories: color feature based approaches, edge feature based approaches, interest point based approaches, motion feature based approaches and transformation based approaches.
In color feature based approaches, color features such as color histograms, color moments, color anglograms, etc. are used for frame representation [2-6]. Computation of a color histogram descriptor depends on the choice of color space, color quantization and number of histogram bins, and selecting appropriate specifications is a crucial and cumbersome task. The major limitations of these approaches are illumination variation and motion-induced false alarms [7]. In particular, false alarms are more frequent in the case of gradual transitions, and two different frames can have similar color distributions, which may also reduce the rate of detection of hard cuts. These methods depend on object color while ignoring shape and texture. Further, they are highly sensitive to noise in addition to being high dimensional. To tackle these problems, edge based approaches were proposed in the literature [8-10], in which edge features such as the ratio of incoming and outgoing edges and edge histograms are used to detect shot boundaries.
In the literature we can also find attempts based on interest points [11-12], in which interest points such as SIFT points, dominant points and cloud points are used to compare adjacent frames. These approaches are sensitive to high motion videos, and to tackle this problem motion features were used to detect shot boundaries [12-18]. On the other hand, there are approaches that transform frames into a different space and extract features in the transformed space [19-26].
In summary, the aforementioned well-known methods are based on low-level features of frames such as color or motion, on the study of objects, or on transformation of frames onto a different space. Among these, the transformation methods transform frames to a different space and extract frame representatives there. The advantage of working in a transformed space is that the representatives are less susceptible to noise and can be more effective; the tradeoff is the additional time taken to transform the frames. In this regard, we transform video frames into eigen space, compute the eigen conjugation of adjacent frames, and compare the eigen conjugates of adjacent frames to detect shot boundaries.
The organization of the paper is as follows: section 2 presents an overview of eigen conjugation along with the proposed model to detect shot boundaries, and also presents a method of classifying the shot boundaries. Section 3 presents experimental results on two different datasets. The paper is concluded in section 4.

Eigen Conjugation for Shot Boundary Detection: Proposed Model

In this section we present a novel approach to shot boundary detection using the concept of eigen conjugation. In [27], an eigen conjugation based similarity function was proposed to estimate the similarity between two square matrices. This has been exploited in this work to estimate the similarity between adjacent frames of a video. Further, characteristics of the resulting similarity values have been exploited to detect and classify shot boundaries. First, we present an overview of the concept of eigen conjugation for estimating the similarity between two adjacent frames. Later, we present the characteristics of the similarity values obtained by the eigen conjugation approach and how they are exploited to detect shot boundaries.

Eigen Conjugation: An Overview

An eigen conjugation of two square matrices A and B is a double combination of the eigen values and eigen vectors of A with the original matrix B, and of the eigen values and eigen vectors of B with A. As a result of this conjugation, there are two square matrices $\bar{V}_A$ and $\bar{V}_B$, both composed of normalized column vectors.
Let $\Lambda_A$ and $\Lambda_B$ be the diagonal matrices of eigen values and let $V_A$ and $V_B$ be the matrices of eigen vectors of A and B respectively. We know that

$A V_A = V_A \Lambda_A$ (1)

$B V_B = V_B \Lambda_B$ (2)

Then, the matrices $\bar{V}_A$ and $\bar{V}_B$, known as the eigen conjugation of A and B, are obtained by combining each eigen system with the other matrix and normalizing the resulting columns:

$\bar{V}_A = \mathcal{N}(B V_A \Lambda_A)$ (3)

$\bar{V}_B = \mathcal{N}(A V_B \Lambda_B)$ (4)

where $\mathcal{N}(\cdot)$ normalizes each column vector to unit length.
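A minimal Python/NumPy sketch of the conjugation in Eqs. (1)-(4) is given below. The precise form of Eqs. (3)-(4) and the column normalization follow our reading of the prose definition above, and the helper name `eigen_conjugation` and the use of `numpy.linalg.eig` are our own choices, not prescribed by [27].

```python
import numpy as np

def eigen_conjugation(A, B):
    """Return (V_A, V_bar_A, V_B, V_bar_B) for square matrices A and B."""
    eigvals_A, V_A = np.linalg.eig(A)   # Eq. (1): A V_A = V_A diag(eigvals_A)
    eigvals_B, V_B = np.linalg.eig(B)   # Eq. (2): B V_B = V_B diag(eigvals_B)

    # Eqs. (3)-(4): combine the eigen system of A with the original matrix B,
    # and vice versa (our reading of the prose definition).
    C_A = B @ V_A @ np.diag(eigvals_A)
    C_B = A @ V_B @ np.diag(eigvals_B)

    # Normalize every column so both conjugates consist of unit column vectors.
    V_bar_A = C_A / (np.linalg.norm(C_A, axis=0, keepdims=True) + 1e-12)
    V_bar_B = C_B / (np.linalg.norm(C_B, axis=0, keepdims=True) + 1e-12)
    return V_A, V_bar_A, V_B, V_bar_B
```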

Each column vector of $\bar{V}_A$ at position i has a relation with the column vector of $V_A$ at the same position, and the same holds for $\bar{V}_B$ and $V_B$. Hence, to compute the distance between the matrices A and B, we compute the distance between each pair of corresponding column vectors from $V_A$ and $\bar{V}_A$, resulting in a sequence of distance values referred to as d1, and likewise the distances between $V_B$ and $\bar{V}_B$ form a sequence called d2. If the matrices A and B are similar, then the similarity between the sequences d1 and d2 will be very high. The sequences d1 and d2 are calculated as follows.
Given two matrices K and L of size $n \times m$, both composed of column vectors such that

$K = [\,k^{(1)}\; k^{(2)}\; \ldots\; k^{(m)}\,]$ (5)

$L = [\,l^{(1)}\; l^{(2)}\; \ldots\; l^{(m)}\,]$ (6)

the operator $\ominus$ is defined as

$K \ominus L = [\,e(k^{(1)}, l^{(1)})\; e(k^{(2)}, l^{(2)})\; \ldots\; e(k^{(m)}, l^{(m)})\,]$ (7)

where $e(k, l)$ is the Euclidean distance between vectors k and l given by

$e(k, l) = \sqrt{\sum_{i=1}^{n} (k_i - l_i)^2}$ (8)

The result is a sequence containing the distances between corresponding pairs of column vectors from the two matrices. The sequences d1 and d2 for matrices A and B are obtained as

$d_1 = V_A \ominus \bar{V}_A$ (9)

$d_2 = V_B \ominus \bar{V}_B$ (10)

Both sequences characterize the effect of the eigen conjugation of A and B. If the original matrices A and B are similar, then the sequences d1 and d2 will also be similar,

i.e., $A \approx B \Rightarrow d_1 \approx d_2$ (11)

Dynamic time warping (DTW) can be used to measure the distance between the sequences d1 and d2 and thereby estimate the similarity between the two matrices A and B:

$\mathrm{Sim}(A, B) = \mathrm{DTW}(d_1, d_2)$ (12)
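The following sketch shows how Eqs. (7)-(12) might be realized in Python, reusing the `eigen_conjugation` helper sketched earlier. The simple quadratic-time DTW recursion with absolute difference as the local cost is an assumption on our part; the paper does not specify which DTW variant is used.

```python
import numpy as np

def column_distances(K, L):
    # Eqs. (7)-(8): Euclidean distance between corresponding columns of K and L.
    # np.linalg.norm returns real values even when the columns are complex.
    return np.linalg.norm(K - L, axis=0)

def dtw_distance(d1, d2):
    # Plain dynamic programming DTW between two 1-D sequences.
    m, n = len(d1), len(d2)
    D = np.full((m + 1, n + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = abs(d1[i - 1] - d2[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[m, n])

def eigen_conjugation_similarity(A, B):
    # Eqs. (9)-(12): a small DTW distance indicates similar matrices.
    V_A, V_bar_A, V_B, V_bar_B = eigen_conjugation(A, B)
    d1 = column_distances(V_A, V_bar_A)   # Eq. (9)
    d2 = column_distances(V_B, V_bar_B)   # Eq. (10)
    return dtw_distance(d1, d2)           # Eq. (12)
```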

Eigen Conjugation for Shot Boundary Detection

In this work we exploit the concept of eigen conjugation for shot boundary detection. Let Fi and Fj be the ith and jth frames (preferably adjacent frames) of a video. The eigen conjugation of the two frames Fi and Fj is computed as explained in the previous section. Let $\bar{V}_{F_i}$ and $\bar{V}_{F_j}$ be the eigen conjugates of Fi and Fj. The Euclidean distance between all pairs of eigen column vectors of frame Fi and its eigen conjugate is computed and the distance values are arranged into the sequence d1; the sequence d2 between Fj and its conjugate is computed similarly. We then compute the distance between the two sequences d1 and d2 using DTW as specified in Eq. (12). Applied to every pair of adjacent frames, this yields a continuity signal equivalent to the three dimensional video signal. The obtained continuity signal has its own peculiar characteristics at shot boundaries, which has been exploited in this work to detect and classify them; the characteristics of the continuity signal for different types of shot boundaries are discussed in the next subsection.
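A sketch of this per-frame pipeline is given below, reusing the `eigen_conjugation_similarity` helper above. It assumes the frames have already been converted to equally sized square grayscale matrices; the paper does not spell out this preprocessing.

```python
import numpy as np

def continuity_signal(frames):
    # frames: list of equally sized square 2-D grayscale arrays.
    # Entry t of the signal compares frame t with frame t + 1.
    signal = np.zeros(len(frames) - 1)
    for t in range(len(frames) - 1):
        A = frames[t].astype(float)
        B = frames[t + 1].astype(float)
        signal[t] = eigen_conjugation_similarity(A, B)
    return signal
```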

Classification of Shot Boundaries

In this section we present the peculiar characteristics of the continuity signal in the case of cut, dissolve, fade-in and fade-out transitions.

Cut Detection

Cuts are easy to distinguish from other types of shot boundaries. During a cut transition, the adjacent frames forming the shot boundary exhibit a high rate of change in visual content. As a result, a cut produces a single isolated local minimum. In [Fig-1] the sharp, isolated local minima with steep flanks reflect hard cuts (marked by rectangular boxes).

Dissolve Detection

Dissolves produce distinct local minima surrounded by steep flanks. This is because the frames participating in a dissolve merge into each other in a very smooth manner. The spread of the flanks depends on the number of frames used to combine the two shots, which varies from a minimum of three to more than 30 frames. A dissolve of only three frames behaves like a hard cut, while dissolves with more frames produce broader valleys (marked with an ellipse), as shown in [Fig-1].

Fade Detection

Fades are of two types, fade-in and fade-out. In contrast to dissolves, fades exhibit the reverse characteristics when eigen conjugation is used. In a fade-out, the end frames of a shot are blended with black frames; in a fade-in, the shot starts with a set of black frames and the actual video frames then appear slowly. As a fade depends entirely on blending with black frames at either the beginning or the end of a shot, there is a smooth change in the continuity signal having a bell-shaped structure. If a video starts with a fade-in, the structure is a semi bell shape, as can be observed in [Fig-2], since the considered video starts with a fade-in (the double ellipse represents the fade).
To identify these characteristics of the continuity signal we have used a template matching technique, sketched below. We considered 20 samples of each transition's characteristic signal and created a template by averaging them. Hence we have four templates, one each for cut, dissolve, fade-in and fade-out. During shot boundary detection we compare these templates with the continuity signal at each frame and decide whether the current location is a shot boundary; simultaneously we also decide what type of transition it is.
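The following sketch illustrates one plausible realization of this template matching step. Normalized cross-correlation as the match score and the 0.8 threshold are our assumptions; the paper does not specify the similarity measure or the decision rule.

```python
import numpy as np

def classify_boundaries(signal, templates, threshold=0.8):
    # templates: dict mapping "cut" / "dissolve" / "fade-in" / "fade-out"
    # to the averaged 1-D template for that transition type.
    detections = []
    for name, tpl in templates.items():
        w = len(tpl)
        tpl_n = (tpl - tpl.mean()) / (tpl.std() + 1e-12)
        for t in range(len(signal) - w + 1):
            win = signal[t:t + w]
            win_n = (win - win.mean()) / (win.std() + 1e-12)
            score = float(np.dot(tpl_n, win_n)) / w  # normalized cross-correlation
            if score > threshold:
                detections.append((t, name, score))  # boundary location and type
    return detections
```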

Experimental Results

To corroborate the efficacy of the proposed method we have conducted experiments on two different types of videos, namely entertainment and sports videos, collected from the World Wide Web. To measure the performance of the proposed method we have used hit rate (HR) and error rate (ER) as given in equations 13 and 14.

$HR = \dfrac{N_c}{N_a}$ (13)

$ER = \dfrac{N_f}{N_a}$ (14)

where $N_c$ is the number of correctly detected shot boundaries, $N_f$ is the number of falsely detected boundaries, and $N_a$ is the actual number of shot boundaries in the video.
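As a trivial sketch, the metrics as reconstructed above can be computed as follows (the argument names are ours):

```python
def hit_rate(num_correct, num_actual):
    # Eq. (13): fraction of actual boundaries that were correctly detected.
    return num_correct / num_actual

def error_rate(num_false, num_actual):
    # Eq. (14): falsely declared boundaries relative to the actual count.
    return num_false / num_actual
```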

For experiment 1, we considered an entertainment video (the Crazy Frog video downloaded from youtube.com) which contains large motion and a greater number of gradual transitions with a minimal number of cuts. This video was chosen specifically to study the behavior of the proposed method on frames with large motion, and the results are tabulated in [Table-1].
In experiment 2, we considered sports videos (cricket, tennis and archery) which contain many more cuts than gradual transitions. Also, some of the sports videos (the cricket and tennis videos) contain considerably large motion. The obtained results are presented in [Table-2].
The results tabulated in [Table-1] and [Table-2] indicate that the proposed algorithm is applicable to videos with different types of boundaries.

Conclusion

In this paper we have presented a method of detecting shot boundaries using an eigen conjugation approach, which compares the similarity between matrices. Most existing methods only detect shot boundaries without classifying them. In this work we have both detected shot boundaries and classified them either as a cut or as a gradual transition (dissolve, fade-in, fade-out). Experimentation was carried out on two different sets of videos, namely entertainment and sports videos.

References

[1] Yuan J., Wang H., Xiao L., Zheng W., Li J., Lin F. and Zhang B. (2007) IEEE Transactions on Circuits and Systems for Video Technology, 168-185.

[2] Zhang H.J., Kankanhalli A. and Smoliar S.W. (1993) Multimedia Systems, 1(1), 10-28.

[3] Fan J., Elmagarmid A.K., Zhu X., Aref W.G. and Wu L. (2004) IEEE Transactions on Multimedia, 6(1), 70-86.

[4] Cooper M., Liu T. and Rieffel E. (2007) IEEE Transactions on Multimedia, 9(3), 610-618.

[5] Le D.D., Satoh S.I., Ngo T.D. and Duong A. (2008) IEEE 10th Workshop on Multimedia Signal Processing, 702-706.

[6] Kucuktunc O., Gudukbay U. and Ulusoy O. (2010) Computer Vision and Image Understanding, 114(1), 125-134.

[7] Adjeroh D., Lee M.C., Banda N. and Kandaswamy U. (2009) Journal of Image and Video Processing, 1-13.

[8] Zabih R., Miller J. and Mai K. (1995) ACM International Conference on Multimedia, 189-200.

[9] Smeaton A.F., Gormely G., Gilvarry J., Tobin B., Marlow S. and Murphy N. (1999) Irish Machine Vision and Image Processing Conference, 45-62.

[10] Heng W.J. and Ngan K.N. (1999) IEEE International Conference on Image Processing, 3, 289-293.

[11] Ardebilian M., Tu X. and Chen L. (2000) Journal of Visual Communication and Image Representation, 97-103.

[12] Chang Y., Lee D.J., Hing Yi. and Archibald J. (2008) EURASIP Journal on Video and Image Processing, 10 pages.

[13] Cherfaoui M. and Bertin C. (1995) SPIE Conference on Digital Video Compression: Algorithms and Technologies, 2419, 38-47.

[14] Mann S. and Picard R.W. (1997) Image and Vision Computing, 17(7), 1281-1295.

[15] Bouthemy P., Gelgon M. and Ganansia F. (1999) IEEE Transactions on Circuits and Systems for Video Technology, 9(7), 1030-1044.

[16] Zugaj D. and Bouthemy P. (1999) European Workshop on Content Based Multimedia Indexing.

[17] Ngo C.W., Ma Y.F. and Zhang H.J. (2005) IEEE Transactions on Circuits and Systems for Video Technology, 15(2), 296-305.

[18] Amel A.M., Abdessalem B.D. and Abdellatif M. (2010) Journal of Telecommunication, 2(1), 54-59.

[19] Han K.J. and Tewfik A.H. (1997) IEEE International Conference on Multimedia Computing and Systems, 2, 710-714.

[20] Filip J. and Haindl M. (2008) 19th International Conference on Pattern Recognition, 1-4.

[21] Amiri A. and Fathy M. (2009a) EURASIP Journal on Advances in Signal Processing, 12 pages.

[22] Amiri A. and Fathy M. (2009b) International Conference on Computational Science and Its Applications, 2(1), 780-790.

[23] Porter S.V., Mirmehdi M. and Thomas B.T. (2000) IAPR International Conference on Pattern Recognition, 3, 413-416.

[24] Koumaras H., Gardikis G., Xilouris G., Pallis E. and Kourtis A. (2006) Journal of Electronic Imaging, 15(2).

[25] Punitha P. and Jose J.M. (2010) 16th International Multimedia Modeling Conference, 347-357.

[26] Manjunath S., Guru D.S., Suraj M.G. and Harish B.S. (2011) ACM Compute.

[27] Gonzalez V.O., Yreta A.A., Apodaca J.M. and Moreno V.L. (2006) Proceedings of the Fifth Mexican International Conference on Artificial Intelligence.

Images
Fig. 1- Plot of continuity signal of a sample video based on Eigen conjugation and DTW
Fig. 2- Plot of continuity signal of a sample video based on Eigen conjugation and DTW
Table 1- Results obtained for entertainment video
Table 2- Results obtained for sports video