Patent
Temporal segment based extraction and robust matching of video fingerprints
Patent Number: | 9,177,209 |
Publication Date: | November 03, 2015 |
Appl. No.: | 11/958,298 |
Application Filed: | December 17, 2007 |
Abstract: | A computer implemented method, apparatus, and computer program product code for temporal, event-based video fingerprinting. In one embodiment, events in video content are detected. The video content comprises a plurality of video frames. An event represents discrete points of interest in the video content. A set of temporal, event-based segments are generated using the events. Each temporal, event-based segment is a segment of the video content covering a set of events. A time series signal is derived from each temporal, event-based segment using temporal tracking of content-based features of a set of frames associated with the each temporal, event-based segment. A temporal segment based fingerprint is extracted based on the time series signal for the each temporal, event-based segment to form a set of temporal segment based fingerprints associated with the video content. |
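To make the pipeline in the abstract (and claim 1 below) concrete, here is a minimal Python sketch under simple assumptions: frames are grayscale arrays with values in [0, 1], events are sharp changes in mean frame intensity, and fingerprints are normalized, uniformly resampled segment time series. All names and parameters (`detect_events`, `threshold=0.1`, `dim=32`) are illustrative, not taken from the patent.

```python
import numpy as np

def frame_intensity(frames):
    """One content-based feature per frame: mean intensity."""
    return np.array([f.mean() for f in frames])

def detect_events(feature, threshold=0.1):
    """Events = frames where the tracked feature changes sharply."""
    change = np.abs(np.diff(feature))
    return [int(i) + 1 for i in np.where(change > threshold)[0]]

def event_segments(events, num_frames, span=2):
    """Event-aligned segments, each covering `span` consecutive
    inter-event intervals (so every segment spans >= 2 frames)."""
    bounds = [0] + events + [num_frames]
    return [(bounds[i], bounds[i + span])
            for i in range(len(bounds) - span)]

def segment_fingerprint(feature, start, end, dim=32):
    """Fixed-length fingerprint: uniformly resample the segment's
    time series to `dim` points, then normalize."""
    series = feature[start:end]
    idx = np.linspace(0, len(series) - 1, dim).round().astype(int)
    sampled = series[idx]
    return (sampled - sampled.mean()) / (sampled.std() + 1e-8)

def fingerprint_video(frames):
    """Full pipeline: features -> events -> segments -> fingerprints."""
    feature = frame_intensity(frames)
    events = detect_events(feature)
    return [(s, e, segment_fingerprint(feature, s, e))
            for s, e in event_segments(events, len(frames))]
```

For example, `fingerprint_video([np.random.rand(120, 160) for _ in range(300)])` returns a list of `(start, end, fingerprint)` triples, one per event-aligned segment.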
Inventors: | Chang, Jane Wen (Lexington, MA, US); Natsev, Apostol Ivanov (Harrison, NY, US); Smith, John R. (New York, NY, US) |
Assignees: | SINOEAST CONCEPT LIMITED (Wanchai, HK) |
Claim: | 1. A computer implemented method for temporal, event-based video fingerprinting, the computer implemented method comprising: extracting content-based features from video content, wherein the video content comprises a plurality of video frames and the content-based features are extracted from each frame of the plurality of video frames; detecting events in the content-based features, wherein each event of the events represents discrete points of interest in the video content; generating a set of temporal, event-based segments using the events, wherein each temporal, event-based segment is a segment of the video content covering a set of the events that span a set of at least two frames of the plurality of video frames and is aligned at an event boundary; deriving a time series signal from each temporal, event-based segment in the set of temporal, event-based segments using temporal tracking of the content-based features of multiple frames associated with the each temporal, event-based segment; and extracting a segment fingerprint based on the time series signal for the each temporal, event-based segment to form a set of temporal segment based fingerprints associated with the video content. |
Claim: | 2. The computer implemented method of claim 1 wherein the set of temporal segment based fingerprints are used to determine if a portion of a first video clip is derived from a same original content as a second video clip, the computer implemented method further comprising: comparing temporal segment based fingerprints for the first video clip with temporal segment based fingerprints generated for the second video clip; identifying matching event-based segments based on a similarity measure between a temporal segment-based fingerprint associated with the first video clip and a temporal segment based fingerprint associated with the second video clip to form a matching segment; collecting all matching segments between the first video clip and the second video clip to form a set of matching segments; selecting a subset of matching segments, wherein the subset of matching segments comprises matching segments associated with the first video clip that produces a good linear fit to matching segments associated with the second video clip; identifying an overall video match score for the first video clip and the second video clip based on the selected matching segments; and determining whether the first video clip is a near-duplicate of the second video clip using the overall video match score. |
Claim: | 3. The computer implemented method of claim 2 further comprising: comparing the overall video match score to a threshold score; and responsive to the overall video match score exceeding the threshold score, identifying the first video clip as a near-duplicate of the second video clip. |
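Claims 2 and 3 describe the matching side. The sketch below is one hedged reading: pair segments by fingerprint distance, keep the pairs whose start-time offsets fit a single line (a near-duplicate should map one timeline onto the other roughly linearly), and score the clip pair by the surviving fraction. The least-squares fit, the tolerances, and the score definition are assumptions; the patent does not commit to a specific fitting method.

```python
import numpy as np

def match_segments(fps_a, fps_b, max_dist=0.5):
    """Pair each segment of clip A with its nearest segment of clip B,
    keeping pairs whose fingerprint distance is below max_dist."""
    if not fps_b:
        return []
    matches = []
    for start_a, _, fp_a in fps_a:
        dists = [np.linalg.norm(fp_a - fp_b) for _, _, fp_b in fps_b]
        j = int(np.argmin(dists))
        if dists[j] < max_dist:
            matches.append((start_a, fps_b[j][0]))  # (start in A, start in B)
    return matches

def linear_fit_subset(matches, tol=25.0):
    """Keep matches whose (start_a, start_b) pairs fit one line well."""
    if len(matches) < 2:
        return matches
    xs = np.array([a for a, _ in matches], dtype=float)
    ys = np.array([b for _, b in matches], dtype=float)
    if len(np.unique(xs)) < 2:            # degenerate: cannot fit a line
        return matches
    slope, intercept = np.polyfit(xs, ys, 1)
    residuals = np.abs(ys - (slope * xs + intercept))
    return [m for m, r in zip(matches, residuals) if r < tol]

def is_near_duplicate(fps_a, fps_b, threshold=0.5):
    """Overall match score = fraction of A's segments with a consistent
    match in B; compare to a threshold as in claim 3."""
    subset = linear_fit_subset(match_segments(fps_a, fps_b))
    return len(subset) / max(len(fps_a), 1) > threshold
```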
Claim: | 4. The computer implemented method of claim 1 wherein the temporal segment based fingerprints are used for at least one of content-based search, concept detection, content categorization, summarization, filtering, routing, or targeted advertising. |
Claim: | 5. The computer implemented method of claim 1 wherein the content-based features comprises at least one of audio features and visual features, and wherein each event is detected based on at least one of video shot detection, scene changes, speaker changes, audio changes, frame intensity changes, or changes based on low-level content-based descriptors of color, texture, shape, edges, or motion. |
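Claim 5 lists several event cues. As one concrete instance, the sketch below flags shot boundaries where the gray-level histogram distance between adjacent frames spikes; the histogram descriptor and the `cutoff` value are illustrative assumptions, and audio, speaker, or motion cues could drive the same segmentation.

```python
import numpy as np

def gray_histogram(frame, bins=16):
    """Normalized gray-level histogram: a low-level content descriptor."""
    hist, _ = np.histogram(frame, bins=bins, range=(0.0, 1.0))
    return hist / max(hist.sum(), 1)

def shot_boundary_events(frames, cutoff=0.4):
    """Flag an event wherever the adjacent-frame histogram distance
    spikes, a standard proxy for a shot cut."""
    hists = [gray_histogram(f) for f in frames]
    return [i for i in range(1, len(hists))
            if np.abs(hists[i] - hists[i - 1]).sum() > cutoff]
```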
Claim: | 6. The computer implemented method of claim 1 wherein the temporal, event-based segments span a set of frames in the plurality of video frames, and wherein the set of frames covers only a subset of the video content. |
Claim: | 7. The computer implemented method of claim 1 wherein a first temporal, event-based segment associated with the video content overlaps with a second temporal, event-based segment associated with the video content, wherein the events describe a significant change of state in at least one of audio content of the video, visual content of the video, and semantic content of the video. |
Claim: | 8. The computer implemented method of claim 1 wherein the time series signal is based on at least one of temporal tracking of overall frame intensity, tracking of frame region-based intensity sequences, tracking of grid-based intensity sequences, and tracking of adjacent frame differences. |
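Claim 8's grid-based variant can be sketched as follows: divide each frame into an n-by-n grid and track the mean intensity of every cell across the segment, giving a multi-channel time series. The grid size and the use of mean intensity are assumptions.

```python
import numpy as np

def grid_intensity_series(frames, n=3):
    """Stack per-frame n*n grids of mean cell intensities into a
    (num_frames, n * n) time series signal for one segment."""
    series = []
    for frame in frames:
        h, w = frame.shape[:2]
        cells = [frame[i * h // n:(i + 1) * h // n,
                       j * w // n:(j + 1) * w // n].mean()
                 for i in range(n) for j in range(n)]
        series.append(cells)
    return np.array(series)
```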
Claim: | 9. The computer implemented method of claim 1 wherein the temporal segment based fingerprints are extracted from the segment time series based on at least one of uniform sampling, piece-wise linear approximation, Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT), and Discrete Fourier Transform (DFT). |
Claim: | 10. The computer implemented method of claim 1 wherein a temporal segment based fingerprint comprises a fixed-dimensionality feature vector. |
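Claims 9 and 10 pair naturally: a transform-domain extraction yields the fixed-dimensionality vector of claim 10 regardless of segment length. The sketch below keeps the first `k` coefficients of a type-II DCT, computed directly to stay dependency-free; the coefficient count and the normalization are assumptions.

```python
import numpy as np

def dct_fingerprint(series, k=16):
    """Fixed-dimensionality fingerprint: the first k coefficients of a
    type-II DCT of the segment's (normalized) 1-D time series."""
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / (x.std() + 1e-8)    # normalize for robustness
    n = len(x)
    t = np.arange(n)
    return np.array([np.sum(x * np.cos(np.pi * (t + 0.5) * m / n))
                     for m in range(k)])
```

Because only the leading coefficients are kept, two segments of different lengths still produce comparable k-dimensional vectors.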
Claim: | 11. The computer implemented method of claim 1 wherein an event describes a significant change of state in at least one of audio content of the video, visual content of the video, and semantic content of the video. |
Claim: | 12. A non-transitory computer program product for temporal, event-based video fingerprinting, the computer program product comprising: a computer readable medium; program code stored on the computer readable medium for extracting content-based features from video content, wherein the video content comprises a plurality of video frames and the content-based features are extracted from each frame of the plurality of video frames; program code stored on the computer readable medium for detecting events in the content-based features, wherein each event of the events represents discrete points of interest in the video content; program code stored on the computer readable medium for generating a set of temporal, event-based segments using the events, wherein each temporal, event-based segment is a segment of the video content covering a set of the events that span a set of at least two frames of the plurality of video frames and is aligned at an event boundary; program code stored on the computer readable medium for deriving a time series signal from each temporal, event-based segment in the set of temporal, event-based segments using temporal tracking of the content-based features of multiple frames associated with the each temporal, event-based segment; and program code stored on the computer readable medium for extracting a segment fingerprint based on the time series signal for each temporal, event-based segment to form a set of temporal segment based fingerprints associated with the video content. |
Claim: | 13. The computer program product of claim 12 wherein the set of temporal segment based fingerprints are used to determine if a portion of a first video clip is derived from a same original content as a second video clip, the computer program product further comprising: program code stored on the computer readable medium for comparing temporal segment based fingerprints for the first video clip with temporal segment based fingerprints generated for the second video clip; program code stored on the computer readable medium for identifying matching event-based segments based on a similarity measure between a temporal segment-based fingerprint associated with the first video clip and a temporal segment based fingerprint associated with the second video clip to form a matching segment; program code stored on the computer readable medium for collecting all matching segments between the first video clip and the second video clip to form a set of matching segments; program code stored on the computer readable medium for selecting a subset of matching segments, wherein the subset of matching segments comprises matching segments associated with the first video clip that produces a good linear fit to matching segments associated with the second video clip; program code stored on the computer readable medium for identifying an overall video match score for the first video clip and the second video clip based on the selected matching segments in the subset of matching segments; and program code stored on the computer readable medium for determining whether the first video clip is a near-duplicate of the second video clip using the overall video match score. |
Claim: | 14. The computer program product of claim 13 further comprising: program code stored on the computer readable medium for comparing the overall video match score to a threshold score; program code stored on the computer readable medium for identifying the first video clip as a near-duplicate of the second video clip in response to the overall video match score exceeding the threshold score. |
Claim: | 15. The computer program product of claim 12 wherein the content-based features comprises at least one of audio features and visual features, and wherein each event is detected based on at least one of video shot detection, scene changes, speaker changes, audio changes, frame intensity changes, or changes based on low-level content-based descriptors of color, texture, shape, edges, or motion. |
Claim: | 16. An apparatus for automatically detecting video piracy, the apparatus comprising: a bus system; a communications system coupled to the bus system; a memory connected to the bus system, wherein the memory includes computer usable program code; and a processing unit coupled to the bus system, wherein the processing unit executes the computer usable program code to extract content-based features from video content, wherein the video content comprises a plurality of video frames and the content-based features are extracted from each frame of the plurality of video frames, detect events in the content-based features, wherein each event of the events represents discrete points of interest in the video content; generate a set of temporal, event-based segments using the events, wherein each temporal, event-based segment is a segment of the video content covering a set of the events that span a set of at least two frames of the plurality of video frames and is aligned at an event boundary; derive a time series signal from each temporal, event-based segment in the set of temporal, event-based segments using temporal tracking of the content-based features of multiple frames associated with the each temporal, event-based segment; and extract a segment fingerprint based on the time series signal for each temporal, event-based segment to form a set of temporal segment based fingerprints associated with the video content, wherein detection of the near-duplicate is used for content-based video piracy detection. |
Claim: | 17. The apparatus of claim 16 wherein the processing unit further executes the computer usable program code to compare temporal segment based fingerprints for the first video clip with temporal segment based fingerprints generated for the second video clip; identify matching event-based segments based on a similarity measure between a temporal segment-based fingerprint associated with the first video clip and a temporal segment based fingerprint associated with the second video clip to form a matching segment; collect all matching segments between the first video clip and the second video clip to form a set of matching segments; select a subset of matching segments, wherein the subset of matching test segments produces a good linear fit to the matched video; identify an overall video match score for the first video clip and the second video clip based on the selected matching segments in the subset of matching segments; and determine whether the first video clip is a near-duplicate of the second video clip using the overall video match score. |
Claim: | 18. The apparatus of claim 17 wherein the processing unit further executes the computer usable program code to compare the overall video match score to a threshold score; and identify the first video clip as a near-duplicate of the second video clip in response to the overall video match score exceeding the threshold score. |
Patent References Cited: | 5,953,439, September 1999, Ishihara et al.; 7,194,752, March 2007, Kenyon et al.; 7,375,731, May 2008, Divakaran et al.; 7,912,894, March 2011, Adams; 2003/0185417, October 2003, Alattar et al.; 2004/0098225, May 2004, Abe et al.; 2004/0260930, December 2004, Malik et al.; 2005/0021568, January 2005, Pelly et al.; 2005/0213826, September 2005, Neogi; 2006/0187358, August 2006, Lienhart et al.; 2007/0253594, November 2007, Lu et al.; 2008/0310731, December 2008, Stojancic et al.; 2009/0052784, February 2009, Covell et al. |
Other References: | "Filtering user-generated content", Bit Player, pp. 1-2, retrieved Oct. 25, 2007, http://opinion.latimes.com/bitplayer/2007/03/filtering_userg.html. cited by applicant. "The content-recognition bakeoff, parts 1 and 2", Bit Player, pp. 1-2, retrieved Oct. 25, 2007, http://opinion.latimes.com/bitplayer/2007/09/the-content-rec.html. cited by applicant. Liedtke, "YouTube unveils anti-piracy filters", Yahoo News, Oct. 15, 2007, pp. 1-2. cited by applicant. Gentile, "Media, Web companies set copyright rules", Yahoo News, Oct. 18, 2007, pp. 1-3. cited by applicant. Hampapur et al., "Comparison of Distance Measures for Video Copy Detection", 2001 IEEE International Conference on Multimedia and Expo, 2001 IEEE, pp. 944-947. cited by applicant. Law-To et al., "Video Copy Detection: a Comparative Study", CIVR '07, Jul. 2007, Amsterdam, The Netherlands, 2007 ACM, pp. 1-8. cited by applicant. Law-To et al., "Robust Voting Algorithm Based on Labels of Behavior for Video Copy Detection", MM '06, Oct. 2006, Santa Barbara, California, 2006 ACM, pp. 946-955. cited by applicant. Massoudi et al., "A Video Fingerprint Based on Visual Digest and Local Fingerprints", 2006 IEEE, pp. 2297-2300. cited by applicant. Li et al., "Content-Based Video Copy Detection with Video Signature", 2006 IEEE, pp. 4321-4324. cited by applicant. Shen et al., "Towards Effective Indexing for Very Large Video Sequence Database", SIGMOD 2005, Jun. 2005, Baltimore, Maryland, 2005 ACM, pp. 1-12. cited by applicant. |
Primary Examiner: | Flynn, Randy |
Attorney, Agent or Firm: | Morgan, Lewis & Bockius LLP |
Accession Number: | edspgr.09177209 |
Database: | USPTO Patent Grants |
Description not available.