Automatic video object extraction

Bibliographic Details
Title: Automatic video object extraction
Patent Number: 7,453,939
Publication Date: November 18, 2008
Appl. No: 10/890355
Application Filed: July 13, 2004
Abstract: Automatic video object extraction that defines substantially precise objects is disclosed. In one embodiment, color segmentation and motion segmentation are performed on a source video. The color segmentation segments the video by substantially uniform color regions thereof. The motion segmentation segments the video by moving regions thereof. The color regions and the moving regions are then combined to define the video objects. In varying embodiments, pre-processing and post-processing are performed to further clean the source video and the defined video objects, respectively.
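The abstract describes a pipeline of pre-processing, color segmentation, motion segmentation, mask combination, and post-processing. The following is a minimal NumPy sketch of that flow under simplifying assumptions (grayscale frames, a box-blur pre-process, coarse quantization standing in for color segmentation); it illustrates the described structure only and is not the patented implementation.

```python
import numpy as np

def preprocess(frame):
    """Illustrative pre-processing: a 3x3 box blur to suppress noise."""
    pad = np.pad(frame, 1, mode="edge")
    h, w = frame.shape
    return sum(pad[dy:dy + h, dx:dx + w]
               for dy in range(3) for dx in range(3)) / 9.0

def segment_motion(frames, diff_thresh=15):
    """Motion segmentation: pixels that change between successive frames."""
    diffs = [np.abs(frames[i + 1] - frames[i]) > diff_thresh
             for i in range(len(frames) - 1)]
    return np.logical_or.reduce(diffs)   # moving regions: basic object contours

def segment_color(frame, bins=16):
    """Color segmentation, approximated here by coarse intensity quantization."""
    return (frame // (256 / bins)).astype(np.int32)   # region label per pixel

def extract_objects(frames, overlap=0.55):
    """frames: list of at least two 2-D grayscale arrays. Returns a bool mask."""
    frames = [preprocess(f.astype(np.float32)) for f in frames]
    motion = segment_motion(frames)        # basic contours from moving regions
    labels = segment_color(frames[0])      # more precise uniform-color regions
    objects = np.zeros_like(motion)
    for lab in np.unique(labels):          # combine the two segmentations
        region = labels == lab
        if motion[region].mean() > overlap:   # threshold rule from the claims
            objects |= region
    return objects                         # post-processing (noise removal) omitted
```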
Inventors: Pan, Jinhui (Beijing, CN); Li, Shipeng (Redmond, WA, US); Zhang, Ya-Qin (Bellevue, WA, US)
Assignees: Microsoft Corporation (Redmond, WA, US)
Claim: 1. A computer implemented method for extracting objects from a source video comprising: performing motion segmentation on a plurality of frames of the source video to segment the source video by moving regions thereof to define basic contours of the objects of the source video; performing color segmentation on a frame of the source video to segment the source video by substantially uniform color regions thereof to define more precise boundaries of the objects of the source video as measured against the basic contours of the objects of the source video as defined by the motion segmentation; and, combining at least the moving regions resulting from the motion segmentation and the substantially uniform color regions resulting from the color segmentation to define still more precisely the objects of the source video by, at least partially, deeming a substantially uniform color region to be part of a moving object if a percent of the substantially uniform color region that is assignable to one or more moving regions exceeds a predetermined threshold.
Claim: 2. The method of claim 1, wherein the predetermined threshold is between fifty and sixty percent.
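Claims 1 and 2 state the combining rule: a substantially uniform color region is deemed part of a moving object when the fraction of the region assignable to moving regions exceeds a threshold between fifty and sixty percent. A tiny worked example of that test follows; the array layout and the 55% threshold are illustrative assumptions.

```python
import numpy as np

# One uniform color region (label 1) overlaps the moving area on 10 of its
# 12 pixels, i.e. about 83% > 55%, so the whole region joins the object.
color_labels = np.array([[1, 1, 1, 2],
                         [1, 1, 1, 2],
                         [1, 1, 1, 2],
                         [1, 1, 1, 2]])
motion_mask = np.array([[1, 1, 1, 0],
                        [1, 1, 1, 0],
                        [1, 1, 1, 0],
                        [0, 1, 0, 0]], dtype=bool)

threshold = 0.55                      # claim 2: between fifty and sixty percent
region = color_labels == 1            # the substantially uniform color region
overlap = motion_mask[region].mean()  # fraction of the region that is moving
object_mask = region if overlap > threshold else np.zeros_like(region)
print(f"overlap = {overlap:.2f}, region deemed moving: {overlap > threshold}")
```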
Claim: 3. A computer-readable medium having computer instructions stored thereon for execution by a processor to perform a method comprising: generating a plurality of masks from a source video having a plurality of frames, the plurality of masks including at least a color segmentation mask, a motion segmentation mask, and a frame difference mask; wherein the frame difference mask is generated from at least two frames of the plurality of frames without using another mask; and, combining the plurality of masks to define a plurality of objects of the source video; wherein generating the plurality of masks comprises generating at least the color segmentation mask to define substantially precise boundaries of the objects, the motion segmentation mask to define approximate boundaries of the objects, and the frame difference mask to correct errors within the motion segmentation mask; and wherein the plurality of objects of the source video resulting from the combining are more precise than the substantially precise boundaries of the objects defined by the color segmentation mask, which are more precise than the approximate boundaries of the objects defined by the motion segmentation mask.
Claim: 4. The medium of claim 3, wherein generating the color segmentation mask comprises growing substantially uniform color regions of a frame of the plurality of frames of the source video.
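Claim 4 describes generating the color segmentation mask by growing substantially uniform color regions of a frame. One way such region growing can be sketched is a flood fill under a color-distance tolerance; the tolerance value and 4-connectivity below are assumptions, not details taken from the patent.

```python
import numpy as np
from collections import deque

def grow_color_regions(frame, tol=12.0):
    """Label pixels into regions of substantially uniform color.

    frame: HxWx3 array; tol: maximum color distance to the region seed.
    Returns an HxW int array of region labels (an illustrative sketch only).
    """
    h, w = frame.shape[:2]
    labels = np.full((h, w), -1, dtype=np.int32)
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy, sx] != -1:
                continue
            seed = frame[sy, sx].astype(np.float32)
            queue = deque([(sy, sx)])
            labels[sy, sx] = next_label
            while queue:                      # grow outward from the seed pixel
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and labels[ny, nx] == -1:
                        if np.linalg.norm(frame[ny, nx] - seed) <= tol:
                            labels[ny, nx] = next_label
                            queue.append((ny, nx))
            next_label += 1
    return labels
```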
Claim: 5. The medium of claim 3, wherein generating the motion segmentation mask comprises generating the motion segmentation mask from the plurality of frames of the source video.
Claim: 6. The medium of claim 3, wherein the method further comprises pre-processing a frame of the plurality of frames of the source video to remove noise prior to generating the plurality of masks.
Claim: 7. The medium of claim 3, wherein the method further comprises post-processing the plurality of objects of the source video to remove noise from the objects.
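Claims 6 and 7 add pre-processing and post-processing steps that remove noise before mask generation and from the extracted objects afterwards. A common way to realize the post-processing step, shown here only as an illustration, is to drop small connected components from the object mask; the size threshold and the use of scipy.ndimage are assumptions.

```python
import numpy as np
from scipy import ndimage

def remove_small_components(object_mask, min_pixels=50):
    """Post-processing sketch: drop tiny connected components as noise."""
    labeled, count = ndimage.label(object_mask)   # label connected components
    cleaned = np.zeros_like(object_mask)
    for comp in range(1, count + 1):
        component = labeled == comp
        if component.sum() >= min_pixels:
            cleaned |= component                  # keep only sizeable regions
    return cleaned
```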
Claim: 8. A computer comprising: a processor; at least one computer-readable medium to store data representing: at least two frames of a plurality of frames of a source video, and, a plurality of objects extracted from the source video; and, a computer program executed by the processor from the at least one computer-readable medium and designed to extract the plurality of objects from the source video by generating a plurality of masks from the source video and then combining the plurality of masks by, at least partially, deeming a substantially uniform color region to be part of a moving object if a percent of the substantially uniform color region that is assignable to one or more moving regions exceeds a predetermined threshold, the plurality of masks including: a motion segmentation mask that segments the video by moving regions thereof to define basic contours of the plurality of objects of the source video, a color segmentation mask that segments the video by substantially uniform color regions thereof to define more precise boundaries, as compared to the basic contours, of the plurality of objects of the source video, and a frame difference mask.
Claim: 9. The computer of claim 8, wherein the plurality of objects extracted from the source video are better defined than the precise boundaries of the plurality of objects as defined by the color segmentation mask, and the precise boundaries of the plurality of objects are more defined than the basic contours of the plurality of objects as defined by the motion segmentation mask.
Claim: 10. The computer of claim 8, wherein the program is further designed to pre-process the at least two frames of the plurality of frames of the source video to remove noise.
Claim: 11. The computer of claim 8, wherein the program is further designed to post-process the plurality of objects extracted from the source video to remove noise.
Claim: 12. At least one computer-readable medium having computer instructions stored thereon for execution by a processor to transform a general purpose computer to a special purpose computer comprising: means for storing data representing: at least two frames of a plurality of frames of a source video, and, a plurality of objects extracted from the source video; and, means for: (a) generating a plurality of masks from the source video, the plurality of masks including: a color segmentation mask segmenting the source video by substantially uniform color regions to define substantially precise boundaries of the plurality of objects, a motion segmentation mask segmenting the source video by moving regions thereof to define approximate boundaries of the plurality of objects, and a frame difference mask to reflect differences between the at least two frames of the source video, and, (b) combining the plurality of masks to extract the plurality of objects from the source video by, at least partially, deeming a substantially uniform color region to be part of a moving object if a percent of the substantially uniform color region that is assignable to one or more moving regions exceeds a predetermined threshold.
Claim: 13. The at least one computer-readable medium of claim 12, wherein the substantially precise boundaries of the plurality of objects defined by the color segmentation mask are more precise than the approximate boundaries of the plurality of objects defined by the motion segmentation mask, and the plurality of objects extracted from the source video are more precise than the precise boundaries of the plurality of objects defined by the color segmentation mask.
Claim: 14. The at least one computer-readable medium of claim 12, wherein the means is further for pre-processing the at least two frames of the plurality of frames of the source video to remove noise.
Claim: 15. The at least one computer-readable medium of claim 12, wherein the means is further for post-processing the plurality of objects extracted from the source video to remove noise.
Claim: 16. At least one computer-readable medium having computer instructions stored thereon for execution by a processor to perform a method comprising: performing motion segmentation on at least three frames of a plurality of frames of video to segment the video by moving regions and to thereby define basic contours of objects of the video; performing color segmentation on at least one frame of the plurality of frames of the video to segment the video by substantially uniform color regions and to thereby define more precise boundaries of the objects of the video as compared to the basic contours of the objects of the video as defined by the motion segmentation; and, combining the substantially uniform color regions resulting from the color segmentation and the moving regions resulting from the motion segmentation to further define the objects of the video by, at least partially, deeming a substantially uniform color region to be part of a moving object if a percent of the substantially uniform color region that is assignable to one or more moving regions exceeds a predetermined threshold.
Claim: 17. The at least one computer-readable medium of claim 16, wherein the performing motion segmentation comprises determining a combination motion mask based on a plurality of individual motion masks and responsive to at least one threshold.
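Claim 17 describes motion segmentation as a combination motion mask determined from a plurality of individual motion masks and responsive to at least one threshold. One plausible reading, sketched below, is a per-pixel vote over the individual masks; the voting rule itself is an assumption.

```python
import numpy as np

def combination_motion_mask(individual_masks, vote_threshold=0.5):
    """Combine individual motion masks: a pixel is moving if enough masks agree.

    individual_masks: list of 2-D bool arrays, e.g. one per pair of frames.
    vote_threshold:   fraction of masks that must agree (an assumed rule).
    """
    votes = np.stack(individual_masks).astype(np.float32).mean(axis=0)
    return votes >= vote_threshold
```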
Claim: 18. At least one computer-readable medium having computer instructions stored thereon for execution by a processor to perform a method comprising: generating a color segmentation mask that segments the video by substantially uniform color regions to define contours of objects of a video to a first precision; generating a motion segmentation mask that segments the video by moving regions to define the contours of the objects of the video to a second precision; generating a frame difference mask that reflects differences in the video between a first frame and a second frame of the video on a per pixel basis responsive to a predetermined threshold; and combining the color segmentation mask, the motion segmentation mask, and the frame difference mask to define the objects of the video by, at least partially, deeming a substantially uniform color region to be part of a moving object if a percent of the substantially uniform color region that is assignable to one or more moving regions exceeds a predetermined threshold.
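Claim 18 calls for a frame difference mask reflecting per-pixel differences between two frames of the video, responsive to a predetermined threshold. A minimal sketch follows; the luminance conversion and the threshold value are assumptions.

```python
import numpy as np

def frame_difference_mask(frame_a, frame_b, diff_threshold=20.0):
    """True wherever two RGB frames differ by more than the threshold per pixel."""
    weights = np.array([0.299, 0.587, 0.114])        # assumed luminance weights
    gray_a = frame_a.astype(np.float32) @ weights
    gray_b = frame_b.astype(np.float32) @ weights
    return np.abs(gray_a - gray_b) > diff_threshold
```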
Claim: 19. The at least one computer-readable medium of claim 18, wherein the objects of the video defined by the combining of the color segmentation mask, the motion segmentation mask, and the frame difference mask are yet more precise than the basic contours of the objects of the video defined by the moving regions and the more precise boundaries of the objects of the video defined by the substantially uniform color regions.
Claim: 20. The at least one computer-readable medium of claim 18, wherein the combining of the color segmentation mask, the motion segmentation mask, and the frame difference mask to define the objects of the video comprises: combining the color segmentation mask and the motion segmentation mask to generate a first intermediate mask; combining the color segmentation mask and the frame difference mask to generate a second intermediate mask; and combining the first intermediate mask and the second intermediate mask to produce a final mask.
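Claim 20 spells out a combining order: the color segmentation mask is merged with the motion segmentation mask to form a first intermediate mask, and with the frame difference mask to form a second; the two intermediate masks are then combined into a final mask. The sketch below reuses the region-overlap test of claim 1 for each merge and a union for the final step, both of which are assumptions about details the claim leaves open.

```python
import numpy as np

def merge_color_with(color_labels, binary_mask, threshold=0.55):
    """Intermediate mask: color regions that mostly overlap the binary mask."""
    merged = np.zeros_like(binary_mask)
    for label in np.unique(color_labels):
        region = color_labels == label
        if binary_mask[region].mean() > threshold:
            merged |= region
    return merged

def final_object_mask(color_labels, motion_mask, frame_diff_mask):
    """Combine the three masks along the lines of claim 20."""
    first = merge_color_with(color_labels, motion_mask)       # color + motion
    second = merge_color_with(color_labels, frame_diff_mask)  # color + frame difference
    return first | second                                     # final mask (union is assumed)
```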
Current U.S. Class: 375/240.08
Patent References Cited: 5479218 December 1995 Etoh
5608458 March 1997 Chen et al.
5936671 August 1999 Van Beek et al.
5995668 November 1999 Corset et al.
6005625 December 1999 Yokoyama
6035060 March 2000 Chen et al.
6075875 June 2000 Gu
6141434 October 2000 Christian et al.
6266443 July 2001 Vetro et al.
6337917 January 2002 Onural et al.
6348918 February 2002 Szeliski et al.
6400846 June 2002 Lin et al.
6421090 July 2002 Jiang et al.
6625333 September 2003 Wang et al.
7143434 November 2006 Paek et al.
WO 98/33323 July 1998

Other References: Gu & Lee, Tracking of Multiple Semantic Video Objects for Internet Applications, IS&T/SPIE Conference on Visual Communications & Image Processing 99, SPIE vol. 3563, pp. 806-820, Jan. 1999. cited by other
Alatan, Onural, Wollborn et al., Image Sequence Analysis for Emerging Interactive Multimedia Services, IEEE Transactions on Circuits and Systems for Video Technology, vol. 8, No. 7, Nov. 1998. cited by other
Primary Examiner: Vo, Tung
Attorney, Agent or Firm: Lee & Hayes, PLLC
Accession Number: edspgr.07453939
Database: USPTO Patent Grants