One-Shot Video Object Segmentation with Iterative Online Fine-Tuning

Amos Newswanger, University of Rochester, Rochester, NY 14627, anewswan@u.rochester.edu
Chenliang Xu, University of Rochester, Rochester, NY 14627, chenliang.xu@rochester.edu

Abstract

Semi-supervised or one-shot video object segmentation has attracted much attention in the video analysis community recently. The OSVOS model [1] achieves state-of-the-art results on the DAVIS 2016 dataset by first fine-tuning a CNN model on the first pre-segmented frame of a video, and then independently segmenting the rest of the frames in that video. However, the model lacks the ability to learn new information about the object as it evolves throughout the video, displaying features that were not present in the first frame. To address this issue, we propose an iterative online training method whereby the model is fine-tuned on the first frame, segments several consecutive frames independently, and is then updated on its own output segmentations. This process is repeated until all frames of a video are segmented. To segment multiple similar objects in a video, we use an object tracker to filter the output of the individually trained CNN object models before it is used for iterative fine-tuning. This reduces the possibility of error propagation and helps the model increase its discriminative power as it is iteratively fine-tuned. Our method shows improvement over the standard OSVOS model on both the DAVIS 2016 and 2017 datasets.

Figure 1. The first frame contains relatively little information about the object, and as a result, the OSVOS model fails to segment it correctly towards the end. However, earlier correct segmentations contain useful information about the object that can be used to further train the model.

1. Introduction

In recent years, Convolutional Neural Networks (CNNs) have achieved state-of-the-art results in many computer vision tasks, such as image classification [8] and object detection [2]. Video object segmentation, or the separation of an object from its background in a video sequence, is a related task that has also come to be dominated by deep learning methods [3, 9, 1, 4]. Among them, the One-Shot Video Object Segmentation (OSVOS) model, a fully convolutional network introduced in [1], achieves state-of-the-art performance in the DAVIS 2016 competition [5].
The OSVOS model is based on the VGG network [7], which is pre-trained on the generic task of image classification on ImageNet. The network performs convolution on intermediate values taken from the VGG network to produce a segmentation mask. This network is further trained offline on the training videos of the DAVIS 2016 dataset to learn a general concept of foreground objects, and hence it is called the parent network. To perform video object segmentation on a given test video, the parent network is first fine-tuned on the pre-segmented ground-truth frame to learn the appearance features of the object in question, and is then used to independently segment the rest of the frames in the video. Although this approach has many desirable qualities, it lacks the ability to learn new information about the object as it evolves throughout the video. This reduces its performance on sequences where the initial frame lacks information about the object that becomes important later in the sequence. For instance, Fig. 1 shows the scooter-black sequence, in which the first frame contains relatively little information about the object. As the object gets closer to the camera, the network fails to segment it properly. However, the segmentations leading up to the failure are correct, and contain information about the object that could be used to correct the failure.

We present a method for iterative online fine-tuning of the OSVOS network. As shown in Fig. 2, we first fine-tune the OSVOS parent network on the first frame of the video. We then use this model to independently segment some number of frames. These frames are then used to further fine-tune the network. This process is repeated until all frames of the video are segmented. We evaluate our method on both the DAVIS 2016 and 2017 datasets. As shown in Fig. 2, we deal with the multi-object masks in the DAVIS 2017 dataset by first separating them into separate binary masks and running our method independently on each one, finally combining the results by taking the maximum output of all the models and snapping the boundaries to a contour.

We evaluate our method with three metrics: intersection-over-union (IoU or J), contour accuracy (F), and temporal stability (T). We compare the performance of our method on each sequence in the DAVIS 2016 validation set to the performance of the standard OSVOS model. Furthermore, we perform additional evaluations on DAVIS 2017 videos where a single video contains multiple objects. Our method shows improvement over the standard OSVOS model on both the DAVIS 2016 and 2017 datasets.
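For reference, the J measure is the standard intersection-over-union between a predicted binary mask and the ground-truth mask. A minimal sketch (not from the paper; it assumes boolean NumPy arrays of the same shape):

```python
import numpy as np

def region_similarity_j(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """Intersection-over-Union (the J measure) between two binary masks."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Both masks empty: treat as a perfect match by convention.
        return 1.0
    intersection = np.logical_and(pred, gt).sum()
    return float(intersection) / float(union)
```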

Figure 2. Overview of our method. On the left: (1) the OSVOS model is first trained on the ground-truth segmentation of the first frame; (2) this model is used to segment some number of frames; (3) these segmentations are filtered using a bounding box tracker; (4) the filtered segmentations are added to the training set, and the model is further fine-tuned. The diagram on the right shows how we independently manage each object in a multi-object mask, and then combine the results and snap the boundaries to a contour.

2. Method

Our method is straightforward. We use the parent network provided by Caelles et al. [1], which is trained for 50,000 iterations on the DAVIS 2016 dataset (augmented by mirroring and zooming) with Stochastic Gradient Descent and a momentum of 0.9. For the online training, we fine-tune the model for 300 iterations on the first frame of the video so that it learns to recognize the specified object. We then use this model to independently segment the next 10 frames in the sequence. These 10 frames are then added to the training set, and the model is fine-tuned for 100 iterations on them. This process is repeated every 10 frames until all the frames in the sequence are segmented. To refine the segmentation, we snap the boundaries to contours generated by the same contour network used by Caelles et al. Our method based on iterative fine-tuning adapts the network to the object as it evolves throughout the sequence. However, it also presents the possibility of propagating errors made early in the segmentation process. To mitigate this problem, we experiment with several ways of filtering the output of the network before it is used for fine-tuning in Sec. 3.
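A rough sketch of this online loop is given below. The finetune/segment interface is hypothetical (the experiments build on the OSVOS implementation of Caelles et al. [1]); the iteration counts and the 10-frame window follow the schedule described above.

```python
def iterative_online_segmentation(model, frames, first_frame_gt,
                                  window=10, first_iters=300,
                                  update_iters=100, filter_fn=None):
    """Sketch of the iterative online fine-tuning loop (hypothetical API).

    model          -- fine-tunable OSVOS-style network with finetune()/segment()
    frames         -- list of video frames
    first_frame_gt -- ground-truth mask for frames[0]
    filter_fn      -- optional function (frame, mask) -> mask that cleans a
                      prediction before it is added to the training set
                      (e.g. largest-blob or bounding-box filtering, Sec. 3)
    """
    # (1) Fine-tune the parent network on the annotated first frame.
    train_set = [(frames[0], first_frame_gt)]
    model.finetune(train_set, iters=first_iters)

    results = [first_frame_gt]
    for start in range(1, len(frames), window):
        batch = frames[start:start + window]
        # (2) Independently segment the next `window` frames.
        masks = [model.segment(f) for f in batch]
        results.extend(masks)
        # (3) Optionally filter the predictions to limit error propagation.
        if filter_fn is not None:
            masks = [filter_fn(f, m) for f, m in zip(batch, masks)]
        # (4) Add the (filtered) predictions to the training set and update.
        train_set.extend(zip(batch, masks))
        model.finetune(train_set, iters=update_iters)
    return results
```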
For the DAVIS 2017 dataset, we adapt our method to handle multiple objects. We use the same parent network as for the DAVIS 2016 dataset, but train it for an additional 10,000 iterations on the DAVIS 2017 TrainVal set, using the merged binary mask as the ground truth, so that the model has a better idea of DAVIS 2017 objects. To deal with multiple objects in the same video, we first split the multi-object mask into separate binary masks for each object. We then run our method on each mask independently and get a probability map for each object, which we merge into a single multi-object mask by taking the maximum output value of each model at each pixel. The mask is then further refined by snapping the boundaries to contours generated by the same contour network used by Caelles et al. We find that, because the DAVIS 2017 dataset often has multiple objects with similar appearances, the OSVOS model has a hard time distinguishing between them. To mitigate this problem, we use the OpenCV KCF bounding box tracker to filter the output segmentation before it is used to iteratively fine-tune the model. This reduces the possibility of error propagation and helps the model increase its discriminatory ability as it is iteratively fine-tuned.
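As a sketch of the split-and-merge step (not the authors' code; it assumes each per-object model returns an HxW foreground probability map in [0, 1], and it omits the contour snapping):

```python
import numpy as np

def split_multi_object_mask(mask: np.ndarray) -> dict:
    """Split an integer-labelled multi-object mask (0 = background) into
    one binary mask per object id."""
    return {int(i): (mask == i) for i in np.unique(mask) if i != 0}

def merge_probability_maps(prob_maps: dict, threshold: float = 0.5) -> np.ndarray:
    """Merge per-object foreground probability maps into one multi-object mask.

    prob_maps maps object id -> HxW probability map in [0, 1]. Each pixel is
    assigned to the object whose model gives the highest probability, or to
    background (0) if no model exceeds the threshold.
    """
    ids = sorted(prob_maps)
    stack = np.stack([prob_maps[i] for i in ids], axis=0)  # (N, H, W)
    winner = np.argmax(stack, axis=0)                      # per-pixel best model
    winner_prob = np.max(stack, axis=0)
    merged = np.zeros(winner.shape, dtype=np.int32)
    for k, obj_id in enumerate(ids):
        merged[(winner == k) & (winner_prob >= threshold)] = obj_id
    return merged
```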

3. Experiments

3.1. DAVIS 2016

Our first set of experiments was performed on the DAVIS 2016 dataset [5], which contains 50 video sequences, each with one object segmented at the pixel level in all frames. Our main metrics are intersection-over-union (IoU or J) and contour accuracy (F). We mainly compare our results to the state-of-the-art results obtained by the OSVOS model [1] in the DAVIS 2016 competition.

Figure 3. Relative difference in IoU between the normal OSVOS model and our best performing method (IT) on DAVIS 2016.

Table 1. DAVIS 2016 Validation Results (top two results are in bold).

  Method    J Mean   J Recall  J Decay   F Mean   F Recall  F Decay
  OSVOS     -        -         -         -        -         -
  IT        -        -         -         0.809    0.934     0.124
  IT LB     0.794    0.926     0.138     0.806    0.937     0.152
  IT Box    0.777    0.911     0.158     0.799    0.915     0.157

Table 1 shows the overall results on the DAVIS 2016 validation set. Our method performs slightly better in all metrics. Figure 3 shows the relative performance for each sequence, and reveals that most of the gains come from relatively few sequences, while the accuracy on the majority of the sequences is slightly reduced. The most improved sequence (drift-straight, shown in Fig. 4) only displays the front side of the car in the initial frame. As the sequence progresses, the broad side of the car is shown, and then the back side. Similarly, the second and third most improved sequences display objects at an angle in the first frame and reveal more and more features as the sequence progresses. This demonstrates the method's ability to pick up new features as the model is iteratively trained.

Figure 4. Comparison of different methods on drift-straight from the DAVIS 2016 dataset. In order: OSVOS, IT, IT LB, IT Box.

On the other end of the spectrum, the most harmed sequence (bmx-trees, shown in Fig. 5) shows the shortcomings of the method. The OSVOS model picks up many false positives in the bmx-trees sequence, and the iterative training method propagates these errors. The same effect can be seen in other sequences, though to a lesser extent. To mitigate this issue, we experimented with several ways of filtering the segmentation before it is used for iterative training. The simplest solution is to only train the model on the largest blob (shown as IT LB in Table 1), with the assumption that the largest blob is most likely to be the correct object. For some sequences, this method works well, but it fails in many cases because the largest blob may not be the correct object, or the correct segmentation may not be a single connected blob. We also experimented with using the OpenCV KCF bounding box tracker to filter the segmentation by setting everything outside of the box to zero (shown as IT Box in Table 1). However, this method also fails to improve the results, due to the poor performance of the tracker on the DAVIS 2016 validation set.

Figure 5. Most harmed sequence (OSVOS on top, IT on bottom) from the DAVIS 2016 dataset.
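For illustration, the IT LB and IT Box filtering variants described above could be implemented roughly as follows. This is a sketch rather than the exact code used in the experiments; the KCF tracker lives in the opencv-contrib package, and its constructor name varies across OpenCV versions.

```python
import cv2
import numpy as np

def keep_largest_blob(mask: np.ndarray) -> np.ndarray:
    """IT LB: keep only the largest connected foreground component."""
    binary = (mask > 0).astype(np.uint8)
    num, labels, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
    if num <= 1:  # no foreground pixels at all
        return binary
    # stats row 0 is the background component; take the largest of the rest.
    largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
    return (labels == largest).astype(np.uint8)

def filter_with_box(mask: np.ndarray, box) -> np.ndarray:
    """IT Box: zero out everything outside a tracked bounding box (x, y, w, h)."""
    x, y, w, h = [int(v) for v in box]
    out = np.zeros_like(mask)
    out[y:y + h, x:x + w] = mask[y:y + h, x:x + w]
    return out

# The box itself would come from a KCF tracker initialised on the first frame,
# e.g. (constructor name depends on the OpenCV / opencv-contrib version):
#   tracker = cv2.TrackerKCF_create()
#   tracker.init(first_frame, initial_box)
#   ok, box = tracker.update(next_frame)
#   filtered = filter_with_box(predicted_mask, box) if ok else predicted_mask
```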

3.2. DAVIS 2017

Our second set of experiments was performed on the DAVIS 2017 dataset, which contains 150 sequences (90 in the TrainVal set, 30 in the Test-Dev set, and 30 in the Test-Challenge set) [6]. Each sequence has multiple objects segmented at pixel accuracy in all frames. The metrics used to evaluate the results are the same as those used on the DAVIS 2016 dataset.

Table 2. DAVIS 2017 Test Challenge Results (top two results are in bold). The overall metric is the mean of J and F over all object instances.

  Method    Overall   J Mean   J Recall  J Decay   F Mean   F Recall  F Decay
  IT        -         0.448    0.479     0.286     0.494    0.524     0.291
  IT LB     0.500     0.481    0.533     0.222     0.519    0.576     0.240
  IT Box    0.509     0.490    0.551     0.213     0.528    0.583     0.237

Table 2 shows our results on the Test-Challenge set for three different methods; IT stands for iterative training, and Box stands for the use of the OpenCV KCF bounding box tracker to filter the segmentation before it is used for iterative training. The first thing we found is that the accuracy on the DAVIS 2017 dataset is much lower than on the DAVIS 2016 dataset. This could be for several reasons. Many of the DAVIS 2017 sequences contain objects that look very similar, which could present a challenge for the OSVOS model, given that it has no information about motion or temporal continuity. In addition, the 2017 dataset has smaller objects than the 2016 dataset, which present more opportunities for false positives. Because of the increase in false positives, simply applying iterative training causes excessive error propagation and reduces the accuracy compared to the standard OSVOS model. To mitigate this problem, we used the OpenCV KCF bounding box tracker to filter the output segmentation before it is used for iterative training. This resulted in an improvement over the standard OSVOS model. Figure 6 shows two examples where iterative training improves the results. Notably, in the varanus-tree sequence, the model learns not to segment the leaves that appear in the background towards the end of the sequence, which demonstrates the added discriminatory power that the bounding box provides.

Figure 6. Comparison between OSVOS (left) and IT Box (right) on DAVIS 2017 videos.

4. Conclusion

In this paper, we show that iterative training provides a way to learn more information about an object as it evolves through a sequence, and that our method shows an improvement over the state of the art on the DAVIS 2016 dataset, and over the standard OSVOS model on the 2017 dataset. However, the method is also prone to propagating errors made early in the process. Future work may involve finding ways to reduce the potential for error propagation, and learning an automatic model to decide when to update the object model throughout the video.

References

[1] S. Caelles, K.-K. Maninis, J. Pont-Tuset, L. Leal-Taixé, D. Cremers, and L. Van Gool. One-shot video object segmentation. In Computer Vision and Pattern Recognition (CVPR), 2017.
[2] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Region-based convolutional networks for accurate object detection and segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(1):142-158, 2016.
[3] S. D. Jain, B. Xiong, and K. Grauman. FusionSeg: Learning to combine motion and appearance for fully automatic segmentation of generic objects in videos. In IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[4] A. Khoreva, F. Perazzi, R. Benenson, B. Schiele, and A. Sorkine-Hornung. Learning video object segmentation from static images. Technical report, arXiv:1612.02646, 2016.

[5] F. Perazzi, J. Pont-Tuset, B. McWilliams, L. Van Gool, M. Gross, and A. Sorkine-Hornung. A benchmark dataset and evaluation methodology for video object segmentation. In Computer Vision and Pattern Recognition, 2016.
[6] J. Pont-Tuset, F. Perazzi, S. Caelles, P. Arbeláez, A. Sorkine-Hornung, and L. Van Gool. The 2017 DAVIS challenge on video object segmentation. arXiv:1704.00675, 2017.
[7] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014.
[8] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[9] P. Tokmakov, K. Alahari, and C. Schmid. Learning video object segmentation with visual memory. Technical report, arXiv:1704.05737, 2017.
