
Hindawi
International Journal of Aerospace Engineering
Volume 2020, Article ID 8816187, 10 pages
https://doi.org/10.1155/2020/8816187

Research Article
R-CNN-Based Satellite Components Detection in Optical Images

Yulang Chen,1 Jingmin Gao,1 and Kebei Zhang2

1 School of Automation, Beijing Information Science & Technology University, Beijing 100192, China
2 Beijing Institute of Control Engineering, Beijing 100190, China

Correspondence should be addressed to Jingmin Gao; gaojm biti@163.com

Received 12 March 2020; Revised 24 July 2020; Accepted 17 September 2020; Published 5 October 2020

Academic Editor: Jeremy Straub

Copyright © 2020 Yulang Chen et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The accurate detection of satellite components based on optical images can provide data support for aerospace missions such as pointing and tracking between satellites. However, the traditional target detection method is computationally inefficient and has a low detection precision, especially when the attitude of the satellite and the illumination conditions change considerably. To enable the precise detection of satellite components, we analyse the imaging characteristics of a satellite in space and propose a method to detect the satellite components. This approach is based on a region-based convolutional neural network (R-CNN), and it can enable the accurate detection of various satellite components by using optical images. First, on the basis of the Mask R-CNN, we combine the DenseNet, ResNet, and FPN to construct a new feature extraction structure and obtain the R-CNN-based satellite-component-detection model (RSD). The feature maps are extracted and concatenated at a deeper multiscale level, and the feature propagation between each layer is enhanced by providing a dense connection.
Next, an information-rich satellite dataset is constructed, which is composed of images of various kinds of satellites from various perspectives and orbital positions. The detection model is trained and optimized on the constructed dataset to obtain the satellite component detection model. Finally, the proposed RSD model and the original Mask R-CNN are tested on the same established test set. The experimental results show that the proposed detection model has higher precision, recall rate, and F1 score. Therefore, the proposed approach can effectively detect satellite components based on optical images.

1. Introduction

With the rapid development of space technology, accomplishing many space tasks, such as autonomous rendezvous and docking in space and space target capture, requires a satellite to accurately identify the main body or components of the target satellite to obtain the target position and attitude information [1–5]. Detecting the components of the target satellite belongs to the field of target detection, whose goal is to accurately detect the location and type of satellite components, such as solar wings, antennas, and docking devices. Accomplishing this goal is a key problem in the field of computer vision, and it can be solved by considering the similarity of the object features such as background, texture, and shape. However, the task remains challenging due to the differences between the target individuals [6–8].

The methods to detect satellite components that were developed before the rise of deep learning can be divided into image matching and traditional target detection methods. Mingdong et al. [9] used image matching algorithms to detect space objects. Zhi et al. [10] first preprocessed and segmented the images and later extracted the features by using SURF. Finally, the fractal clustering model of the satellite components was used to perform the component classification. Cai et al.
[4, 11, 12] adopted the traditional target detection method to detect the triangle bracket of a solar wing and proposed different improvements in the feature extraction stage. In the traditional target detection approach, each step is optimised independently, and the global optimisation of the whole method cannot be performed. Furthermore, the computational efficiency of this approach is low [6].

After the successful application of the deep convolutional neural network (DCNN) in image classification, the target detection task entered a period of rapid development [13–15]. When a DCNN is used for target detection

and target recognition, it exhibits a high robustness to environmental interference and a satisfactory generalisation ability, and it can realise target detection with a high accuracy [6–8, 14, 15]. At present, detection models are mainly divided into single-stage models, such as YOLO and SSD, and two-stage models, such as Mask R-CNN [16–21]. Zeng and Xia [22] proposed a space target recognition method based on the DCNN, and the authors of [23] proposed a target feature point extraction method based on deep learning to realise the detection and location determination of a docking node.

Figure 1: Network structure of RSD. (Blocks: feature extraction, feature map, fixed-size feature map; FC branches for class prediction and bounding box prediction; FCN branch for mask prediction; detected result.)

The abovementioned detection methods based on the CNN have two shortcomings: (1) The methods can only detect the class and location of the target satellite, but cannot accurately detect the position and edge contour of multiple components of the target satellite at the pixel level. (2) In terms of dataset construction, the abovementioned studies did not systematically sample the satellite; therefore, the information in the dataset was not sufficient. To address these two problems, this work makes the following contributions. (1) We improve the feature extraction structure of the Mask R-CNN and propose an R-CNN-based method (RSD) to detect satellite components. This detection model can detect multiple components of the satellite at the pixel level, and the object can be segmented into pixel instances to achieve a higher detection precision. (2) For the dataset construction, we establish a satellite image dataset, which contains images of 92 kinds of satellites from multiple perspectives and multiple orbital positions.
The samples in the dataset can fully represent the appearance characteristics of each satellite and reduce the difference between the dataset and real scene images, which can provide effective sample support for the training of the model.

The remainder of the paper is organised as follows. Section 2 provides a general overview of the RSD and introduces the construction method of the satellite dataset. Section 3 presents the experimental details and test results of the RSD and discusses them. Finally, the conclusions are presented in Section 4.

2. Detection of Satellite Components

During the movement of a satellite, the occlusion and brightness of the components change constantly, which hinders accurate detection. In addition, the number of samples in the established dataset is small. Considering these factors, to achieve a higher precision for the detection of satellite components, this paper proposes an R-CNN-based model to detect satellite components. Our RSD model is an improved version of the Mask R-CNN [21]. This paper combines the network architecture of DenseNet and ResNet with the idea of the FPN [25] and applies it to the backbone of our improved Mask R-CNN. The prediction heads consist of three branches, which are used for classification prediction, regression box prediction, and mask generation. Figure 1 shows the overall process of the RSD algorithm.

The steps in the RSD to detect the satellite components are as follows:

Step 1. Input the image to be detected.

Step 2. The initial feature extraction of the image is performed using the ResNet-FPN. The feature maps of each scale are then upsampled and concatenated, and the concatenated feature maps are input into the dense block for further feature extraction. Finally, the system outputs the corresponding feature maps.

Step 3. Input the feature map into the RPN network structure and generate several filtered accurate ROIs through the proposal layer.

Step 4. These ROIs are processed using ROI Align to match the pixels in the original image with those in the feature map and extract the corresponding target features in the shared feature map.

Step 5. These ROIs are input into the FC and FCN for target classification and instance segmentation, respectively. Finally, the classification results, regression box, and segmentation mask are generated, from which the classification results and position information of the satellite components can be obtained.

2.1. Feature Extraction Structure. In the DCNN, as the depth increases, the feature propagation between each layer degrades, resulting in the loss of information in the transmission process. In addition, after the feature map is upsampled and multiscale concatenation is performed, the semantic information may be obscured, and a large number of parameters may be introduced. To overcome these problems, in this study, the idea of DenseNet was applied to the ResNet-FPN, and the features extracted from the ResNet-FPN were further processed by using a densely connected convolutional structure.

Figure 2 shows the feature extraction structure of the RSD, which is mainly composed of two parts, namely, the ResNet-FPN and the densely connected convolution block (DB).

2.1.1. ResNet-FPN. Deepening the neural network can improve the generalisation performance of the model [27]; however, increasing the number of layers may lead to problems such as vanishing or exploding gradients, which make it difficult to train the deep neural network. The ResNet structure can effectively solve these problems [14, 26]. In the task of satellite component detection, multiscale detection is extremely critical, especially for small objects, such as small parts of the satellite. However, at large imaging distances, the antenna and other components account for only 1/2000 of the total area of the image, and thus they are often difficult to detect. Therefore, we adopt the FPN structure and ResNet-50 as the backbone.
This structure can fuse the features of all the levels; thus, it has both strong semantic information and strong spatial information, which can improve the precision and speed of detection of small objects at multiple scales.

As shown in Figure 2, the ResNet-FPN consists of three parts: the bottom-up connection, the top-down connection, and the lateral connection. The bottom-up connection pertains to the process of feature extraction with ResNet as the backbone, the top-down connection pertains to the process of upsampling from the top layer, and the lateral connection pertains to the fusion of the upsampled feature map and the feature map of the same size generated by the bottom-up process.

The image is input into the ResNet-FPN, assuming that the size of the input image is 512 × 512 and the number of channels is 64. After feature extraction by the ResNet-FPN, the feature maps M2, M3, M4, M5, and M6 are output. The sizes of these maps are 64 × 64, 32 × 32, 16 × 16, 8 × 8, and 4 × 4, respectively, and the number of channels is 256. Next, M2, M3, ..., and M6 are input into the subsequent densely connected convolution block to further extract the features.

Figure 2: Feature extraction structure of the RSD. (Labels: ResNet-FPN, top down, bottom up, M6.)

2.1.2. Densely Connected CNN. A deep neural network can autonomously learn the characteristics of data through a large number of sample data. However, when the amount of sample data is limited, the trained model usually has an inferior generalisation ability, and the traditional CNN also has problems such as vanishing gradients, a large number of parameters, and parameter redundancy [14, 24].
To solve these problems, we combine the idea of the dense connection with the feature extraction structure of the ResNet-FPN. Deep feature extraction is performed for the satellite components, which enhances the feature propagation between the layers and alleviates the semantic obscurity caused by the upsampling of the feature maps and the multiscale feature map fusion.

Figure 3 shows the densely connected convolution block (dense block) in the feature extraction structure used in this paper, which consists of five layers. The first layer includes only the convolution layer, and the other layers all contain a batch normalisation layer (BN), a rectified linear unit activation layer (ReLU), and a convolutional layer (Conv).

In the dense block, the input of each layer is the concatenation of the outputs of all the preceding layers, not only the output of the layer immediately before it. This structure can make full use of all the feature information produced by the preceding layers, considerably reduce the connection distance between the front and back layers, and effectively mitigate the vanishing-gradient problem as the network deepens [14].

The feature map of layer i is generated as follows:

$$X_i = H_i([X_0, X_1, X_2, \ldots, X_{i-1}]). \quad (1)$$

Here, $[X_0, X_1, X_2, \ldots, X_{i-1}]$ represents the concatenation, along the channel dimension, of the feature maps generated in layers $0, 1, \ldots, i-1$. $H_i$ is a composite function corresponding to the batch normalisation (BN), rectified linear unit (ReLU), and convolution (Conv). Assuming that the number of feature maps produced by each nonlinear transformation $H$ is $K$, and the number of feature maps at layer 0 is $K_0$, the number of input feature maps at layer $i$ is $K_0 + (i-1)K$; $K$ is also known as the growth rate.

The specific structure of the dense block is described in Table 1. The convolution layer DB Conv1 does not contain the BN and ReLU layers. This layer is set as such to reduce

the number of channels, so that the subsequent feature extraction does not incur a large computational cost and a large number of parameters. Assuming that the size of the input feature map M is 128 × 128 and the number of channels is 256, the first convolution layer is used to reduce the number of channels to 128, and the features are later extracted and fused through the following four layers. Finally, the feature map P is output with 256 channels, and this map is input to the subsequent RPN and prediction heads to realise the target detection.

Figure 3: Dense block of the improved Mask R-CNN. (Legend: Conv, BN, ReLU; input feature map M, output feature map P.)

Table 1: Dense block architectures.

| Layer | Input | Kernel | Stride | Output size |
| DB Conv1 | 128 × 128, 256 | 3 × 3 | 1 | 128 × 128, 128 |
| DB Conv2 | 128 × 128, 128 | 3 × 3 | 1 | 128 × 128, 32 |
| Concat 1 | 128 × 128, 128; 128 × 128, 32 | | | 128 × 128, 160 |
| DB Conv3 | 128 × 128, 160 | 3 × 3 | 1 | 128 × 128, 32 |
| Concat 2 | 128 × 128, 128; (128 × 128, 32) × 2 | | | 128 × 128, 192 |
| DB Conv4 | 128 × 128, 192 | 3 × 3 | 1 | 128 × 128, 32 |
| Concat 3 | 128 × 128, 128; (128 × 128, 32) × 3 | | | 128 × 128, 224 |
| DB Conv5 | 128 × 128, 224 | 3 × 3 | 1 | 128 × 128, 32 |
| Concat 4 | 128 × 128, 128; (128 × 128, 32) × 4 | | | 128 × 128, 256 |

2.2. Dataset Construction. Because the number of satellite images available from real scenes is limited and the distribution of viewing angles is uneven, the available images cannot satisfy the needs of model training and learning, and a model trained on them exhibits an inferior performance. Therefore, in this paper, images of satellites under various perspectives and orbital positions were collected using Systems Tool Kit (STK), an analysis software package for the aerospace domain developed by Analytical Graphics, Inc., and these images served as the basis of the dataset.

The overall process of constructing the dataset of the satellite components is shown in Figure 4. First, the images of the satellite are collected to establish a dataset containing rich information of the satellite. Second, the components of the satellite in the image are labelled.
As an example, the antenna and solar wing were taken (denoted as components I and II, respectively) as the target components to be detected. The constructed dataset contained 1288 samples. The dataset was randomly divided into a training set and a test set in proportion.

By using a reasonable multiangle and multiorbit position sampling strategy, a large number of satellite images can be collected, thereby providing more systematic materials for the establishment of the subsequent datasets. As shown in Figure 5, to make the perspective distribution of the images in the dataset more uniform and reasonable, this paper sampled the appearance of the satellite from 14 perspectives.

Using the above method, the appearance of the satellite can be sampled uniformly from 14 perspectives. Under this uniform sampling, the satellite images include component occlusion and overlap, which simulates the situations that may occur in a real scene.

To ensure that the satellite dataset contains more effective information and to better overcome the differences between the simulated images and real scene images, in the image collection, we take the position of the satellite as one of the factors to be considered and adjust the relative position of the satellite and the sun. As shown in Figure 6, the satellite is sampled at two orbital positions to induce changes in the light intensity and imaging effect to better simulate the real scene.

Figure 7 compares the simulated image (a) and the real image (b) of the ISS (International Space Station). In terms of the outline, shape, and texture of the components, taking the solar wing as an example, the solar wings in the two images are almost identical. In terms of colour, the colour of each component in the two images is very similar, except for the difference in brightness. Taking orbit positions into consideration can make the information of the dataset more reasonable and effective and considerably

reduce the differences between the simulation images and real images, which provides a concrete foundation for the model training. Figure 8 shows several samples in the final built dataset.

Figure 4: Overall process of dataset construction for the target components. (Blocks: 3D models of satellite, satellite dataset, components of satellite dataset, training set, test set.)

Figure 5: 14 sampling perspectives. (a) 6 sampling perspectives are 1, 2, ..., and 6. (b) 8 sampling perspectives are 7, 8, ..., and 14. The 3D coordinate system was taken as the datum, and origin o was the centre of the satellite. The 3D coordinate system was divided into eight octants. The octant determined by the positive half axes of the x, y, and z axes was the first octant. The 11th to 14th perspectives lay in the octants below the xoy plane, and the octants were marked counterclockwise.

Figure 6: Multiposition of orbit sampling (labels: Earth, Sun, position I, position II). When the satellite is in position I, the satellite's image is clear and the contrast is high. However, when the satellite is in position II, the satellite's image is not clear and the contrast is low.

2.3. Loss Function. The loss function L of our model is calculated using formula (2), which, similar to that for the Mask R-CNN, consists of two parts [20]: the loss function $L_{RPN}$ for training the RPN, and the loss function $L_{Mul\text{-}Branch}$ for training the multitask branches:

$$L = L_{RPN} + L_{Mul\text{-}Branch}. \quad (2)$$

$L_{RPN}$ is calculated using formula (3), which includes the loss function of the anchor category, $L_{cls}$, and the loss function of the regression box, $L_{reg}$. The classification loss $L_{cls}$ is the softmax cross-entropy loss, and the regression-box loss $L_{reg}$ is the smooth L1 loss.

$$L_{RPN} = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*). \quad (3)$$

Here, $p_i$ represents the classification probability of anchor $i$, $p_i^*$ represents the ground-truth label probability of anchor $i$, $t_i$ represents the difference between the predicted regression box and ground-truth label box, and $t_i^*$ represents the difference between the ground-truth label box and positive anchor. $L_{Mul\text{-}Branch}$ is calculated using formula (4):

$$L_{Mul\text{-}Branch} = L_{cls} + L_{box} + L_{mask}. \quad (4)$$

Here, the loss function of classification $L_{cls}$ is the cross-entropy loss; the loss function of the regression box $L_{box}$ is the smooth L1 loss. The loss function of the mask $L_{mask}$ is the average binary cross-entropy loss.

2.4. Model Training. All parts of the RSD are jointly trained. The main steps of the training are as follows: (1) first, the established dataset of the satellite components is divided into a training set and a test set, with proportions of 80% and 20% (1033 and 255 samples), respectively; (2) the hyperparameters of the model are set; (3) the weights in the RSD are initialised; (4) the training samples and labels are fed into the model, the loss function is calculated, and back propagation is performed; (5) the model is trained until the value of the loss function $L_{train}$ remains stable, and the learning rate is then adjusted to continue training for a period of time. Training is repeated until $L_{train}$ does not exhibit a significant decline.

The experimental parameters of the RSD model and Mask R-CNN are set as follows: the SGD optimiser is adopted, the momentum is set as 0.9, the weight decay is set as 0.0001, and the nonmaximum suppression (NMS) threshold is set as 0.7. This paper sets the size of the anchor to 4 × 4, 12 × 12, 16 × 16, 32 × 32, and 56 × 56, and the ratio of the anchor is set as 1 : 1, 2 : 1, and 1 : 2 to better fit the ground truth of the target. Setting the anchor to have multiple sizes can help better detect components of multiple sizes.
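The anchor configuration above can be illustrated with a minimal sketch that enumerates one anchor box per size and ratio pair. This is not the authors' implementation: the helper name `make_anchors` is hypothetical, the ratio is assumed to mean height : width, and each anchor is assumed to keep the area of its square base size, a convention commonly used in R-CNN-style detectors.

```python
import numpy as np


def make_anchors(sizes, ratios, center=(0.0, 0.0)):
    """Return (len(sizes) * len(ratios), 4) boxes as (x1, y1, x2, y2).

    Hypothetical helper: each box is centred on `center`, keeps the area
    size * size, and has height / width equal to the ratio (assumed convention).
    """
    cx, cy = center
    boxes = []
    for size in sizes:
        area = float(size * size)
        for ratio in ratios:
            width = np.sqrt(area / ratio)   # width shrinks as the ratio grows
            height = width * ratio          # so that width * height == area
            boxes.append([cx - width / 2, cy - height / 2,
                          cx + width / 2, cy + height / 2])
    return np.array(boxes)


# Anchor sizes and ratios quoted in the text (1:1, 2:1, 1:2).
SIZES = (4, 12, 16, 32, 56)
RATIOS = (1.0, 2.0, 0.5)

anchors = make_anchors(SIZES, RATIOS)
print(anchors.shape)  # one row per (size, ratio) pair
```

In an FPN-style detector, each size would typically be assigned to one pyramid level and the boxes replicated at every feature-map location; the sketch only shows the base shapes at a single location.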

Figure 7: Comparison between simulated image (a) and real image (b).

Figure 8: Several samples in the built dataset.

Figure 9: Change curve of the loss function during training. (Vertical axis: loss; horizontal axis: iteration/200.)

Table 2: Class and number of target components.

| Class | Number |
| Component I | 518 |
| Component II | 197 |

The initialisation of the model weights is performed in two phases: the first phase involves the initialisation of the ResNet-FPN and the prediction heads, and the second phase involves the initialisation of the dense block. The first phase uses the model pretrained on the MS COCO dataset for the weight initialisation, and the second phase performs a uniform distribution initialisation with lower and upper boundaries of -0.05 and 0.05, respectively. As shown in Figure 9, the initial learning rate is 0.001, and $L_{train}$ is approximately 0.15 after 40 K iterations. The learning rate is then adjusted to 0.001/10, and training continues for 17 K iterations. At this instant, $L_{train}$ is approximately 0.14 and remains nearly constant, and the training is stopped. The completed training requires approximately 13 h, and the optimised R-CNN-based satellite component detection model is obtained.

3. Experimental Results and Discussion

All the experiments (training and testing of the model) in this paper were conducted on the same server under the deep learning development framework of TensorFlow and Keras, with the PC configuration as follows: Intel Xeon E5-2620 v4 2.10 GHz 32 CPU and RTX 2020Ti GPU.

3.1. Evaluation Index. In this paper, the precision, recall, and F1 score were used to evaluate the model performance. Precision describes the ability of a classification model to return only relevant objects. The recall describes the ability of a

classification model to identify all the relevant targets. The F1 score is the harmonic mean of the precision and recall, and a higher value corresponds to a better detection performance of the model.

The precision is calculated using formula (5):

$$P = \frac{TP}{TP + FP}, \quad (5)$$

where TP represents the number of samples that are positive and tested positive in the test set, while FP represents the number of samples that are negative but tested positive.

The recall is calculated using formula (6):

$$R = \frac{TP}{TP + FN}, \quad (6)$$

where FN represents the number of samples that are positive but tested negative.

The F1 score is calculated using formula (7):

$$F1 = \frac{2 P R}{P + R}, \quad (7)$$

where P and R denote the precision and recall, respectively.

3.2. Test and Results. The RSD and the original Mask R-CNN model were used to detect 255 test samples (20% of the dataset). The number of components in the test set is shown in Table 2.

The confusion matrix of the detection results obtained using the proposed method is shown in Table 3.

Table 3: Confusion matrix.

| Prediction class \ Ground truth | Component I | Component II | Background |
| Component I | 513 | 3 | 25 |
| Component II | 0 | 169 | 19 |
| Background | 5 | 25 | / |

The precision and recall rate can be obtained using the confusion matrix. As shown in Table 4, the overall precision and recall rate of the improved Mask R-CNN are higher than those of the original Mask R-CNN network. The overall precision, recall, and F1 score of the proposed method are all 0.93. Compared with the Mask R-CNN, the proposed model exhibits a precision improved by 3% and an F1 score improved by 4%.

Table 4: Evaluation of the algorithm performance. Methods: RSD and Mask R-CNN. Columns: precision for component I, recall for component I, precision for component II, recall for component II. Values as extracted: 930.930.910.980.890.780.90.88

Figure 10: Deformation of component II (panels: attitude I and attitude II). In attitude I, component I is round and the bounding rectangle is a square; however, in attitude II, component I appears as an oval.

3.3. Discussion.
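To make the evaluation indices of Section 3.1 concrete before the results are discussed, the sketch below computes precision, recall, and F1 score from true-positive, false-positive, and false-negative counts, following formulas (5) to (7). The helper name `precision_recall_f1` is hypothetical and the counts are made-up illustrative values, not results from this paper.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from detection counts.

    tp: positives detected as positive; fp: negatives detected as positive;
    fn: positives that were missed. Zero denominators yield 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative counts only (assumed, not taken from the paper's tables).
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

For a multi-class confusion matrix, the same function can be applied per class by taking the diagonal entry as TP, the rest of the predicted-class entries as FP, and the rest of the ground-truth-class entries as FN.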
The detection results of the Mask R-CNN for the satellite components and background involved several errors. In contrast, the feature extraction structure of the proposed method could integrate the features of various levels, ensuring that they have strong semantic information and strong spatial information simultaneously, to provide an effective feature map for the subsequent detection and better distinguish the target components from the background. Close-ups of the test results are used below to discuss the improvement in the precision and recall.

Compared with the component I and the main body of the satellite, the area of the component II is small. In addition, because the satellite is in a state of constant motion in a variety of attitudes, the apparent shape of the components changes. Figure 10 shows the imaging of the same satellite in different attitudes. Under ideal conditions, this component I is circular, and in most cases, it is elliptical. The deformation of the target caused by the change in the attitude directly affects the recall. Figure 11(a) shows the detection result pertaining to attitude II, as shown in Figure 10, obtained using the proposed method and Mask R-CNN.

Figure 11 shows the comparisons on 2 samples, where it can be seen that the proposed RSD accurately detects all target components. The Mask R-CNN missed a target component and incorrectly detected the background as one of the targets. The RSD is better than the Mask R-CNN in terms of the precision and recall rate; it can not only identify more target components but also reduce the false identification and classification of the components and background. Under the condition that

the component does not undergo severe deformation, the RSD can detect the target component well and classify it correctly.

Figure 11: Comparisons between the proposed one and Mask R-CNN. (a, b) are the detection results of the two samples, where the left side of each sample is the detection result of RSD, and the right side is the detection result of Mask R-CNN. The bottom row is a close-up of the test results, with the yellow box representing the incorrectly detected part.

4. Conclusion

This paper proposes a satellite component detection method based on the region-based convolutional network and establishes a satellite dataset. The results of the performed contrast experiment proved that the proposed RSD model exhibited a better performance than that of the Mask R-CNN. The specific contributions were as follows: (1) The satellite dataset constructed in this paper contained abundant satellite information. A total of 92 kinds of satellites were sampled uniformly from 14 angles and 2 orbital positions to ensure that the dataset could fully simulate the imaging of a satellite in

a variety of attitude and illumination brightness conditions. (2) In this paper, the DenseNet and ResNet-FPN were combined to improve the feature extraction structure. The image features were first extracted through the ResNet-FPN and later extracted more deeply through the dense block to enhance the feature transmission between each layer. The experiments indicated that the proposed model exhibited a better performance than that of the Mask R-CNN. However, the performance of our RSD in detecting severely deformed components is still unsatisfactory, and further research is needed in future work.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request. In the future, the data used to support the findings of this study will be published online.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Acknowledgments

The authors thank STK developers for providing the data used in the paper. This research was carried out by the Beijing Information Science & Technology University, Beijing, China, under the following project, Stable Support Project of State Administration of Science, Technology and Industry for National Defence (HTKJ2019KL502008), Beijing Municipal Education Commissions capacity building for science and technology innovation services-basic research business fees (scientific research) (
