CoCon: Cooperative-Contrastive Learning

Nishant Rai1, Ehsan Adeli1, Kuan-Hui Lee2, Adrien Gaidon2, Juan Carlos Niebles1
1Stanford University, 2Toyota Research Institute

[Figure 1: Given a pair of instances (e.g., people doing squats) and corresponding multiple views, features are computed using view-specific deep encoders f. Different instances may have contrasting similarities in different views. For instance, V0 (left) and V1 (right) have similar optical-flow (o = f_flow) and pose-keypoint (p = f_keypoint) features, but their image (i = f_rgb) features are far apart. CoCon leverages these inconsistencies by encouraging the distances in all views to become similar. High similarity of o0, o1 and of p0, p1 nudges i0, i1 towards each other in the RGB space.]

Abstract

Labeling videos at scale is impractical. Consequently, self-supervised visual representation learning is key for efficient video analysis. Recent success in learning image representations suggests contrastive learning is a promising framework to tackle this challenge. However, when applied to real-world videos, contrastive learning may unknowingly lead to separation of instances that contain semantically similar events. In our work, we introduce a cooperative variant of contrastive learning to utilize complementary information across views and address this issue. We use data-driven sampling to leverage implicit relationships between multiple input video views, whether observed (e.g. RGB) or inferred (e.g. flow, segmentation masks, poses). We are among the first to explore exploiting inter-instance relationships to drive learning. We experimentally evaluate our representations on the downstream task of action recognition. Our method achieves competitive performance on standard benchmarks (UCF101, HMDB51, Kinetics400). Furthermore, qualitative experiments illustrate that our models can capture higher-order class relationships. The code is available at http://github.com/nishantrai18/CoCon.

1. Introduction

There has recently been a surge of interest in approaches utilizing self-supervised methods for visual representation learning. Recent advances in visual representation learning have demonstrated impressive performance compared to their supervised counterparts [3, 14]. Fresh developments in the video domain have attempted to make similar improvements [10, 16, 25, 35].

Videos are a rich source for self-supervision due to the inherent temporal consistency in neighboring frames. A natural approach to exploit this temporal structure is predicting future context, as done in [10, 16, 25, 27]. Such approaches perform future prediction in mainly two ways: (1) predicting a reconstruction of future frames [25, 27, 39], or (2) predicting features representing the future frames [10, 16]. If the goal is learning high-level semantic features for other downstream tasks, then complete reconstruction of frames is unnecessary. Inspired by developments in language modelling [29], recent work [41] proposes losses that focus only on the latent embedding using frame-level context. One of the more recent approaches [10] proposes utilizing spatio-temporal context to learn meaningful representations. Even though such developments have led to improved performance, the quality of the learned features still lags behind that of their supervised counterparts.

Due to the lack of labels in self-supervised settings, it is impossible to make direct associations between different training instances. Instead, prior work has learned associations based on structure, either in the form of temporal [10, 20, 23, 26, 44] or spatial proximity [10, 18, 20, 30] of patches extracted from training images or videos. However, the contrastive losses utilized enforce similarity constraints between instances from the same video while pushing instances from other videos far away, even if they represent the same semantic content. This inherent drawback forces learning of features with limited semantic knowledge and encourages low-level discrimination between different videos. Recent approaches suffer from this restriction, leading to poor representations.

The idea of utilizing multiple views of information is a well-established one with roots in human perception [4, 15]. It is argued that useful higher-order semantics are present throughout different views and are consistent across them. At the same time, different views provide complementary information which can be utilized to aid learning in other views. Multi-view learning has been a popular direction [35, 40] utilizing these traits to improve representation quality. Recent approaches learn features utilizing multiple views with the motivation that information shared across views has valuable semantic meaning. A majority of these approaches directly utilize core ideas such as contrastive learning [31] and mutual information maximization [2, 24, 46]. Although the fusion of views leads to improved representations, such approaches also rely on contrastive losses, consequently suffering from the same drawback of low-level discrimination between similar instances.

We propose Cooperative Contrastive Learning (CoCon), which overcomes this shortcoming and leads to improved visual representations. Our main motivation is that each view sees a specific pattern, which can be useful to guide other views and improve representations. Our approach utilizes inter-view information to avoid the drawback of discriminating similar instances discussed earlier. To this end, each view sees a different aspect of the videos, allowing it to suggest potentially similar instances to other views. This allows us to infer implicit relationships between instances in a self-supervised multi-view setting, something which we are the first to explore. These associations are then used to learn better representations for downstream applications such as video classification and action recognition. Fig. 1 shows an overview of CoCon.
It is worth noting that although CoCon utilizes building blocks currently used in self-supervised representation learning, it is applicable to other tasks utilizing contrastive learning and can be used in conjunction with other recently proposed methods.

We use 'freely' available views of the input such as RGB frames and optical flow. We also explore the benefit of using high-level inferred semantics as additional noisy views, such as human pose keypoints and segmentation masks generated using off-the-shelf models [45]. These views are not independent, as they can be derived from the original input images. However, they are complementary and lead to significant gains, demonstrating CoCon's effectiveness even with noisy related views. The extensible nature of our framework and the 'freely' available views used make it possible to use CoCon with any publicly available video dataset and other contrastive learning approaches.

2. Related Work

Self-supervised learning from images. Recent approaches have tackled image representation learning by exploiting color information [22, 47] and spatial relationships [30, 34], where relative positions between image patches are exploited as supervisory signals. Several approaches apply self-supervision to super-resolution [6, 19] or even to multi-task [5] and cross-domain [33] learning frameworks.

Self-supervised learning from videos. Multiple approaches [10, 16, 25, 27, 39] perform self-supervision through 'predicting' future frames. However, the term 'predicting' is overloaded, as they do not directly predict and reconstruct frames but instead operate on latent representations. This ignores stochasticity of frame appearance, e.g., illumination changes, camera motion, appearance changes due to reflections and so on, allowing the model to focus on higher-order semantic features. Recent work [10, 40] utilizes Noise Contrastive Estimation to perform prediction of the latent representations rather than the exact future frames, vastly improving performance. Yet another class of proxy tasks is based on temporal ordering of frames [28, 44]. Temporal coherence [17, 43] and 3D puzzles [20] have also been used as proxy losses to exploit spatio-temporal structure.

Multi-view learning. Multiple views of videos are rich sources of information for self-supervised learning [35, 40, 42]. Two-stream networks for action recognition [37] have led to many competitive approaches, which demonstrate that using even derivable views such as optical flow helps improve performance considerably. There have been approaches [26, 35, 40, 42] utilizing diverse views, sometimes derivable from one another, to learn better representations. However, these approaches utilize inter-view links by maximizing mutual information between them. Although this leads to improved performance, we believe the rich inter-view linkages can be utilized more effectively by using them to uncover implicit relationships between instances.

Multi-view self-supervised learning. Multiple recent approaches [1, 11, 12, 32] have tackled the challenge of multi-modal self-supervised learning, achieving impressive performance. However, these approaches suffer from the same drawback of discriminating between similar instances, leaving potential to benefit from inter-sample relationships.

Most approaches above perform self-supervision using positive and negative pairs mined through structural constraints, e.g., temporal and spatial proximity. Although this results in representations that capture some degree of semantic information, it incorrectly leads to treating similar actions differently due to the inherent nature of the pair mining. For instance, clip pairs from different videos are considered negatives, even if they represent the same action. We argue that utilizing different views and inter-instance relationships to propose positive pairs during training can lead to improvement of all views simultaneously.

3. Method

We describe cooperative contrastive learning (CoCon) and the intuition behind our design in this section. Additional details regarding architecture and implementation are present in the appendix. In the following sections, we build our framework on the learning framework presented in [10], which learns video representations through spatio-temporal contrastive losses. It should be noted that even though we use this particular self-supervised backbone in our experiments, our approach is not restricted by the choice of the underlying self-supervised task. CoCon can be used in conjunction with any other framework currently available, allowing it to be extended to a multi-view setting.

A video V is a sequence of T frames (not necessarily RGB images) with resolution H x W and C channels, {i_1, i_2, ..., i_T}, where i_t ∈ R^{H x W x C}. Assume T = N * K, where N is the number of blocks and K denotes the number of frames per block. We partition a video clip V into N disjoint blocks V = {x_1, x_2, ..., x_N}, where x_j ∈ R^{K x H x W x C}, and a non-linear encoder f(.) transforms each input block x_j into its latent representation z_j = f(x_j). An aggregation function g(.) takes a sequence {z_1, z_2, ..., z_j} as input and generates a context representation c_j = g(z_1, z_2, ..., z_j). In our setup, z_j ∈ R^{H' x W' x D} and c_j ∈ R^D, where D represents the embedding size and H', W' represent down-sampled resolutions, as different regions in z_j represent features for different spatial locations. We define ẑ_j = Pool(z_j), where ẑ_j ∈ R^D, and c = F(V), where F(.) = g(f(.)).

Similar to [10], we create a prediction task involving predicting z of future blocks. Details are provided in the appendix. For multiple views, we define c_v = F_v(V_v), where V_v, c_v and F_v represent the input, context feature and composite encoder for view v, respectively.
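To make the notation concrete, below is a minimal PyTorch-style sketch of the per-view pipeline (block partitioning, block encoder f, pooling, and aggregation into context features). The tiny convolutional encoder and GRU aggregator are illustrative stand-ins under assumed toy shapes, not the 3D-ResNet encoder and aggregator actually used in the paper; the future-prediction head is omitted.

```python
import torch
import torch.nn as nn

class BlockEncoder(nn.Module):
    """Stand-in for f(.): maps a block of K frames to z_j in R^{H' x W' x D}."""
    def __init__(self, in_channels=3, dim=256):
        super().__init__()
        self.conv = nn.Conv3d(in_channels, dim, kernel_size=3, stride=(1, 4, 4), padding=1)

    def forward(self, x):                      # x: (B, C, K, H, W)
        return self.conv(x).mean(dim=2)        # collapse time -> (B, D, H', W')

def partition_into_blocks(video, num_blocks):
    # video: (B, C, T, H, W) with T = N * K  ->  N disjoint blocks of K frames each
    return torch.chunk(video, num_blocks, dim=2)

B, C, T, H, W = 2, 3, 40, 128, 128             # N = 8 blocks of K = 5 frames each
f = BlockEncoder()
g = nn.GRU(input_size=256, hidden_size=256, batch_first=True)   # stand-in aggregator g(.)

video = torch.randn(B, C, T, H, W)
blocks = partition_into_blocks(video, num_blocks=8)
z = torch.stack([f(x) for x in blocks], dim=1)     # (B, N, D, H', W'), spatial layout preserved
z_hat = z.flatten(3).mean(dim=3)                   # Pool(z_j) -> block-level features (B, N, D)
context, _ = g(z_hat)                              # c_j = g(z_1, ..., z_j), shape (B, N, D)
```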
Contrastive Loss. Noise Contrastive Estimation (NCE) [9, 29, 31] constructs a binary classification task in which a classifier is fed real and noisy samples, with the training objective of distinguishing them. Similar to [10, 31], we use an NCE loss over our feature embeddings, described in Eq. 1, where z_{i,k} represents the feature embedding for the i-th time step and the k-th spatial location, and z̃_{i,k} the corresponding predicted embedding. Recall that z_j ∈ R^{H' x W' x D} preserves the spatial layout. We normalize z_{i,k} to lie on the unit hypersphere. Eq. 1 is a cross-entropy loss distinguishing one positive pair from all the negative pairs present in a video. We use temperature τ = 0.005 in our experiments. In a batch setting with multiple video clips, it is possible to have more inter-clip negative pairs.

L_{cpc} = -\sum_{i,k} \log \frac{\exp(\tilde{z}_{i,k} \cdot z_{i,k} / \tau)}{\sum_{j,m} \exp(\tilde{z}_{i,k} \cdot z_{j,m} / \tau)}    (1)

To extend this to multiple views, we utilize a different encoder φ_v for each view v. We train these encoders by utilizing L_cpc for each of them independently, giving us L_{cpc} = \sum_v L_{cpc}^v.
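For reference, here is a minimal sketch of the per-view NCE objective of Eq. 1, restricted to the negatives available within a single clip; the function name and tensor shapes are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn.functional as F

def cpc_nce_loss(z_pred, z, tau=0.005):
    """Eq. 1 sketch: contrast predicted embeddings z_pred (z-tilde) against the true block
    embeddings z over all (time step, spatial cell) pairs of one clip.
    z_pred, z: (T', S, D), with T' predicted steps and S = H' * W' spatial cells.
    The positive for (i, k) is the matching entry of z; every other (j, m) is a negative."""
    Tp, S, D = z.shape
    zp = F.normalize(z_pred.reshape(Tp * S, D), dim=1)   # features lie on the unit hypersphere
    zt = F.normalize(z.reshape(Tp * S, D), dim=1)
    logits = zp @ zt.t() / tau                           # (T'*S, T'*S) similarity matrix
    targets = torch.arange(Tp * S)                       # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

# Toy usage: 3 predicted blocks, a 4x4 spatial grid, 256-d embeddings.
loss = cpc_nce_loss(torch.randn(3, 16, 256), torch.randn(3, 16, 256))
```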

Table 1: Impact of losses on performance when jointly trained with RGB and Flow (UCF101 split 1). CoCon, i.e. L_total (67.8), comfortably improves performance over CPC, i.e. L_cpc (63.7). L_{x+y} = L_x + λ L_y, where λ = 10.0 for this table.

    Loss                 RGB     Flow
    L_cpc                63.7    69.8
    L_cpc+sim            66.0    71.4
    L_cpc+sync           62.7    69.2
    L_cocon (L_total)    67.8    72.5

[Table 2: Impact of the pre-training dataset. CoCon demonstrates a consistent improvement in both RGB and Flow.]

[Figure 2: Examples for each view. From top to bottom: RGB, Flow, SegMasks and Poses. Note the prevalence of noise in a few samples, especially SegMasks; there are multiple other instances where Poses and SegMasks are noisy but are not shown here.]

[Table 3: Impact of co-training on views. CoCon is jointly trained with four modalities (RGB, Flow, PoseHM, and SegMask) and compared against the Random and CPC baselines on UCF101 and HMDB51.]

Cooperative Multi-View Learning. Recent approaches [12, 35, 40] tackle multi-view self-supervised learning by maximizing mutual information across views. They involve positive and negative pairs generated using structural constraints, e.g., spatio-temporal proximity in videos [10, 11, 35, 40]. Although such representations capture semantic content, they unintentionally encourage discriminating video clips containing semantically similar content due to the inherent nature of the pair generation, i.e. video clips from different videos are negatives. We utilize inter-instance relationships to alleviate some of these issues.

We soften this constraint by indirectly deriving pair proposals using different views. Such a cooperative scheme benefits all models, as each individual view gradually improves. Better models are able to generate better proposals, improving the performance of all views and creating a positive feedback loop. Our belief is that significant semantic features should be universal across views; therefore, potentially incorrect proposals from one view should cancel out through proposals from other views.

We achieve this by computing view-specific distances and synchronizing them across all views. We enforce a consistency loss between distances from each view. Looking at it from another perspective, we are encouraging relationships between instances to be the same across views, i.e. similar pairs in one view should be similar pairs in other views as well. Treating this as inter-view graph regularization, we create a graph similarity matrix W^v of size K x K using a distance metric, which we denote by D(.). In our experiments, we use the cosine distance, which translates to W^v_{ab} = (z_a / ||z_a||) · (z_b / ||z_b||).

Assume h_a^v denotes the representation of the v-th view of instance a. In our experiments, we use h = ẑ, giving us block-level features. Our resultant loss is the inconsistency between similarity matrices across views. The resulting graph regularization loss becomes \sum_{v_0, v_1} \| W^{v_0} - W^{v_1} \|^2, which is simplified in Eq. 2.

Building on our earlier intuition, in order to have sensible proposals we need discriminative scores, i.e. we should have both positive (D ≈ 0) and negative (D ≈ 1) pairs. To promote well-distributed distances, we utilize the hinge loss described in Eq. 3. L_sim is the hinge loss, where the first term pushes representations of the same instance in different views closer, while the second term pushes different instances apart. Since the number of structural negative pairs is much larger than the number of positives, we introduce µ in order to balance the loss weights. We choose µ such that the first and second components contribute equally to the loss.

L_{sync} = \sum_{v_0, v_1} \sum_{a,b} \left( D(h_a^{v_0}, h_b^{v_0}) - D(h_a^{v_1}, h_b^{v_1}) \right)^2    (2)

L_{sim} = \sum_{v_0, v_1} \left[ \sum_a D(h_a^{v_0}, h_a^{v_1}) + \mu \sum_{a \neq b} \max\left(0, 1 - D(h_a^{v_0}, h_b^{v_1})\right) \right]    (3)

Note that L_sim entangles different views together. An alternative would be defining such a loss individually for each view. However, diversity is inherently encouraged through L_cpc, and interactions between views have the side effect of increasing their mutual information (MI), which leads to improved performance [35, 40].

We combine the above losses to get our cooperative loss, L_{coop} = L_{sync} + α · L_{sim}. We use α = 1.0 in our experiments and observe roughly similar performance for different values of α. The overall loss of our model is given by L_{cocon} = L_{cpc} + λ · L_{coop}. L_cpc encourages our model to learn good features for each view, while L_coop nudges it to learn higher-level features using all views while respecting the similarity structure across them.
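To make the cooperative objective concrete, the sketch below implements L_coop = L_sync + α · L_sim following the reconstruction of Eqs. 2 and 3 above, taking D(., .) to be the cosine distance over block-level features and using a fixed µ instead of the adaptive balancing described in the text; function and variable names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def cosine_distance_matrix(a, b):
    # a, b: (B, D) block-level features -> (B, B) pairwise cosine distances
    return 1.0 - F.normalize(a, dim=1) @ F.normalize(b, dim=1).t()

def cocon_coop_loss(views, mu=1.0, alpha=1.0):
    """views: list of per-view features h^v, each of shape (B, D)."""
    l_sync, l_sim = 0.0, 0.0
    for i in range(len(views)):
        for j in range(i + 1, len(views)):
            # Eq. 2: the pairwise distance structure should agree across views.
            d_i = cosine_distance_matrix(views[i], views[i])
            d_j = cosine_distance_matrix(views[j], views[j])
            l_sync = l_sync + ((d_i - d_j) ** 2).sum()

            # Eq. 3: same instance across views pulled close (D -> 0), different
            # instances pushed apart by a hinge until D >= 1.
            cross = cosine_distance_matrix(views[i], views[j])       # (B, B)
            pos = cross.diagonal().sum()
            hinge = torch.clamp(1.0 - cross, min=0.0)
            hinge = hinge - torch.diag(hinge.diagonal())             # drop a == b terms
            l_sim = l_sim + pos + mu * hinge.sum()

    return l_sync + alpha * l_sim

# Toy usage with two views (e.g. RGB and Flow block features for a batch of 8 clips).
loss = cocon_coop_loss([torch.randn(8, 256), torch.randn(8, 256)])
```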

4. Experiments

The goal of our framework is to learn video representations which can be leveraged for video analysis tasks. Therefore, we perform experiments validating the quality of our representations. We measure downstream action classification to objectively evaluate model effectiveness and analyze the impact of our design choices through controlled ablation studies. We also conduct qualitative experiments to gain deeper insights into our approach. In this section, we briefly go over our experimental framework. Additional details and discussions for each component are provided in the appendix.

Table 4: Nearest consistent semantic classes. Individually trained views (CPC) do not have consistent neighbors across views, leading to empty results (N/A) for 'PlayingCello' and 'HammerThrow', while views trained using CoCon show consistency across views, leading to sensible relationships, e.g. 'HammerThrow' is related to other classes involving throwing.

    Action Class    CoCon                                    CPC
    PlayCello       PlaySitar, PlayTabla, PlayDhol           N/A
    Skiing          Surfing, Skijet                          Surfing
    HammerThrow     BaseballPitch, ThrowDiscus, Shotput      N/A
    BrushTeeth      ApplyLipstick, EyeMakeup, ShaveBeard     ApplyLipstick

Table 5: Impact of varying the number of views used during training. A consistent improvement can be seen with more views despite the prevalent noise in PoseHM and SegMasks.

    # Views    RGB (UCF)    RGB (HMDB)    Flow (UCF)    Flow (HMDB)
    2          67.8         37.7          72.5          44.1
    4          71.0         39.0          74.5          45.4

Datasets. Our approach is a self-supervised learning framework for any dataset with multiple views; in our experiments, however, we discuss its relevance to video action classification. We focus on the human action datasets UCF101, HMDB51 and Kinetics400. UCF101 contains 13K videos spanning 101 human action classes. HMDB51 contains 7K video clips, mostly from movies, covering 51 classes. Kinetics-400 (K400) is a large video dataset with 306K video clips from 400 classes.

Views. We utilize different views in our experiments. For Kinetics-400, we learn encoders for RGB and optical flow. We use Farneback flow (FF) [7] instead of the commonly used TVL1 flow, as it is quicker to compute, lowering our computation budget. Although FF leads to lower performance compared to TVL1, the essence of our claims remains unaffected. For UCF101 and HMDB51, we learn encoders for RGB, TVL1 optical flow, pose heatmaps (PoseHM) and human segmentation masks (SegMask). A few visual samples for each view are provided in Fig. 2. PoseHMs and SegMasks are generated using an off-the-shelf detector [45] without any form of pre- or post-processing.

Implementation Details. We choose a 3D-ResNet similar to [10, 13] as the encoder f(.). We choose N = 8 and K = 5 in our experiments. We subsample the input by uniformly choosing one out of every 3 frames. Our predictive task involves predicting the last three blocks using the first five blocks. We use standard data augmentations during training, whose details are provided in the appendix. We train our models using the Adam [21] optimizer with an initial learning rate of 10^-3, decreased upon the loss plateauing. We use 4 GPUs with a batch size of 16 samples per GPU. Multiple spatio-temporal samples ensure sufficient negative examples despite the small batch size used for training.

Action Classification. We measure the effectiveness of our learned representations using the downstream task of action classification. We follow the standard evaluation protocol of using self-supervised model weights as initialization for supervised learning. The architecture is then fine-tuned end-to-end using class label supervision. We finally report the fine-tuned accuracies on UCF101 and HMDB51. While fine-tuning, we use the learned composite function F(.) to generate context representations for the video blocks. The context feature is further passed through a spatial pooling layer followed by a fully-connected layer and a multi-way softmax for action classification.
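A sketch of this evaluation head is given below, assuming the pretrained composite encoder F(.) returns a spatial context map of shape (B, D, H', W'); the class name, dimensions and stand-in encoder are illustrative, not the released implementation.

```python
import torch
import torch.nn as nn

class ActionClassifier(nn.Module):
    """Fine-tuning head: pretrained context features -> spatial pooling -> FC -> logits."""
    def __init__(self, pretrained_F, dim=256, num_classes=101):   # 101 for UCF101, 51 for HMDB51
        super().__init__()
        self.F = pretrained_F                    # initialised from self-supervised weights
        self.pool = nn.AdaptiveAvgPool2d(1)      # spatial pooling over H' x W'
        self.fc = nn.Linear(dim, num_classes)    # multi-way softmax via cross-entropy at train time

    def forward(self, video_blocks):
        c = self.F(video_blocks)                 # context representation, assumed (B, D, H', W')
        c = self.pool(c).flatten(1)              # (B, D)
        return self.fc(c)                        # class logits; fine-tuned end-to-end with labels

# Toy usage with a stand-in for F(.) that produces a (B, 256, 16, 16) context map.
dummy_F = nn.Sequential(nn.Conv3d(3, 256, kernel_size=1),
                        nn.AdaptiveAvgPool3d((1, 16, 16)),
                        nn.Flatten(2, 3))
head = ActionClassifier(dummy_F)
logits = head(torch.randn(2, 3, 40, 128, 128))   # -> (2, 101)
```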
4.1. Quantitative Results

We analyze various aspects of CoCon through ablation studies, experiments on multiple datasets, controlled variation of views and comparison to comparable methods. We objectively evaluate model performance using downstream classification accuracy as a proxy for learned representation quality. Pre-training is performed on either UCF101 or Kinetics400. We propose two baselines for comparison: (1) Random, random initialization of weights, and (2) CPC, self-supervised training utilizing only L_cpc, which is effectively individual training of views. CPC serves as a critical baseline to measure the benefits of multi-view training as opposed to individual training.

Ablation Study. We have motivated the utility of our various loss components; we now perform experiments to quantify the impact of each. The pre-training dataset used is the 1st split of UCF101, and downstream classification accuracy is computed on the same. Table 1 summarizes the results. As expected, all cross-view approaches comfortably perform better than CPC, demonstrating the utility of multi-view training.

Using L_cpc+sync leads to no performance improvement, as using only L_sync leads to the model collapsing by squashing all D scores to similar values, necessitating L_sim to counter-balance this tendency. L_cpc+sim leads to improved performance w.r.t. L_cpc, as it learns better features by effectively maximizing mutual information between views. CoCon, i.e. L_cocon, achieves the same while also regularizing manifolds across views, leading to even better performance across all views. The important comparison to observe is between L_cpc+sim and L_cocon, as L_cpc+sim is the most similar baseline to other multi-view approaches, e.g., CMC [40]. However, we argue this baseline is even stronger, as it involves both single-view and multi-view components, compared to [40], which only uses a contrastive multi-view loss to learn representations.
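For clarity, the ablation variants in Table 1 combine losses as L_{x+y} = L_x + λ L_y; a hypothetical helper assembling them (with λ = 10.0 and α = 1.0 as stated above) could look like the following, where the individual loss values are assumed to have been computed elsewhere.

```python
def total_loss(l_cpc, l_sync, l_sim, variant="cocon", lam=10.0, alpha=1.0):
    """Assemble the ablation objectives of Table 1 from the per-batch loss terms."""
    if variant == "cpc":
        return l_cpc                                   # single-view baseline
    if variant == "cpc+sync":
        return l_cpc + lam * l_sync                    # consistency term only
    if variant == "cpc+sim":
        return l_cpc + lam * l_sim                     # cross-view hinge term only
    return l_cpc + lam * (l_sync + alpha * l_sim)      # full CoCon objective (L_total)
```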

Effect of Datasets. A critical benefit of self-supervised approaches is the ability to run on large unlabelled datasets. To simulate such a setting, we perform pre-training using UCF101 or Kinetics400¹ without labels, utilizing the 1st splits of UCF101 and HMDB51 for evaluation. Table 2 confirms that pre-training with a larger dataset leads to better performance. It is also worth noting that CoCon pre-trained on UCF101 outperforms CPC trained on Kinetics400, even though CoCon on UCF101 uses only around 10% of the data compared to Kinetics, further demonstrating the potential of utilizing multiple views as opposed to training with larger and more diverse datasets.

When comparing the Random baseline and CoCon pre-trained on Kinetics400, we observe higher performance gains for RGB (+25.4%) compared to optical flow (+6.9%). We argue this is due to the higher variance and complexity of RGB compared to Flow, allowing a randomly initialized network to perform relatively better with Flow. When comparing our approach with CPC, we again observe higher gains in RGB (+4.1%) compared to Flow (+2.7%). This can be explained by the potential capability of RGB to capture flow-like features when learned jointly.

Effect of cooperative training. We compare the benefits of cooperative training with varying views. We look at co-training of RGB, Flow, SegMasks and PoseHMs. Recall that these additional views are generated using off-the-shelf models without any additional post-processing. Even though they are somewhat redundant, i.e. Flow, PoseHM and SegMask are actually derived from RGB images, using them simultaneously still leads to a large performance increase. We also note that although SegMasks and PoseHMs are sparse low-dimensional features, they still help improve performance across all views.

Table 3 summarizes the downstream action recognition performance of each view under different approaches. We see improved performance with an increase in the number of views used. Consistent gains for views such as Flow, SegMasks and PoseHM, which are not as expressive as RGB, point towards extraction of higher-order features even from low-dimensional inputs. We observe that PoseHM and SegMask have lower performance gains when evaluated on HMDB51. This can be attributed to the large degree of noise in PoseHMs and SegMasks for HMDB51: HMDB is a challenging and diverse dataset, leading to poor predictions from our off-the-shelf detector. In conclusion, the benefits of joint training are apparent, as CoCon leads to a performance improvement for all the views involved.

Effect of additional views. CoCon hinges on the assumption that multi-view information helps improve overall representation quality. To verify this hypothesis, we study co-training with different numbers of views. We consider two scenarios: 1) joint training of RGB and Flow streams, and 2) joint training of RGB, Flow, SegMasks and PoseHMs. Table 5 shows a consistent increase across views when increasing the number of views used during training. We should note that both SegMasks and PoseHMs contain significant noise, as the off-the-shelf models incorrectly detect or miss humans in numerous videos.

¹ The optical flow used for Kinetics400 is Farneback flow, as opposed to the TVL1 flow used for UCF101 and HMDB51. This difference in pre-training and fine-tuning modalities leads to smaller-than-expected performance gains.
However, we see a consistent mutual increase in performance for all the involved views despite the prevalence of noise.

Comparison with comparable approaches. We summarize comparisons of CoCon with comparable state-of-the-art approaches in Table 6. CoCon-Ensemble refers to an ensemble of models over all the involved views. We observe a few major trends. (1) When pre-training on UCF101, using multiple views allows us to outperform the nearest comparable approach by around 10.4%. This demonstrates the potential of cooperatively utilizing multiple views to learn representations. (2) We see considerable gains when training on Kinetics400 as well; however, the increase is smaller compared to UCF101. We argue the reasons are that (a) we only utilize two views for co-training, and (b) the flow we utilize for Kinetics400 is Farneback flow instead of the TVL1 flow used for UCF101 and HMDB51. (3) Our method comfortably and consistently outperforms recent multi-view approaches on UCF101 and HMDB51. (4) An interesting observation is that using multiple views of a small dataset (UCF101) performs better (71.0%) than pre-training on a large dataset, Kinetics400 (68.2%). This suggests that utilizing different views can be better than merely training on larger datasets.

Comparison with recent approaches. A few very recent approaches [1, 11, 12, 32] have tackled multi-modal self-supervised learning, achieving impressive performance. CoCon differs from them in that it considers inter-instance relationships to aid learning, in addition to relationships between views. Due to resource constraints, it was not possible to have a fair comparison, given the significant differences in the number of GPUs, the number of training epochs and the backbones used. However, we hope the carefully constructed experiments above provide deeper insights into CoCon's benefits even with lower resource requirements.

4.2. Qualitative Results

We motivate CoCon by arguing for the benefits of preserving similarities across view-specific spaces. We observe that respecting structure across views results in the emergence of higher-order semantics without additional supervision, e.g. sensible class relationships and good feature representations. Jointly training with views known to perform well for video action understanding allows us to learn good video representations.

[Table 6: Comparison of classification accuracies on UCF101 and HMDB51, averaged over all splits, listing method, input resolution, backbone, number of views and pre-training dataset. Baselines include Random Initialization, ImageNet [36], Shuffle and Learn [28], OPN [23], DPC [10], VGAN [42], LT-Motion [26], Cross and Learn [35], Geometry [8], CMC [40], 3D-RotNet [18] and ST-Puzzle [20], compared against CoCon - RGB and CoCon - Ensemble.]
