SGCN: Sparse Graph Convolution Network For Pedestrian Trajectory Prediction


Liushuai Shi¹, Le Wang²*, Chengjiang Long³, Sanping Zhou², Mo Zhou², Zhenxing Niu⁴, Gang Hua⁵
¹ School of Software Engineering, Xi'an Jiaotong University
² Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University
³ JD Finance America Corporation
⁴ Machine Intelligence Lab, Alibaba Group
⁵ Wormpex AI Research
* Corresponding author.

Abstract

Pedestrian trajectory prediction is a key technology in autopilot, and it remains very challenging due to the complex interactions between pedestrians. Previous works based on dense undirected interaction model superfluous interactions and neglect the motion tendency of trajectories, and thus inevitably deviate considerably from reality. To cope with these issues, we present a Sparse Graph Convolution Network (SGCN) for pedestrian trajectory prediction. Specifically, the SGCN explicitly models the sparse directed interaction with a sparse directed spatial graph to capture the adaptive set of interacting pedestrians. Meanwhile, we use a sparse directed temporal graph to model the motion tendency, which facilitates prediction based on the observed direction. Finally, the parameters of a bi-Gaussian distribution for trajectory prediction are estimated by fusing the above two sparse graphs. We evaluate our proposed method on the ETH and UCY datasets, and the experimental results show that our method outperforms comparative state-of-the-art methods by 9% in Average Displacement Error (ADE) and 13% in Final Displacement Error (FDE). Notably, visualizations indicate that our method can capture adaptive interactions between pedestrians and their effective motion tendencies.

1. Introduction

Given the observed trajectories of pedestrians, pedestrian trajectory prediction aims to predict a sequence of future location coordinates of pedestrians, which plays a critical role in various applications like autonomous driving [3, 29], video surveillance [28, 45] and visual recognition [9, 27, 16]. Despite recent advances in the literature, pedestrian trajectory prediction remains a very challenging task due to the complex interactions between pedestrians. For example, the motion of a pedestrian is easily disturbed by other pedestrians [11], close friends or colleagues are likely to walk in groups [32], and different pedestrians usually conduct similar social actions [38]. To model the interactions between pedestrians, extensive work [31, 2, 11, 23, 19, 32, 46] has been done in the past few years, in which the weighting-by-distance methods [31, 2, 11, 32] and the attention-based methods [23, 19, 46, 8, 17, 18] have achieved state-of-the-art results in pedestrian trajectory prediction.

[Figure 1 panels: (A.1) Dense Undirected Interaction; (A.2) Sparse Undirected Interaction; (A.3) Sparse Directed Interaction; (B.1) Prediction Discrepancy; (B.2) Alleviating Discrepancy with Motion Tendency; (B.3) Versatile Motion Tendencies; (A+B) Combination. Legend: Pedestrian, Observed Trajectory, Predicted Trajectory, Ground Truth, Motion Tendency.]
Figure 1. Sparse Directed Interaction & Motion Tendency. Different pedestrians are marked in different colors. (A.1) Dense undirected interaction, where any pedestrian interacts with all other pedestrians. (A.2) Sparse undirected interaction with superfluous interactions removed. (A.3) Sparse directed interaction with the adaptive set of interacting pedestrians. (B.1) The predicted trajectory severely deviates from the ground truth as the pedestrians try to avoid colliding with each other. (B.2) Trajectory points enclosed by the blue dotted circle indicate a motion tendency which may be leveraged for trajectory prediction. (B.3) Variation of motion tendencies with different sets of trajectory points.

Most of the weighting-by-distance and attention-based methods adopt a dense interaction model to represent the complex interactions between pedestrians, assuming that a pedestrian interacts with all other pedestrians. Besides, the weighting-by-distance methods use the relative distance to model an undirected interaction, in which the influence between two pedestrians is identical in both directions. However, we argue that both dense interaction and undirected interaction introduce superfluous interactions between pedestrians. As shown in Figure 1: (1) two pairs of pedestrians walk toward each other from opposite directions, yet only the red pedestrian detours to avoid colliding with the green pedestrian; and (2) the trajectories of the blue and yellow pedestrians do not influence each other. Methods based on dense or sparse undirected interaction clearly fail to handle the interactions in this case. For example, the dense undirected interaction, represented by A.1, generates superfluous interactions between the yellow and blue pedestrians, because their trajectories do not influence each other. Besides, the sparse undirected interaction, denoted by A.2, generates superfluous interactions between the green and red pedestrians, because the red pedestrian detours to avoid colliding with the green pedestrian while the green pedestrian walks straight ahead. To solve the above problems, it is better to design a Sparse Directed Interaction, as shown in A.3, which interacts only with the adaptive set of pedestrians when predicting a pedestrian's trajectory.

What is worse, previous works focus on collision avoidance, so the predicted trajectories tend to detour to avoid a collision for both the green and red pedestrians, as indicated in B.1, and the prediction for the green pedestrian deviates from the ground truth. To address this, we propose the motion tendency, which is represented by a short-term trajectory enclosed by the blue dotted circle as shown in B.2: the trajectory direction of the green pedestrian is straight ahead, and that of the red pedestrian deflects to avoid the collision with the green pedestrian. Based on the assumption that the direction of a trajectory will not change too abruptly, the motion tendency benefits the prediction for the green pedestrian. It should be noted that the motion tendency is versatile, as shown in B.3, in which the last one performs better than the others because it jointly captures the "straight forward" and "temporary deviation" tendencies.
Once the effective set of intermediate points is found, the motion tendency will facilitate pedestrian trajectory prediction.

In this paper, we present a novel Sparse Graph Convolution Network (SGCN) which combines the Sparse Directed Interaction and Motion Tendency for pedestrian trajectory prediction. As shown in Figure 1 (A+B), the Sparse Directed Interaction discovers the set of pedestrians that effectively influence the trajectory of a particular pedestrian, and the Motion Tendencies improve the future trajectory of interacting pedestrians. In particular, as shown in Figure 2, the Sparse Directed Spatial graph and the Sparse Directed Temporal graph are jointly learned to model the Sparse Directed Interaction and the Motion Tendency of the trajectory. Specifically, the Sparse Graph Learning, as illustrated in Figure 3, leverages the self-attention mechanism [40] to learn asymmetric, dense, directed interaction scores between trajectory points. Then, these interaction scores are fused and fed into asymmetric convolutional networks to obtain high-level interaction features. Finally, a sparse directed spatial adjacency matrix and a sparse directed temporal adjacency matrix are obtained by pruning the superfluous interactions with a constant threshold and normalizing the result with our "Zero-Softmax" function.
The final asymmetric normalized sparse directed adjacency matrices represent the sparse directed graphs. Once the above two graphs are obtained, we further learn the trajectory representation with a cascade of Graph Convolution Networks [22], and employ the Time Convolution Network [4] to estimate the parameters of the bi-Gaussian distribution, which are used to generate the predicted trajectories. Extensive experimental results on the ETH [34] and UCY [24] datasets show that our method outperforms all the compared state-of-the-art works.

To the best of our knowledge, this is the first work that explicitly models the Sparse Directed Interaction and Motion Tendency. In summary, our contributions are three-fold: (1) we propose to model the Sparse Directed Interaction and Motion Tendency to improve the predicted trajectories; (2) we design an adaptive method to model the Sparse Directed Interaction and Motion Tendency; and (3) we propose a sparse graph convolution network to learn the trajectory representations, where the advantage of explicit sparsity is demonstrated by the experiments.

2. Related Works

Pedestrian Trajectory Prediction. Thanks to its powerful representational ability, deep learning has become increasingly prevalent for predicting pedestrian trajectories. Social-LSTM [1] models the trajectory of each pedestrian with Recurrent Neural Networks (RNNs) [14, 20, 6], and computes the interaction between pedestrians within a certain radius from the pooled hidden states. SGAN [11] predicts multi-modal trajectories using the Generative Adversarial Network (GAN) [10, 48, 5], and proposes a new pooling mechanism to compute interactions based on the relative distance between pedestrians. TPHT [30] represents each pedestrian by an LSTM and employs a soft-attention mechanism [42] to model interactions between pedestrians.

Moreover, subsequent works leverage scene features to improve the prediction accuracy.
PITF [26] considers the human-scene interaction and human-object interaction. Sophie [37] extracts scene features and social features with a two-way attention mechanism, and computes the weights for all agents with a social attention. TGFP [25] predicts both coarse and fine locations by using scene information.

Since a graph structure can better fit the scene, another track of works models the human-human interaction using graphs. Social-BiGAT [23] models the trajectory of each pedestrian using an LSTM, and the interactions with the Graph Attention Network (GAT) [41]. To better represent the interaction between pedestrians, Social-STGCNN [32] directly models the trajectory as a graph, where the edges weighted by the relative distance between pedestrians represent their interactions. RSBG [38] notes that there are strong interactions between some distant pedestrian pairs, and hence invites sociologists to manually divide the pedestrians into different groups according to specific physical rules and sociological actions. STAR [46] models the spatial interaction and temporal dependencies with the Transformer [40] framework.

Figure 2. The framework of our proposed SGCN. The trajectories are reformed as spatial and temporal graph inputs. Sparse Graph Learning involves the learning of the sparse directed spatial graph representing the Sparse Directed Interaction and the sparse directed temporal graph representing the Motion Tendency from the graph inputs. Trajectory representations are learned by subsequent sparse spatial and temporal graph convolution networks, and then fed into a TCN to estimate the parameters of the bi-Gaussian distribution for future trajectory point prediction.

In brief, previous works model the interactions either for the neighborhood within a fixed physical range or for all pedestrians without exception. Presumably, this may cause discrepancies in the predictions due to superfluous interactions. In contrast, we propose a Sparse Directed Interaction, which is capable of finding the adaptive set of pedestrians involved in the interaction and thus alleviates this problem. Besides, our method also captures the effective Motion Tendency, which helps to improve the accuracy of the predicted trajectory.

Graph Convolution Networks.
Graph convolution networks (GCNs) are suitable for handling non-Euclidean data. The existing GCN models can be divided into two categories: 1) the spectral-domain GCNs [22, 7] design the convolution operation based on the Graph Fourier Transform, which requires the adjacency matrix to be symmetric due to the eigendecomposition of the Laplacian matrix; 2) the spatial-domain GCNs directly conduct convolution on the edges, which is applicable to asymmetric adjacency matrices. For example, GraphSage [12] aggregates the nodes in three different ways and fuses adjacent nodes in different orders to extract node features. GAT [41] models the interaction between nodes using an attention mechanism. To deal with spatio-temporal data, STGCN [43] extends the spatial GCN to a spatio-temporal GCN for skeleton-based action recognition, which aggregates the nodes from a local spatio-temporal scope. Our SGCN differs from all the above GCNs, since it aggregates the nodes based on a learned sparse adjacency matrix, which means the set of nodes to be aggregated is dynamically determined.

Self-Attention Mechanism. The core idea of the Transformer [40], i.e., self-attention, has successfully replaced RNNs [20, 6] on a series of sequence modeling tasks in natural language processing, such as text generation [44] and machine translation [35]. Self-attention decouples the attention into query, key and value, which can capture long-range dependencies, and it benefits from parallel computation compared with RNNs. To represent the relationship between every pair of elements of the input sequence, self-attention computes attention scores by a matrix multiplication between the query and the key.

In our method, we compute only a single layer of attention scores to model the Sparse Directed Interaction and Motion Tendency.
Compared to the most recent work [46], which predicts future trajectories by stacking Transformer blocks (expensive in computation and memory [15]), our method is parameter-efficient and achieves better performance.

3. Our Method

Pedestrian trajectory prediction aims to predict future location coordinates of pedestrians. Given a series of observed video frames over time $t \in \{1, 2, \ldots, T_{obs}\}$, we can obtain the spatial (2D-Cartesian) coordinates $\{(x_n^t, y_n^t)\}_{n=1}^{N}$ of all pedestrians with a tracking algorithm. Based on these trajectories, our objective is to predict the pedestrian coordinates within a future time $t \in \{T_{obs}+1, T_{obs}+2, \ldots, T_{pred}\}$.

As discussed above, the existing works suffer from superfluous interactions caused by dense undirected graphs. Meanwhile, they also neglect the exploitable Motion Tendency cue. To mitigate these limitations, we propose a Sparse Graph Convolution Network (SGCN) for trajectory prediction, which mainly involves Sparse Graph Learning and bi-Gaussian distribution parameter estimation based on the trajectory representations. The overall architecture of the proposed network is shown in Figure 2. First, the Sparse Directed Interaction (SDI) and Motion Tendency (MT) are learned from the spatial and temporal graph inputs, respectively, using a self-attention mechanism and asymmetric convolution networks. Then, subsequent sparse spatial and temporal Graph Convolution Networks extract the interaction and tendency features from the asymmetric adjacency matrices representing the sparse directed spatial graph (i.e., SDI) and the sparse directed temporal graph (i.e., MT). Finally, the learned trajectory representations are fed into a Time Convolution Network (TCN) to predict the parameters of a bi-Gaussian distribution, which generates the predicted trajectory.

Figure 3. Sparse Graph Learning. The self-attention generates the dense spatial interaction scores and dense temporal interaction scores based on the spatial and temporal graph inputs, respectively. Subsequent spatial-temporal fusion of the spatial interaction scores of each time step and the temporal interaction scores of each pedestrian is done by $1 \times 1$ convolution layers and the self-attention mechanism. The sparse adjacency matrices are computed by asymmetric convolution networks.

3.1. Sparse Graph Learning

Graph Inputs. Given input trajectories $X_{in} \in \mathbb{R}^{T_{obs} \times N \times D}$, where $D$ denotes the dimension of the spatial coordinate, we construct a spatial graph and a temporal graph as illustrated in Figure 3.
The spatial graph $G_{spa} = (V^t, U^t)$ at time step $t$ represents the locations of pedestrians, while the temporal graph $G_{tmp} = (V_n, U_n)$ for pedestrian $n$ represents the corresponding trajectory. $V^t = \{v_n^t \mid n = 1, \ldots, N\}$ and $V_n = \{v_n^t \mid t = 1, \ldots, T_{obs}\}$ represent the nodes of $G_{spa}$ and $G_{tmp}$, respectively, and the attribute of $v_n^t$ is the coordinate $(x_n^t, y_n^t)$ of the $n$-th pedestrian at time step $t$. $U^t = \{u_{i,j}^t \mid i, j = 1, \ldots, N\}$ and $U_n = \{u_n^{k,q} \mid k, q = 1, \ldots, T_{obs}\}$ represent the edges of $G_{spa}$ and $G_{tmp}$, respectively, where $u_{i,j}^t, u_n^{k,q} \in \{0, 1\}$ indicate whether the nodes $v_i^t, v_j^t$ or the nodes $v_n^k, v_n^q$ are connected (denoted as 1) or disconnected (denoted as 0), respectively. Since there is no prior knowledge on the connections of nodes, the elements in $U_n$ are initialized as 1, while $U^t$ is initialized as an upper triangular matrix filled with 1 because of the temporal dependency, namely that the current state is independent of future states.

Sparse Directed Spatial Graph. To increase the sparsity of the spatial graph inputs, i.e., to identify the exact set of pedestrians involved in interactions in the spatial graph, we first adopt the self-attention mechanism [40] to compute the asymmetric attention score matrix, namely the dense spatial interaction $R_{spa} \in \mathbb{R}^{N \times N}$ between pedestrians, as follows:

$$
\begin{aligned}
E_{spa} &= \phi(G_{spa}, W_E^{spa}), \\
Q_{spa} &= \phi(E_{spa}, W_Q^{spa}), \\
K_{spa} &= \phi(E_{spa}, W_K^{spa}), \\
R_{spa} &= \mathrm{Softmax}\!\left(\frac{Q_{spa} K_{spa}^{T}}{\sqrt{d_{spa}}}\right),
\end{aligned}
\tag{1}
$$

where $\phi(\cdot, \cdot)$ denotes a linear transformation, $E_{spa}$ are the graph embeddings, and $Q_{spa}$ and $K_{spa}$ are the query and key of the self-attention mechanism, respectively. $W_E^{spa} \in \mathbb{R}^{D \times D_E^{spa}}$, $W_Q^{spa} \in \mathbb{R}^{D \times D_Q^{spa}}$ and $W_K^{spa} \in \mathbb{R}^{D \times D_K^{spa}}$ are the weights of the linear transformations, and $\sqrt{d_{spa}} = \sqrt{D_Q^{spa}}$ is a scaling factor [40] that ensures numerical stability.

Since $R_{spa}$ is computed at every time step independently, it does not contain any temporal dependency information of the trajectories.
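To make Eq. (1) concrete, the following is a minimal pure-Python sketch for a single time step. It is an illustration under stated assumptions, not the authors' implementation: the helper names (`linear`, `softmax`, `dense_interaction`, `rand_matrix`), the random weights, the toy coordinates, the row-wise Softmax, and the choice to give the query and key weights the embedding dimension as their input size are all assumptions made for the example.

```python
import math
import random


def linear(x, w):
    """Bias-free linear map phi(x, w): x is (N, d_in), w is (d_in, d_out)."""
    d_in, d_out = len(w), len(w[0])
    return [[sum(row[k] * w[k][j] for k in range(d_in)) for j in range(d_out)]
            for row in x]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def dense_interaction(coords, w_e, w_q, w_k):
    """Dense interaction scores R = Softmax(Q K^T / sqrt(d)), as in Eq. (1)."""
    e = linear(coords, w_e)  # graph embedding E_spa
    q = linear(e, w_q)       # query Q_spa
    k = linear(e, w_k)       # key K_spa
    d = len(q[0])            # query dimension used for scaling
    scores = [[sum(qi[c] * kj[c] for c in range(d)) / math.sqrt(d) for kj in k]
              for qi in q]
    # Assumption: Softmax is taken over each row (each query pedestrian).
    return [softmax(r) for r in scores]


def rand_matrix(rows, cols):
    """Hypothetical stand-in for learned weights."""
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
coords = [[0.5, 1.0], [1.0, 0.5], [2.0, -1.0]]  # N = 3 pedestrians, D = 2
W_E, W_Q, W_K = rand_matrix(2, 4), rand_matrix(4, 4), rand_matrix(4, 4)
R_spa = dense_interaction(coords, W_E, W_Q, W_K)
for row in R_spa:
    print([round(v, 3) for v in row])
```

Because the query and key use different weights, entry (i, j) of the result generally differs from entry (j, i), which is what allows the scores to express a directed interaction. Every entry is strictly positive, so the matrix is still dense at this stage.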
Hence, we stack the dense interactions $R_{spa}$ from every time step as $R_{spa}^{s\text{-}t} \in \mathbb{R}^{T_{obs} \times N \times N}$, and then fuse these stacked interactions with a $1 \times 1$ convolution along the temporal channel, resulting in the spatial-temporal dense interactions $\hat{R}_{spa}^{s\text{-}t} \in \mathbb{R}^{T_{obs} \times N \times N}$.

A slice of $\hat{R}_{spa}^{s\text{-}t}$ at each time step is an asymmetric square matrix, whose $(i, j)$-th element represents the influence of node $i$ on node $j$. The initiative and passive relations, represented in the rows and columns of the matrix respectively, can then be combined to obtain high-level interaction features. Specifically, a cascade of asymmetric convolution kernels [39] is applied on the rows and columns of $\hat{R}_{spa}^{s\text{-}t}$, respectively, i.e.,

$$
\begin{aligned}
F_{row}^{(l)} &= \mathrm{Conv}\!\left(F^{(l-1)}, K_{(1 \times S)}^{row}\right), \\
F_{col}^{(l)} &= \mathrm{Conv}\!\left(F^{(l-1)}, K_{(S \times 1)}^{col}\right), \\
F^{(l)} &= \delta\!\left(F_{row}^{(l)} + F_{col}^{(l)}\right),
\end{aligned}
\tag{2}
$$

where $F_{row}^{(l)}$ and $F_{col}^{(l)}$ are the row-based and column-based asymmetric convolution feature maps at the $l$-th layer, respectively, $F^{(l)}$ is the activated feature map, and $\delta(\cdot)$ denotes a non-linear activation function. $K_{(1 \times S)}^{row}$ and $K_{(S \times 1)}^{col}$ are the convolution kernels of sizes $(1 \times S)$ and $(S \times 1)$ (i.e., row and column vectors), respectively. Note that $F^{(0)}$ is initialized as $\hat{R}_{spa}^{s\text{-}t}$, and all the convolution operations are padded with zeros to keep the output size the same as the input size. Thus, the activated feature map obtained from the last convolution layer is the high-level interaction feature $F_{spa}$ of size $(T_{obs} \times N \times N)$.

We proceed to generate the sparse interaction mask $M_{spa}$ by element-wise thresholding of $\sigma(F_{spa})$ with a hyper-parameter $\xi \in [0, 1]$. When $\sigma(F_{spa})[i, j] \geq \xi$, the $(i, j)$-th element of $M_{spa}$ is set to 1, otherwise 0, i.e.,

$$
M_{spa} = \mathbb{I}\{\sigma(F_{spa}) \geq \xi\}, \tag{3}
$$

where $\mathbb{I}\{\cdot\}$ is the indicator function, which outputs 1 if the corresponding inequality holds and 0 otherwise, and $\sigma$ is the Sigmoid activation function. To ensure that the nodes are self-connected, we add an identity matrix $I$ to the interaction mask, and then fuse it with the spatial-temporal dense interaction $\hat{R}_{spa}^{s\text{-}t}$ by element-wise multiplication, resulting in a sparse adjacency matrix $A_{spa}$, i.e.,

$$
A_{spa} = (M_{spa} + I) \odot \hat{R}_{spa}^{s\text{-}t}, \tag{4}
$$

where $\odot$ denotes element-wise multiplication.

Some previous works (e.g., [22]) suggest that normalizing the adjacency matrix is essential for a GCN to function properly. Nevertheless, the related works in the vertex domain directly adopt the Softmax function for adjacency matrix normalization, which has the side effect of turning the sparse matrix back into a dense one, because Softmax outputs non-zero values for zero inputs. In this case, the pedestrians that do not interact with each other are forced to interact with each other again.
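The pruning step of Eqs. (3) and (4), and the densifying side effect of an ordinary Softmax described above, can be checked on a toy example. This is a minimal sketch: the function names and the 3 x 3 values of `F_spa` and `R_hat` are made up for illustration, and a single time-step slice is used in place of the full $(T_{obs} \times N \times N)$ tensor.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def sparse_adjacency(f, r_hat, xi=0.5):
    """A = (1{sigmoid(F) >= xi} + I) * R_hat, element-wise, per Eqs. (3)-(4)."""
    n = len(f)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            mask = 1.0 if sigmoid(f[i][j]) >= xi else 0.0  # Eq. (3)
            identity = 1.0 if i == j else 0.0              # self-connection
            # Taken literally, Eq. (4) gives weight 2 where mask and identity overlap.
            a[i][j] = (mask + identity) * r_hat[i][j]
    return a


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical high-level features and fused dense interactions for N = 3.
F_spa = [[2.0, -1.0, -3.0],
         [0.5, 1.0, -2.0],
         [-1.0, -0.5, 0.1]]
R_hat = [[0.6, 0.3, 0.1],
         [0.2, 0.5, 0.3],
         [0.3, 0.3, 0.4]]

A = sparse_adjacency(F_spa, R_hat, xi=0.5)
densified = [softmax(row) for row in A]

print(A)          # pruned entries are exactly 0.0
print(densified)  # every entry is positive again after Softmax
```

With $\xi = 0.5$ the mask keeps exactly the entries whose feature value is non-negative, so several entries of `A` are zero. After the row-wise Softmax all nine entries are strictly positive, i.e., the pruned interactions reappear.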
To avoid this problem, we design a "Zero-Softmax" function to keep the sparsity, and the ablation study shows that "Zero-Softmax" further improves the performance. Specifically, given a flattened matrix $x = [x_1, x_2, \ldots, x_D]$,

$$
\text{Zero-Softmax}(x_i) = \frac{(\exp(x_i) - 1)^2}{\sum_{j}^{D} (\exp(x_j) - 1)^2 + \epsilon}, \tag{5}
$$

where $\epsilon$ is a negligibly small constant that ensures numerical stability, and $D$ is the dimension of the input vector. With this, we obtain the normalized sparse adjacency matrix $\hat{A}_{spa} = \text{Zero-Softmax}(A_{spa})$. Thus, a spatial-temporal sparse directed graph $\hat{G}_{spa} = (V^t, \hat{A}_{spa})$ representing the Sparse Directed Interactions is eventually obtained from the spatial graph inputs. The whole process is illustrated in Figure 3.

Sparse Directed Temporal Graph. In a way similar to the sparse directed spatial graph, we can also obtain the effective Motion Tendency, namely the normalized adjacency matrix $\hat{A}_{tmp}$ from the temporal graph inputs, except for two differences.

First, a position encoding tensor $E$ [40] is added to $E_{tmp}$, i.e., $E_{tmp} = \phi(G_{tmp}, W_E^{tmp}) + E$, because trajectory points in a different order indicate different Motion Tendencies. Notably, the dense temporal interaction $R_{tmp}$ is also an upper triangular matrix like $U^t$ due to the temporal dependency.

The second difference lies in the temporal-spatial fusion step illustrated in Figure 3, where we cannot perform convolution on $R_{tmp}^{t\text{-}s} \in \mathbb{R}^{N \times T_{obs} \times T_{obs}}$, obtained by stacking $R_{tmp} \in \mathbb{R}^{T_{obs} \times T_{obs}}$, because the number of pedestrians $N$ varies across scenes. To simplify the operation, we directly treat $R_{tmp}^{t\text{-}s}$ as the temporal-spatial dense interaction.

Thus, we eventually obtain a temporal-spatial sparse directed graph $\hat{G}_{tmp} = (V_n, \hat{A}_{tmp})$ representing the Motion Tendency from the temporal graph inputs.

3.2. Trajectory Representation and Prediction

GCNs can aggregate the nodes of the sparse graphs represented by $\hat{A}_{spa}$ (SDI) and $\hat{A}_{tmp}$ (MT), and learn the trajectory representation.
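Before going further, it may help to see what the normalized sparse adjacency consumed by these GCNs looks like. The sketch below implements Eq. (5) in plain Python; applying it row by row, the value of `eps`, and the toy matrix `A_spa` are assumptions made for illustration, since the text only states that the input is a flattened matrix and that $\epsilon$ is small.

```python
import math


def zero_softmax(x, eps=1e-5):
    """Zero-Softmax of Eq. (5): (exp(x_i) - 1)^2 / (sum_j (exp(x_j) - 1)^2 + eps)."""
    num = [(math.exp(v) - 1.0) ** 2 for v in x]
    den = sum(num) + eps  # eps keeps the division defined for an all-zero input
    return [n / den for n in num]


# Hypothetical sparse adjacency matrix with pruned (zero) entries.
A_spa = [[1.2, 0.0, 0.0],
         [0.2, 1.0, 0.0],
         [0.0, 0.0, 0.8]]

A_hat = [zero_softmax(row) for row in A_spa]
for row in A_hat:
    print([round(v, 4) for v in row])
```

Since $\exp(0) - 1 = 0$, every zero entry maps to exactly zero, so the pruned interactions stay pruned, while the remaining entries of each row sum to slightly less than 1 because of $\epsilon$.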
As illustrated in Figure 2, we use two GCNs to learn the trajectory representation: in one branch $\hat{A}_{spa}$ is fed to the network ahead of $\hat{A}_{tmp}$, while in the other branch they are fed in the reverse order. Thus, the first branch produces the interaction-tendency feature $H_{ITF}$, while the other branch produces the tendency-interaction feature $H_{TIF}$, i.e.,

$$
\begin{aligned}
H_{ITF}^{(l)} &= \delta\!\left(\hat{A}_{tmp} \cdot \delta\!\left(\hat{A}_{spa} H_{ITF}^{(l-1)} W_{spa1}^{(l)}\right) W_{tmp1}^{(l)}\right), \\
H_{TIF}^{(l)} &= \delta\!\left(\hat{A}_{spa} \cdot \delta\!\left(\hat{A}_{tmp} H_{TIF}^{(l-1)} W_{tmp2}^{(l)}\right) W_{spa2}^{(l)}\right),
\end{aligned}
\tag{6}
$$

where $W_{tmp1}$, $W_{spa1}$, $W_{tmp2}$ and $W_{spa2}$ are GCN weights, and $l$ denotes the $l$-th layer of the GCN. $H_{ITF}^{(0)}$ is initialized as $\hat{G}_{spa}$, and $H_{TIF}^{(0)}$ is initialized as $\hat{G}_{tmp}$. The trajectory representation $H$ is the sum of the last GCN outputs $H_{ITF}$ and $H_{TIF}$.

Trajectory Prediction and Loss Function. We follow Social-LSTM [1] in assuming that the trajectory coordinates $(x_n^t, y_n^t)$ at time step $t$ of pedestrian $n$ follow a bi-variate Gaussian distribution $\mathcal{N}(\hat{\mu}_n^t, \hat{\sigma}_n^t, \hat{\rho}_n^t)$, where $\hat{\mu}_n^t$ is the mean, $\hat{\sigma}_n^t$ is the standard deviation, and $\hat{\rho}_n^t$ is the correlation coefficient. Given the final trajectory representation $H$, we can predict the parameters of the bi-Gaussian distribution with a TCN [4] on the time dimension, following Social-STGCNN [32]. Note that the TCN is chosen because it does not suffer from vanishing gradients and high computational cost like traditional RNNs [14, 20, 6]. Hence, the method can be trained by minimizing the negative log-likelihood loss

$$
L^n(W) = -\sum_{t=T_{obs}+1}^{T_{pred}} \log P\!\left((x_n^t, y_n^t) \mid \hat{\mu}_n^t, \hat{\sigma}_n^t, \hat{\rho}_n^t\right), \tag{7}
$$

where $W$ denotes all trainable parameters in the method.

4. Experiments and Analysis

Evaluation Datasets. To validate the efficacy of our proposed method, we use two public pedestrian trajectory datasets, i.e., ETH [34] and UCY [24], which are the most widely used benchmarks for the trajectory prediction task. In particular, the ETH dataset contains the ETH and HOTEL scenes, while the UCY dataset contains three different scenes: UNIV, ZARA1, and ZARA2. We use the "leave-one-out" [38] method for training and evaluation. We follow existing works in observing trajectories of 8 frames (3.2 seconds) and predicting the next 12 frames (4.8 seconds).

Table 1. Comparison with the baseline approaches on the public benchmark datasets ETH and UCY for ADE/FDE. All approaches take 8 frames as input and output 12 frames. Our SGCN significantly outperforms the compared state-of-the-art works. The lower the better.
Columns: Model, Year, ETH, HOTEL, UNIV, ZARA1, ZARA2, AVG.
Rows: Vanilla LSTM [1], Social LSTM [1], SGAN [11], Sophie [37], PITF [26], GAT [23], Social-BiGAT [23], Social-STGCNN [32], RSBG w/o context [38], STAR [46], SGCN.
[Cell values are not recoverable from the transcription, apart from a trailing fragment: /1.00, 0.44/0.75, 0.48/0.99, 0.41/0.87, 0.37/0.65.]

Evaluation Metrics. We employ two metrics, namely Average Displacement Error (ADE) [36] and Final Displacement Error (FDE) [1], to evaluate the prediction result.
ADE measures the average L2 distance between all the predicted trajectory points and all the ground-truth future trajectory points, while FDE measures the L2 distance between the final predicted destination and the final destination of the ground-truth future trajectory.

Experimental Settings. In our experiments, the embedding dimension of self-attention and the dimension of the graph embedding are both set to 64. The number of self-attention layers is 1. The asymmetric convolution network comprises 7 convolution layers with kernel size $S = 3$. The spatial-temporal GCN and the temporal-spatial GCN each cascade 1 layer, and the TCN cascades 4 layers. The threshold value $\xi$ is empirically set to 0.5. PReLU [13] is adopted as the nonlinear activation $\delta(\cdot)$. The proposed method is trained using the Adam [21] optimizer for 150 epochs with data batches of size 128. The initial learning rate is set to 0.001 and is decayed by a factor of 0.1 every 50 epochs. During the inference phase, 20 samples are drawn from the learned bi-variate Gaussian distribution, and the sample closest to the ground truth is used to compute the ADE and FDE metrics. Our method is implemented in PyTorch [33]. The code has been published†.

4.1. Comparison with State-of-the-Arts

We compare our method with nine state-of-the-art methods from the past four years, including Vanilla LSTM [1], Social-LSTM [1], SGAN [11], Sophie [37], PITF [26], Social-BiGAT [23], Social-STGCNN [32], RSBG [38], and STAR [46]. The results, evaluated with the ADE and FDE metrics, are shown in Table 1. They indicate that our method significantly outperforms all the competing methods on both the ETH and UCY datasets. In particular, for the ADE metric, our method surpasses the previous best method STAR [46] by 9% on average over the ETH and UCY datasets.
For the FDE metric, our method is better than the previous best method Social-STGCNN [32] by a margin of 13% on average over the ETH and UCY datasets. To the best of our knowledge, the underlying reason is that our method removes the interference from superfluous interactions by leveraging the Sparse Directed Interaction, and that the Motion Tendency is leveraged to improve the prediction.

Interestingly, our method outperforms all the dense interaction based methods, such as SGAN [11], Sophie [37],

† Code available at https://github.com/shuaishiliu/SGCN

Table 2. The ablation study of each component. SGCN (Ours) combines with each
Columns: Variants, ETH, HOTEL, UNIV, ZARA1, ZARA2, AVG.
Rows: SGCN w/o MT, SGCN w/o ZS, SGCN w/o SDI, SGCN.
[Cell values are not recoverable from the transcription, apart from a trailing fragment: 0.76, 0.66/1.28, 0.37/0.65.]

Table 3. T
Surviving row labels: SGCN-V2, SGCN-V3, SGCN-V4, SGCN.
[Cell values are not recoverable from the transcription, apart from a trailing fragment: .42/0.65, 0.45/0.70, 0.37/0.65.]
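The ADE/FDE values reported in the tables follow the metric definitions and the 20-sample protocol given in the experimental settings. A minimal sketch of that computation is shown below. The function names and toy trajectories are illustrative, and taking the minimum ADE and the minimum FDE independently over the samples is an assumption about how "the closest sample" is chosen. `math.dist` requires Python 3.8 or later.

```python
import math


def ade(pred, gt):
    """Average L2 distance between predicted and ground-truth points."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def fde(pred, gt):
    """L2 distance between the final predicted point and the final ground-truth point."""
    return math.dist(pred[-1], gt[-1])


def best_of_k(samples, gt):
    """Minimum ADE and minimum FDE over K sampled trajectories.

    Assumption: the two minima are taken independently of each other.
    """
    return min(ade(s, gt) for s in samples), min(fde(s, gt) for s in samples)


# Toy example with 3 future steps instead of 12, and K = 2 samples instead of 20.
gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
sample_a = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]  # drifts away from the ground truth
sample_b = [(0.0, 0.5), (1.0, 0.5), (2.0, 0.5)]  # constant 0.5 offset

print(ade(sample_a, gt), fde(sample_a, gt))
print(best_of_k([sample_a, sample_b], gt))
```

In the toy example the drifting sample has per-step errors of 0, 1 and 2, while the offset sample is 0.5 away at every step, so the offset sample determines both reported values.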
