PHYSICAL REVIEW APPLIED 11, 064044 (2019) - Stanford EE

PHYSICAL REVIEW APPLIED 11, 064044 (2019)

Matrix Optimization on Universal Unitary Photonic Devices

Sunil Pai,1,* Ben Bartlett,2 Olav Solgaard,1 and David A. B. Miller1
1 Department of Electrical Engineering, Stanford University, Stanford, California 94305, USA
2 Department of Applied Physics, Stanford University, Stanford, California 94305, USA

(Received 7 August 2018; revised manuscript received 18 April 2019; published 19 June 2019)

Universal unitary photonic devices can apply arbitrary unitary transformations to a vector of input modes and provide a promising hardware platform for fast and energy-efficient machine learning using light. We simulate the gradient-based optimization of random unitary matrices on universal photonic devices composed of imperfect tunable interferometers. If device components are initialized uniform randomly, the locally interacting nature of the mesh components biases the optimization search space toward banded unitary matrices, limiting convergence to random unitary matrices. We detail a procedure for initializing the device by sampling from the distribution of random unitary matrices and show that this greatly improves convergence speed. We also explore mesh architecture improvements such as adding extra tunable beam splitters or permuting waveguide layers to further improve the training speed and scalability of these devices.

DOI: 10.1103/PhysRevApplied.11.064044

I. INTRODUCTION

Universal multiport interferometers are optical networks that perform arbitrary unitary transformations on input vectors of coherent light modes. Such devices can be used in applications including quantum computing (e.g., boson sampling, photon walks) [1–4]; mode unscramblers [5]; photonic neural networks [6–8]; and finding optimal channels through lossy scatterers [9].
While universal photonic devices have been experimentally realized at a relatively small scale [5,6], commercial applications such as hardware for energy-efficient machine learning and signal processing can benefit from scaling the devices to up to $N = 1000$ modes. At this scale, fabrication imperfections and components with scale-dependent sensitivities can negatively affect performance.

One canonical universal photonic device is the rectangular multiport interferometer mesh [10] shown in Fig. 1 interfering $N = 8$ modes. In multiport interferometers, an $N$-dimensional vector is represented by an array of modes arranged in $N$ single-mode waveguides. A unitary operation is applied to the input vector by tuning Mach-Zehnder interferometers (MZIs) represented by the red dots of Fig. 1. Each MZI is a two-port optical component made of two 50:50 beam splitters and two tunable single-mode phase shifters. Other mesh architectures have been proposed, such as the triangular mesh [11] (shown in Appendix C), the universal cascaded binary tree architecture [12], and lattice architectures where light does not move in a forward-only direction [13–15].

The scalability of optimizing mesh architectures, especially using gradient-based methods, is limited by the ability of the locally interacting architecture to control the output powers in the mesh. If phase shifts in the mesh are initialized uniform randomly, light propagates through the device in a manner similar to a random walk. The off-diagonal, nonlocal elements of the implemented unitary matrix tend to be close to zero because transitions between inputs and outputs that are far apart have fewer paths (e.g., input 1 and output 8 in Fig. 1 have a single path). The resulting mesh therefore implements a unitary matrix with a banded structure that is increasingly pronounced as the matrix size increases.

In many applications such as machine learning [6] and quantum computing [2,16], we avoid this banded unitary matrix behavior in favor of random unitary matrices.
A random unitary matrix is achieved when the device phase shifts follow a distribution derived from random matrix theory [16–20]. In the random matrix theory model, we assign a sensitivity index to each component that increases toward the center of the mesh, as shown in Fig. 1. The more sensitive components toward the center of the mesh require higher transmissivities and tighter optimization tolerances. If the required tolerances are not met, the implemented unitary matrix begins to show the undesired banded behavior.

In Sec. II, we introduce the photonic mesh architecture and sources of error that can exacerbate the banded unitary matrix problem. In Sec. III, we explicitly model the component settings to implement a random unitary matrix and ultimately avoid the banded unitary matrix problem. We propose a "Haar initialization" procedure that allows light to propagate uniformly to all outputs from any input. We use this procedure to initialize the gradient-based optimization of a photonic mesh to learn unknown random unitary matrices given training data. We show that this optimization converges even in the presence of significant simulated fabrication errors.

In Secs. IV and V, we propose and simulate two alterations to the mesh architecture that further improve gradient-based optimization performance. First, we add redundant MZIs in the mesh to reduce convergence error by up to 5 orders of magnitude. Second, we permute the mesh interactions while maintaining the same number of tunable components, which increases allowable tolerances of phase shifters, decreases off-diagonal errors, and improves convergence time.

© 2019 American Physical Society

II. PHOTONIC MESH

We define the photonic mesh when operated perfectly and then discuss how beam-splitter or phase-shift errors can affect device performance.

A. Photonic unitary implementation

A single-mode phase shifter can perform an arbitrary U(1) transformation $e^{i\phi}$ on its input. A phase-modulated MZI with perfect (50:50) beam splitters can apply to its inputs a unitary transformation $U$ of the form

$$
U(\theta, \phi) := R_\phi B R_\theta B
= \begin{bmatrix} e^{i\phi} & 0 \\ 0 & 1 \end{bmatrix}
\frac{1}{\sqrt{2}} \begin{bmatrix} 1 & i \\ i & 1 \end{bmatrix}
\begin{bmatrix} e^{i\theta} & 0 \\ 0 & 1 \end{bmatrix}
\frac{1}{\sqrt{2}} \begin{bmatrix} 1 & i \\ i & 1 \end{bmatrix}
= i e^{i\theta/2} \begin{bmatrix} e^{i\phi} \sin\frac{\theta}{2} & e^{i\phi} \cos\frac{\theta}{2} \\ \cos\frac{\theta}{2} & -\sin\frac{\theta}{2} \end{bmatrix},
\qquad (1)
$$

where $B$ is the beam-splitter operator and $R_\theta$, $R_\phi$ are upper phase-shift operators. Equation (1) is represented diagrammatically by the configuration in Fig. 1. (Other configurations with two independent phase shifters between the beam splitters $B$ are ultimately equivalent for photonic meshes [21].) If one or two single-mode phase shifters are added at the inputs, we can apply an arbitrary SU(2) or U(2) transformation to the inputs, respectively.

FIG. 1. Mesh diagram representing the locally interacting rectangular mesh for $N = 8$. The inputs (and single-mode phase shifts at the inputs) are represented by blue triangles. Outputs are represented by purple squares. The MZI nodes are represented by red dots labeled with sensitivity index $\alpha_{n\ell}$ (e.g., $\alpha_{44} = 7$ is the most sensitive node). The nodes represent the Givens rotation $U_n$ (in orange) at vertical layer $\ell$ (in green). Each photonic MZI node can be represented with 50:50 beam splitters $B$ (red) and phase shifters $R_\theta$, $R_\phi$ (orange), with required ranges $0 \le \theta \le \pi$ and $0 \le \phi < 2\pi$.

We define the transmissivity and reflectivity of the MZI as

$$
t := \cos^2\frac{\theta}{2} = |U_{12}|^2 = |U_{21}|^2, \qquad
r := \sin^2\frac{\theta}{2} = 1 - t = |U_{11}|^2 = |U_{22}|^2.
\qquad (2)
$$

In this convention, when $\theta = \pi$, we have $r = 1$, $t = 0$ (the MZI "bar state"), and when $\theta = 0$, we have $r = 0$, $t = 1$ (the MZI "cross state").

If there are $N$ input modes and the interferometer is connected to waveguides $n$ and $n + 1$, then we can embed the $2 \times 2$ unitary $U$ from Eq. (1) in $N$-dimensional space with a locally interacting unitary "Givens rotation" $U_n$ defined as

$$
U_n := \begin{bmatrix}
1 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & \cdots & U_{11} & U_{12} & \cdots & 0 \\
0 & \cdots & U_{21} & U_{22} & \cdots & 0 \\
\vdots & & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & 0 & \cdots & 1
\end{bmatrix},
\qquad (3)
$$

where the block containing $U_{11}$, $U_{12}$, $U_{21}$, $U_{22}$ occupies rows and columns $n$ and $n + 1$. All diagonal elements are 1 except those labeled $U_{11}$ and $U_{22}$, which have magnitudes of $\sqrt{r} = \sqrt{1 - t}$, and all off-diagonal elements are 0 except those labeled $U_{12}$ and $U_{21}$, which have magnitudes of $\sqrt{t}$.
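As a numerical check of Eqs. (1) and (2), the following minimal NumPy sketch multiplies out $R_\phi B R_\theta B$ and compares it with the closed form. The function names are illustrative and not taken from the paper's code. The optional `eps` argument is an assumption: it uses the split-ratio error form $B_\epsilon = \frac{1}{\sqrt{2}}\begin{bmatrix}\sqrt{1+\epsilon} & i\sqrt{1-\epsilon} \\ i\sqrt{1-\epsilon} & \sqrt{1+\epsilon}\end{bmatrix}$ discussed in Sec. II B, and `eps=0` recovers the ideal 50:50 beam splitter.

```python
import numpy as np

def beam_splitter(eps=0.0):
    """Beam splitter B; eps=0 is the ideal 50:50 case of Eq. (1).

    A nonzero eps is an assumed split-ratio error model (Sec. II B).
    """
    a, b = np.sqrt(1 + eps), np.sqrt(1 - eps)
    return np.array([[a, 1j * b], [1j * b, a]]) / np.sqrt(2)

def phase_shifter(angle):
    """Phase shift exp(i * angle) applied to the upper waveguide only."""
    return np.diag([np.exp(1j * angle), 1.0])

def mzi(theta, phi, eps=0.0):
    """MZI transfer matrix U = R_phi B R_theta B."""
    b = beam_splitter(eps)
    return phase_shifter(phi) @ b @ phase_shifter(theta) @ b

def mzi_closed_form(theta, phi):
    """Right-hand side of Eq. (1) for ideal beam splitters."""
    s, c = np.sin(theta / 2), np.cos(theta / 2)
    return 1j * np.exp(1j * theta / 2) * np.array(
        [[np.exp(1j * phi) * s, np.exp(1j * phi) * c],
         [c, -s]])

theta, phi = 0.7, 1.9
u = mzi(theta, phi)
t = abs(u[0, 1]) ** 2  # transmissivity, Eq. (2)
r = abs(u[0, 0]) ** 2  # reflectivity, Eq. (2)
```

With these definitions, `t` equals `np.cos(theta / 2) ** 2` and `r + t` equals 1 up to floating-point error; setting `theta = np.pi` gives the bar state and `theta = 0` gives the cross state.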

Arbitrary unitary transformations can be implemented on a photonic chip using only locally interacting MZIs [11]. In this paper, we focus on optimizing a rectangular mesh [10] of MZIs; however, our ideas can be extended to other universal schemes, such as the triangular mesh [22], as well.

In the rectangular mesh scheme [10] of Fig. 1, we represent $\hat{U}_R \in \mathrm{U}(N)$ in terms of $N(N-1)/2$ locally interacting Givens rotations $U_n$ and $N$ single-mode phase shifts at the inputs represented by diagonal unitary $D(\gamma_1, \gamma_2, \ldots, \gamma_N)$:

$$
\hat{U}_R := \prod_{\ell=1}^{N} \prod_{n \in S_{\ell,N}} U_n(\theta_{n\ell}, \phi_{n\ell}) \cdot D(\gamma_1, \gamma_2, \ldots, \gamma_N),
\qquad (4)
$$

where our layerwise product left-multiplies from $\ell = N$ to 1 (in general, for matrix products for a sequence $\{M_\ell\}$, we define the multiplication order $\prod_{\ell=1}^{N} M_\ell = M_N M_{N-1} \cdots M_1$); the single-mode phase shifts are $\gamma_n \in [0, 2\pi)$; and the Givens rotations are parameterized by $\theta_{n\ell} \in [0, \pi]$, $\phi_{n\ell} \in [0, 2\pi)$. (Since $\gamma_n$, $\phi_{n\ell}$ are periodic phase parameters, they are in half-open intervals $[0, 2\pi)$. In contrast, any $\theta_{n\ell} \in [0, \pi]$ must be in a closed interval to achieve all transmissivities $t_{n\ell} \in [0, 1]$.) We define the top indices of each interacting mode for each vertical layer as the set $S_{\ell,N} := \{n \in [1, 2, \ldots, N-1] \mid n \ (\mathrm{mod}\ 2) \equiv \ell \ (\mathrm{mod}\ 2)\}$. This vertical layer definition follows the convention of Refs. [23] and [7] and is depicted in Fig. 1, where $\ell$ represents the index of the vertical layer.

B. Beam-splitter error tolerances

The expressions in Eqs. (1) and (4) assume perfect fabrication. In practice, however, we would like to simulate how practical devices with errors in each transfer matrix $B$, $R_\phi$, $R_\theta$ in Eq. (1) impact optimization performance.

In fabricated chip technologies, imperfect beam splitters $B_\epsilon$ can have a split ratio error $\epsilon$ that changes the behavior of the red 50:50 coupling regions in Fig. 1 or $B$ in Eq. (1).
The resultant scattering matrix $U_\epsilon$ with imperfect beam splitters $B_\epsilon$ can be written as

$$
B_\epsilon := \frac{1}{\sqrt{2}} \begin{bmatrix} \sqrt{1+\epsilon} & i\sqrt{1-\epsilon} \\ i\sqrt{1-\epsilon} & \sqrt{1+\epsilon} \end{bmatrix}, \qquad
U_\epsilon := R_\phi B_\epsilon R_\theta B_\epsilon.
\qquad (5)
$$

As shown in Appendix B, if we assume both beam splitters have identical $\epsilon$, we find that $t_\epsilon := t(1 - \epsilon^2) \in [0, 1 - \epsilon^2]$ is the realistic transmissivity; $r_\epsilon := r + t \cdot \epsilon^2 \in [\epsilon^2, 1]$ is the realistic reflectivity; and $t$, $r$ are the ideal transmissivity and reflectivity defined in Eq. (2).

The unitary matrices in Eq. (5) cannot express the full transmissivity range of the MZI, with errors of up to $\epsilon^2$ in the transmissivity, potentially limiting the performance of greedy progressive photonic algorithms [24–26]. Our Haar phase theory, which we develop in the following section, determines acceptable interferometer tolerances for calibration of a "perfect mesh" consisting of imperfect beam splitters [21] given large $N$. We will additionally show that simulated photonic backpropagation [7] with adaptive learning can adjust to nearly match the performance of perfect meshes with errors as high as $\epsilon = 0.1$ for meshes of size $N = 128$.

C. Phase-shift tolerances

Another source of uncertainty in photonic meshes is the phase-shift tolerances of the mesh that affect the matrices $R_\theta$, $R_\phi$ of Eq. (1), shown in orange in Fig. 1. Error sources such as thermal cross talk or environmental drift may result in slight deviance of phase shifts in the mesh from intended operation. Such errors primarily affect the control parameters $\theta_{n\ell}$ that control light propagation in the mesh by affecting the MZI split ratios. This nontrivial problem warrants a discussion of mean behavior and sensitivities (i.e., the distribution) of $\theta_{n\ell}$ needed to optimize a random unitary matrix.

III. HAAR INITIALIZATION

A. Cross-state bias and sensitivity index

The convergence of global optimization depends critically on the sensitivity of each phase shift. The gradient descent optimization we study in this paper converges when the phase shifts are correct to within some acceptable range.
This acceptable range can be rigorously defined in terms of average value and variance of phase shifts in the mesh that together define an unbiased ("Haar random") unitary matrix. (A Haar random unitary is defined as Gram-Schmidt orthogonalization of $N$ standard normal complex vectors [16,20].) To implement a Haar random unitary, some MZIs in the mesh need to be biased toward a cross state ($t_{n\ell}$ near 1, $\theta_{n\ell}$ near 0) [16,24]. This cross-state bias correspondingly "pinches" the acceptable range for transmissivity and phase shift near the limiting cross-state configuration, resulting in higher sensitivity, as can be seen in Fig. 3(b).

For an implemented Haar random unitary matrix, low-tolerance, transmissive MZIs are located toward the center of a rectangular mesh [16,24] and the apex of a triangular mesh as proven in Appendix C. For both the triangular and rectangular meshes, the cross-state bias and corresponding sensitivity for each MZI depend only on the total number of reachable waveguide ports, as proven in Appendix I. Based on this proof, we define the sensitivity index $\alpha_{n\ell} := |I_{n\ell}| + |O_{n\ell}| - N - 1$ (note that $1 \le \alpha_{n\ell} \le N - 1$, and there are always $N - \alpha_{n\ell}$ MZIs that have a sensitivity index of $\alpha_{n\ell}$), where $I_{n\ell}$ and $O_{n\ell}$ are the subsets of input and output waveguides reachable by light exiting or entering the MZI, respectively, and $|\cdot|$ denotes set size.
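The definition of a Haar random unitary quoted above (orthogonalizing $N$ standard normal complex vectors) can be sketched numerically. This is a minimal illustration, not the authors' code: it swaps explicit Gram-Schmidt for a QR decomposition, and it rescales the columns by the phases of the diagonal of the triangular factor, a standard step that makes the result independent of the sign convention of the QR routine. The name `haar_unitary` is hypothetical.

```python
import numpy as np

def haar_unitary(n, rng):
    """Sample an n x n unitary from the Haar measure via QR."""
    # Complex Gaussian matrix with independent standard normal entries.
    z = (rng.standard_normal((n, n))
         + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    # Multiply column j of q by the phase of r[j, j].
    d = np.diagonal(r)
    return q * (d / np.abs(d))

rng = np.random.default_rng(0)
u = haar_unitary(8, rng)
unitarity_error = np.linalg.norm(u.conj().T @ u - np.eye(8))
```

A matrix sampled this way has no preferred band structure, which is the target behavior that the sensitivity index is designed to reproduce in a mesh.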

FIG. 2. (a) The sensitivity index $\alpha_{n\ell}$ for $N = 64$. (b) Checkerboard plot for the average reflectivity $\langle r_{n\ell} \rangle$ in a rectangular mesh. (c) Decomposition of Ref. [10] for a Haar-random matrix yields phases close to cross state in the middle of the mesh. (d) The Haar phase $\xi_{n\ell}$ for the rectangular mesh better displays the randomness. (e),(f) Field measurements (absolute value) from propagation at input 32 in (e) Haar and (f) uniform random initialized rectangular meshes with $N = 64$.

FIG. 3. (a) Plot of the relationship between $\xi_\alpha$ and $\theta$. (b) We show that phase-shift standard deviation $\sigma_{\theta;\alpha}$ decreases as $\alpha$ increases. (c) A plot of $\sigma_{\theta;\alpha}$ as $\alpha$ increases. (d) The transmissivity of a MZI component as a function of a periodic Haar phase has a power-law relationship. The periodic Haar phase $\xi_\alpha$ is mapped to the Haar phase by a function $\xi : \mathbb{R} \to [0, 1]$ as discussed in Appendix G.

Figures 1 and 2(a) show the sensitivity index for the rectangular mesh, which clearly increases toward the center MZI.

B. Phase-shift distributions and Haar phase

The external $\phi_{n\ell}$, $\gamma_n$ phase shifts do not affect the transmissivity $t_{n\ell}$ and therefore obey uniform random distributions [16]. In contrast, the $\theta_{n\ell}$ phase shifts have a probability density function (PDF) that depends on $\alpha_{n\ell}$ [16]:

$$
P_{\alpha_{n\ell}}\!\left(\frac{\theta_{n\ell}}{2}\right) = 2\alpha_{n\ell} \sin\frac{\theta_{n\ell}}{2} \left[\cos\frac{\theta_{n\ell}}{2}\right]^{2\alpha_{n\ell} - 1}.
\qquad (6)
$$

The general shape of this distribution is presented in Fig. 3(b), showing how an increase in $\alpha_{n\ell}$ biases $\theta_{n\ell}$ toward the cross state with higher sensitivity.

We define the Haar phase $\xi_{n\ell}$ as the cumulative distribution function (CDF) of $\theta_{n\ell}/2$ starting from $\theta_{n\ell}/2 = \pi/2$:

$$
\xi_{n\ell} := \int_{\theta_{n\ell}/2}^{\pi/2} P_{\alpha_{n\ell}}(\theta)\, d\theta.
\qquad (7)
$$

Using Eqs. (6) and (7), we can define $\xi_{n\ell}(\theta_{n\ell}) \in [0, 1]$ that yields a Haar random matrix:

$$
\xi_{n\ell} = \cos^{2\alpha_{n\ell}}\frac{\theta_{n\ell}}{2} = t_{n\ell}^{\alpha_{n\ell}},
\qquad (8)
$$

where $t_{n\ell}$ represents the transmissivity of the MZI, which is a function of $\theta_{n\ell}$ as defined in Eq. (2).

C. Haar initialization

In the physical setting, it is useful to find the inverse of Eq. (8) to directly set the measurable transmissivity $t_{n\ell}$ of each MZI using a uniformly varying Haar phase $\xi_{n\ell} \sim \mathcal{U}(0, 1)$, a process we call "Haar initialization," shown in Figs. 2(c) and 2(d):

$$
t_{n\ell} = \sqrt[\alpha_{n\ell}]{\xi_{n\ell}}, \qquad
\theta_{n\ell} = 2\arccos\sqrt{t_{n\ell}} = 2\arccos\sqrt[2\alpha_{n\ell}]{\xi_{n\ell}},
\qquad (9)
$$

where the expression for $\theta_{n\ell}$ is just a rearrangement of Eq. (2).
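The inverse-CDF sampling of Eq. (9) can be sketched in a few lines of NumPy. This is a minimal illustration under the assumption that the sensitivity index of every MZI is already known; the helper name `haar_initialize` is hypothetical and is not the API of any library. As a sanity check, integrating $t = \xi^{1/\alpha}$ over $\xi \in [0, 1]$ gives a mean transmissivity of $\alpha / (\alpha + 1)$, which the sample mean should approach.

```python
import numpy as np

def haar_initialize(alpha, rng):
    """Draw theta for MZIs with sensitivity indices alpha via Eq. (9).

    alpha: array of sensitivity indices (one entry per MZI).
    Returns (theta, t), where t is the sampled transmissivity.
    """
    alpha = np.asarray(alpha, dtype=float)
    xi = rng.uniform(0.0, 1.0, size=alpha.shape)  # Haar phase ~ U(0, 1)
    t = xi ** (1.0 / alpha)                       # alpha-th root of xi
    theta = 2.0 * np.arccos(np.sqrt(t))           # rearranged Eq. (2)
    return theta, t

rng = np.random.default_rng(1)
alpha = np.full(200_000, 7.0)  # e.g. the center MZI of an N = 8 mesh
theta, t = haar_initialize(alpha, rng)
mean_t = t.mean()              # compare with alpha / (alpha + 1) = 0.875
```

Larger `alpha` pushes the samples of `theta` toward 0 (the cross state), which is the cross-state bias described in Sec. III A.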

Haar initialization can be achieved progressively using a procedure similar to that in Ref. [25]. If the phase shifters in the mesh are all well characterized, the transmissivities can be directly set [16]. We show in Sec. V that Haar initialization improves the convergence speed of gradient descent optimization significantly.

We can also use Eq. (9) to find the average transmissivity and reflectivity for a MZI parameterized by $\alpha_{n\ell}$ as is found through simulation in Ref. [24]:

$$
\langle t_{n\ell} \rangle = \int_0^1 d\xi_{n\ell}\, \sqrt[\alpha_{n\ell}]{\xi_{n\ell}} = \frac{\alpha_{n\ell}}{\alpha_{n\ell} + 1}, \qquad
\langle r_{n\ell} \rangle = \frac{1}{\alpha_{n\ell} + 1} = \frac{1}{|I_{n\ell}| + |O_{n\ell}| - N}.
\qquad (10)
$$

The average reflectivity $\langle r_{n\ell} \rangle$ shown in Fig. 2(b) gives a simple interpretation for the sensitivity index shown in Fig. 2(a). The average reflectivity is equal to the inverse of the total number of inputs and outputs reachable by the MZI minus the number of ports on either side of the device, $N$. This is true regardless of whether $\alpha_{n\ell}$ is assigned for a triangular or rectangular mesh.

To see what the Haar initialization has accomplished, we can compare the field propagation through the rectangular mesh from a single input when it is Haar initialized versus uniform initialized in Fig. 2(e). Physically, this corresponds to light in the mesh spreading out quickly from the input of the mesh and "interacting" more near the boundaries of the mesh (inputs, outputs, top, and bottom), as compared to the center of the mesh, which has high transmissivity. In contrast, when phases are randomly set, the light effectively follows a random walk through the mesh, resulting in the field propagation pattern shown in Fig. 2(f).

D. Tolerance dependence on N

While Haar initialization is based on how the average component reflectivity scales with $N$, optimization convergence and device robustness ultimately depend on how phase-shift tolerances scale with $N$. The average sensitivity index in the mesh is $\langle \alpha_{n\ell} \rangle = (N + 1)/3$. As shown in Figs. 3(b) and 3(c), the standard deviation $\sigma_{\theta;\alpha}$ over the PDF $P_\alpha$ decreases as $\alpha$ increases.
Therefore, a phase shifter's allowable tolerance, which roughly correlates with $\sigma_{\theta;\alpha}$, decreases as the total number of input and output ports affected by that component increases. Since $\langle \alpha_{n\ell} \rangle$ increases linearly with $N$, the required tolerance gets more restrictive at large $N$, as shown in Fig. 3(c). We find that the standard deviation is on the order of $10^{-2}$ radians for most values of $N$ in the specified range. Thus, if thermal cross talk is ignored [6], it is possible to implement a known random unitary matrix in a photonic mesh assuming perfect operation. However, we concern ourselves with on-chip optimization given just input and output data, in which case the unitary matrix is unknown. In such a case, the decreasing tolerances do pose a challenge in converging to a global optimum as $N$ increases. We demonstrate this problem for $N = 128$ in Sec. V.

To account for the scalability problem in global optimization, one strategy may be to design a component in such a way that the mesh MZIs can be controlled by Haar phase voltages as in Fig. 3(d) and Eq. (9). The transmissivity dependence on a periodic Haar phase [shown in Fig. 3(d) and discussed in Appendix G] is markedly different from the usual sinusoidal dependence on periodic $\theta_{n\ell}$. The MZIs near the boundary vary in transmissivity over a larger voltage region than the MZIs near the center, where only small voltages are needed to get to full transmissivity. This results in an effectively small control tolerance near small voltages. This motivates the modifications to the mesh architecture which we discuss in the next section.

IV. ARCHITECTURE MODIFICATIONS

We propose two architecture modifications that can relax the transmissivity tolerances in the mesh discussed in Sec. III and result in significant improvement in optimization.

A. Redundant rectangular mesh

By adding extra tunable MZIs, it is possible to greatly accelerate the optimization of a rectangular mesh to an unknown unitary matrix.
The addition of redundant tunable layers to form a redundant rectangular mesh (RRM) is depicted in green in Fig. 4(a). The authors in Ref. [24] point out that using such "underdetermined meshes" (number of inputs less than the number of tunable layers in the mesh) can overcome photonic errors and restore fidelity in unitary construction algorithms. Adding layers to the mesh increases the overall optical depth of the device, but embedding smaller meshes with extra beam-splitter layers in a rectangular mesh of an acceptable optical depth does not pose intrinsic waveguide loss-related problems.

B. Permuting rectangular mesh

Another method to accelerate the optimization of a rectangular mesh is to shuffle outputs at regular intervals within the rectangular mesh. This shuffling relaxes component tolerances and makes the number of paths for each input-output transition more uniform. We use this intuition to formally define a permuting rectangular mesh (PRM). For simplicity, assume $N = 2^K$ for some positive integer $K$. Define "rectangular permutation" operations $P_k$ that allow inputs to interact with waveguides at most $2^k$ away for $k < K$. These rectangular permutation blocks can be implemented using a rectangular mesh composed of MZIs with fixed cross-state phase shifts, as shown in Fig. 4(b), or using low-loss waveguide crossings.

FIG. 4. (a) A $16 \times 16$ rectangular mesh (red). Extra tunable layers (green) may be added to significantly reduce convergence time. (b) A 16-input, 30-layer permuting rectangular mesh. The rectangular permutation layer is implemented using either waveguide crossings or cross-state MZIs (gray).

We now add permutation matrices $P_1, P_2, \ldots, P_{K-1}$ into the middle of the rectangular mesh as follows:

$$
\hat{U}_{PR} := M_K \left( \prod_{k=1}^{K-1} P_k M_k \right), \qquad
M_k := \prod_{\ell = (k-1)\lceil N/K \rceil}^{\min(k \lceil N/K \rceil,\, N)} \ \prod_{n \in S_{\ell,N}} U_n(\theta_{n\ell}, \phi_{n\ell}),
\qquad (11)
$$

where $\lceil x \rceil$ represents the nearest integer larger than $x$.

There are two operations per block $k$: an $\lceil N/K \rceil$-layer rectangular mesh, which we abbreviate as $M_k$, and the rectangular permutation mesh $P_k$, where block index $k \in [1 \cdots K-1]$. This is labeled in Fig. 4(b).

V. SIMULATIONS

Now that we have discussed the mesh modifications and Haar initialization, we simulate global optimization to show how our framework can improve convergence performance by up to five orders of magnitude, even in the presence of fabrication error.

A. Mesh initialization

We begin by discussing the importance of initializing the mesh to respect the cross-state bias and sensitivity of each component for the Haar random unitary matrices discussed in Sec. III. Uniform random phase initialization is problematic because it is agnostic of the sensitivity and average behavior of each component. We define this distribution of matrices as $\mathcal{U}_R(N, L)$ for a rectangular mesh for $N$ inputs and $L$ layers. As shown previously in Fig. 2(f), any given input follows a random-walk-like propagation if phases are initialized uniform randomly, so there will only be nonzero matrix elements within a "bandsize" about the diagonal. This bandsize decreases as circuit size $N$ increases as shown in Fig. 5.

FIG. 5. Elementwise absolute values of unitary matrices resulting from rectangular ($U \sim \mathcal{U}_R$) and permuting rectangular ($U \sim \mathcal{U}_{PR}$) meshes, where meshes are initialized with uniform-random phases.

We compare the bandsizes of banded unitary matrices in simulations qualitatively as we do in Fig. 5 or quantitatively as we do in Appendix D. We randomly generate $U \sim \mathcal{U}_R(N, N)$, $U \sim \mathcal{U}_{PR}(N)$ (permuting rectangular mesh with $N$ tunable layers), and $U \sim \mathcal{U}_R(N, N + \delta N)$ (redundant rectangular mesh with $\delta N$ extra tunable layers). Figure 5 shows a significant reduction in bandsize as $N$ grows larger for rectangular meshes. This phenomenon is not observed with permuting rectangular meshes, which generally have the same bandsize as Haar random matrices (independent of $N$) as shown in Fig. 5 and Appendix D. This correlates with permuting rectangular meshes having faster optimization and less dependence on initialization.

Instead of initializing the mesh using uniform random phases, we use Haar initialization as in Eq. (9) to avoid starting with a banded unitary configuration. This initialization, which we recommend for any photonic mesh-based neural network application, dramatically improves convergence because it primes the optimization with the right average behavior for each component. We find in our simulations that as long as the initialization is calibrated toward higher transmissivity ($\theta_{n\ell}$ near 0), larger mesh networks can also have reasonable convergence times similar to when the phases are Haar initialized.

The proper initialization of permuting rectangular meshes is less clear because the tolerances and average behavior of each component have not yet been modeled. Our proposal is to initialize each tunable block $M_k$ as an independent mesh using the same definition for $\alpha_{n\ell}$, except replacing $N$ with the number of layers in $M_k$, $\lceil N/K \rceil$. This is what we use as the Haar initialization equivalent in the permuting rectangular mesh case, although it is possible there may be better initialization strategies for the nonlocal mesh structure.

B. Optimization problem and synthetic data

After initializing the photonic mesh, we proceed to optimize the mean-square error cost function for an unknown Haar random unitary $U$:

$$
\underset{\theta_{n\ell},\, \phi_{n\ell},\, \gamma_n}{\text{minimize}} \quad \frac{1}{2N} \left\| \hat{U}(\theta_{n\ell}, \phi_{n\ell}, \gamma_n) - U \right\|_F^2,
\qquad (12)
$$

where the estimated unitary matrix function $\hat{U}$ maps $N^2$ phase-shift parameters $\theta_{n\ell}$, $\phi_{n\ell}$, $\gamma_n$ to $\mathrm{U}(N)$ via Eq. (4) or (11) and $\| \cdot \|_F$ denotes the Frobenius norm. Since trigonometric functions parameterizing $\hat{U}$ are nonconvex, we know that Eq. (12) is a nonconvex problem. The nonconvexity of Eq. (12) suggests learning a single unitary transformation in a deep neural network might have significant dependence on initialization.

To train the network, we generate random unit-norm complex input vectors of size $N$ and generate corresponding labels by multiplying them by the target matrix $U$. We use a training batch size of $2N$. The synthetic training data of unit-norm complex vectors is therefore represented by $X \in \mathbb{C}^{N \times 2N}$. The minibatch training cost function is similar to the test cost function, $\mathcal{L}_{\text{train}} = \| \hat{U} X - U X \|_F^2$. The test set is the identity matrix $I$ of size $N \times N$. The test cost function, in accordance with the training cost function definition, thus matches Eq. (12).

C. Training algorithm

We simulate the global optimization of a unitary mesh using automatic differentiation in tensorflow, which can be physically realized using the in situ backpropagation procedure in Ref. [7]. This optical backpropagation procedure physically measures $\partial \mathcal{L}_{\text{train}} / \partial \theta_{n\ell}$ using interferometric techniques, which can be extended to any of the architectures that we discuss in this paper.

The on-chip backpropagation approach is also likely faster for gradient computation than other training approaches such as the finite-difference method mentioned in past on-chip training proposals [6]. We find empirically that the Adam update rule (a popular first-order adaptive update rule [27]) outperforms the standard stochastic gradient descent for the training of unitary networks. If gradient measurements for the phase shifts are stored during training, adaptive update rules can be applied using successive gradient measurements for each tunable component in the mesh. Such a procedure requires minimal computation (i.e., locally storing the previous gradient step) and can act as a physical test of the simulations we now discuss. Furthermore, we avoid quasi-Newton optimization methods such as L-BFGS used in Ref. [24] that cannot be implemented physically as straightforwardly as first-order methods.

The models are trained using our open source simulation framework neurophox (see Ref. [28]) using a more general version of the vertical layer definition proposed in Refs. [23] and [7]. The models are programmed in tensorflow [29] and run on an NVIDIA GeForce GTX1080 GPU to improve optimization performance.

D. Results

We now compare training results for rectangular, redundant rectangular, and permuting rectangular meshes given $N = 128$. In our comparison of permuting rectangular meshes and rectangular meshes, we analyze performance when beam-splitter errors are distributed throughout the mesh as either $\epsilon = 0$ or $\epsilon \sim \mathcal{N}(0, 0.01)$ and when the $\theta_{n\ell}$ are randomly or Haar initialized [according to the PDF in Eq. (6)].
We also analyze optimization performances of redundant rectangular meshes where we vary the number of vertical MZI layers.

From our results, we report five key findings:

1. Optimization of $N = 128$ rectangular meshes results in significant off-diagonal errors due to bias toward the banded matrix space of $\mathcal{U}_R(128)$, as shown in Fig. 6.
2. Rectangular meshes converge faster when Haar initialized than when uniformly random initialized, as in Fig. 6, in which case the estimated matrix converges toward a banded configuration, as shown in Appendix H.
3. Permuting rectangular meshes converge faster than rectangular meshes despite having the same number of total parameters, as shown in Fig. 6.
4. Redundant rectangular meshes, because of an increase in the number of parameters, have up to 5 orders of magnitude better convergence when the number of vertical layers is doubled compared to rectangular and permuting rectangular meshes, as shown in Fig. 7.
5. Beam-splitter imperfections slightly reduce the overall optimization performance of permuting and redundant rectangular meshes, but reduce the performance of the rectangular mesh significantly. (See Fig. 6(a) and Appendix E.)
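To make the quantities compared in these findings concrete, the rectangular mesh product of Eq. (4) and the test cost of Eq. (12) can be sketched in plain NumPy. This is a small illustration only, not the neurophox or tensorflow implementation used for the reported results: it uses ideal beam splitters, uniform random phases, and no gradient step. The function names and the `(layer, n)` array layout are assumptions made for this example; entries of `theta` and `phi` whose index $n$ is not in $S_{\ell,N}$ are simply ignored.

```python
import numpy as np

def mzi(theta, phi):
    """Closed form of Eq. (1) for ideal 50:50 beam splitters."""
    s, c = np.sin(theta / 2), np.cos(theta / 2)
    return 1j * np.exp(1j * theta / 2) * np.array(
        [[np.exp(1j * phi) * s, np.exp(1j * phi) * c],
         [c, -s]])

def rectangular_mesh(theta, phi, gamma):
    """Eq. (4): input phases D, then layers 1..N applied left to right.

    theta, phi: arrays of shape (N, N - 1) indexed as [layer - 1, n - 1].
    Returns the N x N unitary and the number of MZIs actually used.
    """
    n_modes = gamma.size
    u = np.diag(np.exp(1j * gamma))
    n_mzis = 0
    for layer in range(1, n_modes + 1):
        for n in range(1, n_modes):
            if n % 2 == layer % 2:  # n is in S_{layer, N}
                givens = np.eye(n_modes, dtype=complex)
                givens[n - 1:n + 1, n - 1:n + 1] = mzi(
                    theta[layer - 1, n - 1], phi[layer - 1, n - 1])
                u = givens @ u      # later layers left-multiply
                n_mzis += 1
    return u, n_mzis

def mse_cost(u_hat, u):
    """Test cost of Eq. (12): squared Frobenius norm divided by 2N."""
    return np.linalg.norm(u_hat - u, "fro") ** 2 / (2 * u.shape[0])

rng = np.random.default_rng(2)
n_modes = 8
theta = rng.uniform(0, np.pi, size=(n_modes, n_modes - 1))
phi = rng.uniform(0, 2 * np.pi, size=(n_modes, n_modes - 1))
gamma = rng.uniform(0, 2 * np.pi, size=n_modes)
u_hat, n_mzis = rectangular_mesh(theta, phi, gamma)
```

Plotting `np.abs(u_hat)` for larger `n_modes` is one way to look for the banded structure described in Sec. V A; replacing the uniform draw of `theta` with Haar-initialized values requires the sensitivity index of each MZI, which this sketch does not compute.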

FIG. 7. A comparison of test error in tensorflow for $N = 128$ between rectangular (RM), permuting rectangular (PRM), and redundant rectangular (RRM) meshes for 20,000 iterations, Adam update, learning rate of 0.0025, batch size of 256. Ideal denotes Haar-initialized $\theta_{n\ell}$ with $\epsilon = 0$. $\delta N$ is the additional layers added in the redundant mesh. We stop the $\delta N = 128$ run within 4000 iterations when it reaches convergence within machine precision. Redundant meshes with 32 additional layers converge better than permuting rectangular meshes and, with just 16 additional layers, we get almost identical performance.

of SVD architectures using automatic differentiation in Appendix F.

FIG. 6. We implement six different optimizations for $N = 128$, where we vary the choice of permuting rectangular mesh (PRM) or rectangular mesh (RM); the initialization (random $\theta_{n\ell}$ or Haar-initialized $\theta_{n\ell}$); and photonic transmissivity error displacements [$\epsilon = 0$ or $\epsilon \sim \mathcal{N}(0, 0.01)$, where $\sigma^2 = 0.01$ is the v

PHYSICAL REVIEW APPLIED 11, 064044 (2019) Matrix Optimization on Universal Unitary Photonic Devices Sunil Pai,1,* Ben Bartlett,2 Olav Solgaard, 1and David A. B. Miller 1Department of Electrical Engineering, Stanford University, Stanford, California 94305, USA 2Department of Applied Physics, Stanford University, Stanford, California 9

Related Documents:

DuPont Tyvek Fluid Applied WB - Commercial Installation ...
