A. Appendix


Measuring abstract reasoning in deep neural networks

A.1. PGM Dataset

Altogether there are 1.2M training set questions, 20K validation set questions, and 200K testing set questions.

When creating the matrices we aimed to use the full Cartesian product R × A for constructing structures S. However, some relation-attribute combinations are problematic, such as a progression on line type, and some attributes interact in interesting ways (such as number and position, which are in some sense tied), restricting the type of relations we can apply to these attributes. The final list of relevant relations per attribute type, broken down by object type (shape vs. line), is:

shape:
  size: progression, XOR, OR, AND, consistent union
  color: progression, XOR, OR, AND, consistent union
  number: progression, consistent union
  position: XOR, OR, AND
  type: progression, XOR, OR, AND, consistent union
line:
  color: progression, XOR, OR, AND, consistent union
  type: XOR, OR, AND, consistent union

Since the number and position attribute types are tied (for example, having an arithmetic progression on number whilst having an XOR relation on position is not possible), we forbid number and position from co-occurring in the same matrix. Otherwise, all other ((r, o, a), (r, o, a)) combinations occurred unless specifically controlled for in the generalisation regime.

We created a similar list of possible values for a given attribute:

shape:
  color: 10 evenly spaced greyscale intensities in [0, 1]
  size: 10 scaling factors evenly spaced in [0, 1] (see footnote 4)
  number: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
  position ((x, y) coordinates in a (0, 1) plot): (0.25, 0.75), (0.75, 0.75), (0.75, 0.25), (0.25, 0.25), (0.5, 0.5), (0.5, 0.25), (0.5, 0.75), (0.25, 0.5), (0.75, 0.5)
  type: circle, triangle, square, pentagon, hexagon, octagon, star
line:
  color: 10 evenly spaced greyscale intensities in [0, 1]
  type: diagonal down, diagonal up, vertical, horizontal, diamond, circle

Footnote 4: The actual specific values used for size are numbers particular to the matplotlib implementation of the plots, and hence depend on the scale of the plot and axes, etc.

A.2. Examples of Raven-style PGMs

Given the radically different way in which visual reasoning tests are applied to humans (no prior experience) and to our models (controlled training and test splits), we believe it would be misleading to provide a human baseline for our results. However, for a sense of the difficulty of the task, we present here a set of 18 questions generated from the neutral splits. Note that the values are filtered for human readability. In the dataset there are 10 greyscale intensity values for shape and line colour and 10 sizes for each shape. In the following, we restrict to 4 clearly distinct values for each of these attributes. Best viewed on a digital monitor, zoomed in (see next page). Informal human testing revealed wide variability: participants with a lot of experience with the tests could score well (>80%), while others who came to the test blind would often fail to answer all the questions.
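The relation lists in A.1 can be read as a small lookup table. The following is a minimal sketch of that reading, not the authors' generation code: the names ALLOWED, all_triples, compatible and compatible_pairs are hypothetical, and the compatibility check encodes only the one rule stated in the text (number and position may not co-occur). Any further constraints used when building the dataset are not modelled here.

```python
from itertools import combinations

FULL = ["progression", "XOR", "OR", "AND", "consistent union"]

# Allowed relations per (object type, attribute type), copied from the A.1 list.
ALLOWED = {
    ("shape", "size"): FULL,
    ("shape", "color"): FULL,
    ("shape", "number"): ["progression", "consistent union"],
    ("shape", "position"): ["XOR", "OR", "AND"],
    ("shape", "type"): FULL,
    ("line", "color"): FULL,
    ("line", "type"): ["XOR", "OR", "AND", "consistent union"],
}


def all_triples():
    """Enumerate every permitted (relation, object, attribute) triple."""
    return [(r, o, a) for (o, a), rels in ALLOWED.items() for r in rels]


def compatible(t1, t2):
    """Apply only the stated rule: number and position cannot co-occur."""
    return {t1[2], t2[2]} != {"number", "position"}


def compatible_pairs():
    """Unordered pairs of distinct triples that pass the rule above."""
    return [(t1, t2) for t1, t2 in combinations(all_triples(), 2)
            if compatible(t1, t2)]


if __name__ == "__main__":
    print(len(all_triples()), "triples")
    print(len(compatible_pairs()), "pairs passing the number/position rule")
```

Keeping the table keyed by (object, attribute) mirrors the layout of the list in the text, so a change to one attribute's permitted relations stays a one-line edit.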

[Figure: example PGM questions with candidate answer panels labelled A to H; images not reproduced in this transcription.]

B. Model details

Here we provide additional details for all our models, including the exact hyper-parameter settings that we considered. Throughout this section, we will use the notation [x, y, z, w] to describe CNN and MLP size. For a CNN, this notation refers to the number of kernels per layer: x kernels in the first layer, y kernels in the second layer, z kernels in the third layer and w kernels in the fourth layer. For the MLP, it refers to the number of units per layer: x units in the first layer, y units in the second layer, z units in the third layer and w units in the fourth layer.

All models were trained using the Adam optimiser, with exponential decay rate parameters β1 = 0.9, β2 = 0.999, ε = 10^-8. We also used a distributed training setup, using 4 GPU-workers per model.

hyper-parameter            value
CNN kernels                [64, 64, 64, 64]
CNN kernel size            3 × 3
CNN kernel stride          2
MLP hidden-layer size      1500
MLP drop-out fraction      0.5
Batch Size                 16
Learning rate              0.0003
Table 2. CNN-MLP hyper-parameters

hyper-parameter            value
Batch Size                 32
Learning rate              0.0003
Table 3. ResNet-50 and context-blind ResNet hyper-parameters

hyper-parameter            value
CNN kernels                [8, 8, 8, 8]
CNN kernel size            3 × 3
CNN kernel stride          2
LSTM hidden layer size     96
Drop-out fraction          0.5
Batch Size                 16
Learning rate              0.0001
Table 4. LSTM hyper-parameters

hyper-parameter            value
CNN kernels                [32, 32, 32, 32]
CNN kernel size            3 × 3
CNN kernel stride          2
RN embedding size          256
RN gθ MLP                  [512, 512, 512, 512]
RN fφ MLP                  [256, 256, 13]
Drop-out fraction          0.5
Batch Size                 32
Learning rate              0.0001
Table 5. WReN hyper-parameters

hyper-parameter            value
Batch Size                 16
Learning rate              0.0003
Table 6. Wild-ResNet hyper-parameters
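To make the optimiser settings concrete, here is a sketch of a single Adam update on one scalar parameter, using the textbook form of the update with β1 = 0.9, β2 = 0.999 and ε = 10^-8. The default learning rate is the 0.0003 reported in Table 2. The function name adam_step is hypothetical, and whether the authors' training framework places ε exactly as shown here is an assumption.

```python
import math


def adam_step(param, grad, m, v, t,
              lr=0.0003, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update for a scalar parameter.

    m and v are the running first and second moment estimates;
    t is the 1-based step count used for bias correction.
    """
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    m_hat = m / (1.0 - beta1 ** t)   # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)   # bias-corrected second moment
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


if __name__ == "__main__":
    p, m, v = 1.0, 0.0, 0.0
    for step in range(1, 4):
        p, m, v = adam_step(p, grad=2.0, m=m, v=v, t=step)
        print(step, p)
```

On the first step the bias-corrected ratio m_hat / sqrt(v_hat) has magnitude close to 1, so the parameter moves by roughly the learning rate regardless of the gradient scale.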

C. Results

# Relations    WReN (%)    Blind (%)
One            68.5        23.6
Two            51.1        21.2
Three          44.5        22.1
Four           48.4        23.5
All            62.6        22.8
Table 7. WReN test performance and Context-Blind ResNet performance after training on the neutral PGM dataset, broken down according to the number of relations per matrix.

                        WReN (%)    Blind (%)
All Single Relations    68.5        23.6
[The remaining rows of this table (including OR, AND and the consistent relation) are not recoverable from this transcription.]
Table 8. WReN test performance and Context-Blind ResNet performance for single-relation PGM questions after training on the neutral PGM dataset, broken down according to the relation type, attribute type and object type in a given matrix.

Figure 6. Relationship between answer accuracy and shape meta-target prediction certainty. The WReN model (β = 10) is more accurate when confident about its meta-target predictions. Certainty was defined as the mean absolute difference of the predictions from 0.5.

Figure 7. Relationship between answer accuracy and attribute meta-target prediction certainty

Figure 8. Relationship between answer accuracy and relation meta-target prediction certainty
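The caption of Figure 6 defines certainty as the mean absolute difference of the predictions from 0.5. A minimal sketch of that formula follows. It assumes the meta-target predictions are probabilities in [0, 1], which the source does not state explicitly, and the function name certainty is hypothetical.

```python
def certainty(predictions):
    """Mean absolute difference of the predictions from 0.5.

    Ranges from 0.0 (every prediction is exactly 0.5) to 0.5
    (every prediction is exactly 0 or 1), assuming inputs in [0, 1].
    """
    if not predictions:
        raise ValueError("predictions must be non-empty")
    return sum(abs(p - 0.5) for p in predictions) / len(predictions)


if __name__ == "__main__":
    print(certainty([0.5, 0.5, 0.5]))   # maximally uncertain
    print(certainty([0.0, 1.0, 1.0]))   # maximally certain
    print(certainty([0.9, 0.1, 0.5]))   # mixed
```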

Test (%)
Regime                  β = 0    β = 10
Neutral                 22.4     13.5
Interpolation           18.4     12.2
H.O. Attribute Pairs    12.7     12.3
H.O. Triple Pairs       15.0     12.6
H.O. Triples            11.6     12.4
H.O. line-type          14.4     12.6
H.O. shape-colour       12.5     12.3
Extrapolation           14.1     13.0
Table 9. Performance of the context-blind ResNet model for all the generalisation regimes, in the case where there is an additional auxiliary meta-target (β = 10) and in the case where there is no auxiliary meta-target (β = 0). Note that most of these values are either close to chance or slightly above chance, indicating that this baseline model struggles to learn solutions that generalise better than random guessing. For several generalisation regimes, such as Interpolation, H.O. Attribute Pairs, H.O. Triples and H.O. Triple Pairs, the generalisation performance of the WReN model reported in Table 1 is far greater than that of our context-blind baseline, indicating that the WReN generalisation cannot be accounted for with a context-blind solution.
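The "close to chance" reading of Table 9 can be checked with a few lines of arithmetic. This sketch assumes each question offers eight candidate answers (panels A to H), so uniform guessing scores 12.5%. That chance level is an assumption of the sketch rather than a figure quoted in this appendix, and the helper names are hypothetical. The accuracies are copied from Table 9.

```python
CHANCE = 100.0 / 8  # assumed: eight candidate answers per question

REGIMES = [
    "Neutral", "Interpolation", "H.O. Attribute Pairs", "H.O. Triple Pairs",
    "H.O. Triples", "H.O. line-type", "H.O. shape-colour", "Extrapolation",
]
# Context-blind ResNet test accuracy (%) from Table 9.
BETA_0 = [22.4, 18.4, 12.7, 15.0, 11.6, 14.4, 12.5, 14.1]
BETA_10 = [13.5, 12.2, 12.3, 12.6, 12.4, 12.6, 12.3, 13.0]


def margin_over_chance(scores):
    """Percentage points above (positive) or below (negative) chance."""
    return {r: round(s - CHANCE, 1) for r, s in zip(REGIMES, scores)}


def below_chance(scores):
    """Regimes whose accuracy is strictly below the assumed chance level."""
    return [r for r, s in zip(REGIMES, scores) if s < CHANCE]


if __name__ == "__main__":
    for name, scores in (("beta = 0", BETA_0), ("beta = 10", BETA_10)):
        print(name, margin_over_chance(scores))
        print("  below chance:", below_chance(scores))
```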

Figure 9. Answer key to puzzles in section A.2
