Image Compression Using Simulated Annealing


Aritra Dutta (University of Central Florida), Geonwoo Kim (Pusan National University), Meiqin Li (Texas A&M University), Carlos Ortiz Marrero (University of Houston), Mohit Sinha (University of Minnesota, Twin Cities), Cole Stiegler (University of Iowa)

August 5–14, 2015
Mathematical Modeling in Industry XIX
Institute for Mathematics and its Applications

Proposed by: 1QB Information Technologies
Mentors: Michael P. Lamoureux (Pacific Institute for the Mathematical Sciences), Pooya Ronagh (1QB Information Technologies, Vancouver, B.C.)

Abstract

Over the course of this two-week workshop, we developed an algorithm for image compression using an optimized implementation of simulated annealing intended to solve Ising spin problems. Our motivation is to be able to execute this algorithm using the 1QBit interface for the D-Wave quantum computational hardware system. We also explore a combination of simulated annealing and regression techniques and compare their performance. Finally, we discuss ways to optimize our algorithm in order to make it feasible for a D-Wave architecture.

1 Introduction

Simulated annealing (SA) is a commonly used algorithm for heuristic optimization. Inspired by the study of thermal processes, this algorithm has been particularly successful at providing approximate solutions to NP-hard problems [4]. The algorithm essentially performs a Monte Carlo simulation along with a transition function. At the beginning of the simulation the algorithm is encouraged to explore the landscape of the objective function, while the probability of abrupt transitions is slowly lowered. If the process is done slowly enough, then after a few repetitions one can hope to find the optimal value.

Our goal is to apply this algorithm to image compression and image reconstruction. We use an implementation of SA optimized to solve Ising spin type problems [5]. The algorithm's original purpose was to provide a highly optimized implementation of SA and compare its performance against a D-Wave device. It turns out that finding the lowest energy state in an Ising model is equivalent to solving a quadratic unconstrained binary optimization problem (QUBO).

This investigation is motivated by the assumption that our algorithm could be implemented on the processor produced by D-Wave Systems, Inc. An advantage of the D-Wave system is that it can produce a spectrum of optimal and suboptimal answers to a QUBO/Ising optimization problem, rather than merely the lowest energy point. This work grew out of the question of whether the sparse recovery problem is feasible for the D-Wave architecture.

2 The Ising Model

The Ising spin model is a widely used model that can describe any system consisting of individual elements interacting via pairwise interactions [1].

Definition 2.1 (Ising Problem). Let G = (V, E) be a graph on n vertices with vertex set V and edge set E, and let s_i ∈ {−1, 1} for i ∈ V. For a given configuration s = (s_1, s_2, ..., s_n) ∈ {−1, 1}^n, the energy of the system is given by

    H(s) = ∑_{k∈V} h_k s_k + ∑_{(i,j)∈E} J_{ij} s_i s_j = ⟨h, s⟩ + ⟨s, Js⟩,

where h = (h_1, ..., h_n) ∈ R^n and J = (J_{ij}) ∈ M_n(R).
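To make the procedure concrete, the sketch below implements a single-spin-flip Metropolis annealer for the energy H(s) of Definition 2.1, written in Python with a geometric cooling schedule. It is only an illustration of the idea: the experiments reported below use the highly optimized C solver of [5], and the schedule, sweep count, and temperatures here are placeholder choices rather than the parameters used in this work.

```python
import numpy as np

def ising_energy(s, h, J):
    """Ising energy H(s) = <h, s> + <s, J s> for a spin vector s in {-1, +1}^n."""
    return h @ s + s @ (J @ s)

def anneal_ising(h, J, n_sweeps=2000, t_hot=10.0, t_cold=0.01, rng=None):
    """Minimize H(s) by single-spin-flip Metropolis moves with geometric cooling.

    J is assumed symmetric with zero diagonal (its diagonal only shifts H by a constant).
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(h)
    s = rng.choice([-1, 1], size=n)
    temps = t_hot * (t_cold / t_hot) ** (np.arange(n_sweeps) / (n_sweeps - 1))
    for temp in temps:
        for i in rng.permutation(n):
            # Energy change from flipping spin i: delta = -2 s_i (h_i + 2 * sum_j J_ij s_j).
            delta = -2.0 * s[i] * (h[i] + 2.0 * (J[i] @ s))
            if delta <= 0 or rng.random() < np.exp(-delta / temp):
                s[i] = -s[i]
    return s, ising_energy(s, h, J)
```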

The simulated annealing implementation [5] we will be using is designed to minimize the function H(s).

3 Image Compression/Reconstruction Problem

The underlying optimization problem studied here is to find a sparse solution to the underdetermined linear system Ax = b, where A is an m × n matrix, b is an m-vector, and m < n. The problem can be written as

    minimize_{x∈{0,1}^n} ‖x‖_0   subject to   ‖Ax − b‖_2^2 ≤ ε.    (1)

Note that for a binary vector x ∈ {0,1}^n, the two norms ‖x‖_0 and ‖x‖_1 are identical.

This problem can be interpreted as an image compression problem, where the goal is to find a sparse binary vector x such that its image under A is close to the original image b. Alternatively, one can think of it as an image reconstruction problem, where the goal is to find a sparse binary vector x that was corrupted by A, given access only to the corrupted image b. Here we will only discuss the image compression problem.

4 QUBO and the Ising Model

It turns out that problem (1) can be relaxed to the following unconstrained optimization problem,

    min_{x∈{0,1}^n} ‖x‖_0 + λ‖Ax − b‖_2^2,    (2)

for a large constraint parameter λ > 0. Let e := (1, ..., 1) ∈ {0,1}^n and consider the following reformulation of (2):

    ‖x‖_0 + λ‖Ax − b‖_2^2
      = ⟨e, x⟩ + λ⟨Ax − b, Ax − b⟩
      = ⟨e, x⟩ + λ(⟨Ax, Ax⟩ − 2⟨Ax, b⟩ + ⟨b, b⟩)
      = ⟨e, x⟩ + λ⟨A^T Ax, x⟩ − 2λ⟨A^T b, x⟩ + λ⟨b, b⟩
      = ⟨e − 2λA^T b, x⟩ + ⟨λA^T Ax, x⟩ + λ⟨b, b⟩.

Notice that (2) is a QUBO problem.
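The algebra above identifies the penalized objective with a QUBO whose quadratic part is λA^T A and whose linear part is e − 2λA^T b, up to the constant λ⟨b, b⟩. A minimal Python sketch of that assembly (dense matrices for simplicity, illustrative only; the original experiments were carried out in MATLAB with an explicit sparse A) is:

```python
import numpy as np

def qubo_from_sparse_recovery(A, b, lam):
    """Return (Q, c, const) so that x^T Q x + c^T x + const equals
    ||x||_0 + lam * ||A x - b||_2^2 for any binary vector x in {0,1}^n.
    (Recall that ||x||_0 = <e, x> when x is binary.)
    """
    Q = lam * (A.T @ A)                              # quadratic coefficients, lam * A^T A
    c = np.ones(A.shape[1]) - 2.0 * lam * (A.T @ b)  # linear coefficients, e - 2*lam*A^T b
    const = lam * float(b @ b)                       # constant offset, irrelevant to the minimizer
    return Q, c, const
```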

Definition 4.1 (QUBO). Let g(x) = ⟨x, Qx⟩ + ⟨c, x⟩, where Q is a symmetric matrix, c ∈ R^n, and x = (x_1, ..., x_n) ∈ {0,1}^n. The quadratic unconstrained binary optimization problem is to minimize the function g(x) over {0,1}^n.

In order to use the solver we need to formulate our problem as an Ising problem. Consider the following change of variables,

    x = (1/2)(s + e)  ⟹  x ∈ {0,1}^n  for s ∈ {−1,1}^n.

Now equation (2) becomes

    min_{s∈{−1,1}^n} (1/2) e^T (s + e) + λ‖(1/2) A(s + e) − b‖_2^2.

5 Results

In this section we discuss our implementation of the problem described in Section 3, illustrating how our algorithm effectively reduces the image size while maintaining a reasonable level of quality. Recall that our problem is

    min_{x∈{0,1}^n} ‖x‖_1 + λ‖Ax − b‖_2^2,

where b is the original image and A is our blurring operator, in the form of a convolution expressed explicitly as a sparse matrix. In order to define a convolution, we must first choose an appropriate kernel. We found that Gaussian and averaging disk kernels performed significantly better than other types of kernels, so we limit our discussion to these kernels.

We use the normalized mean square error (NMSE) as our performance measure,

    NMSE = ‖Ax − b‖_2 / ‖b‖_2,

where b is an array of grayscale values representing a target image and Ax is the blurring convolution applied to the binary compression x.

The following computations were performed on a MacBook Pro running OS X 10.10 with a 2.3 GHz Intel Core i7 and 16 GB of memory. For our images, we used MATLAB's built-in clown and mandrill images. Each took approximately 40 seconds for the entire computation. Most of the computation time was spent in the C solver written by Troyer et al. [5]. Since the solver is technically taking the place of the quantum annealer, the long computation time is not concerning, as true quantum hardware would execute this step much faster.
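Before the problem is handed to the Ising solver, the binary variables are replaced by spins via x = (s + e)/2, and the quality of a compression x is scored by the NMSE just defined. The sketch below performs both steps; the field h, coupling J, and constant offset follow from expanding g((s + e)/2) under the convention H(s) = ⟨h, s⟩ + ⟨s, Js⟩, and the function names are illustrative rather than part of the 1QBit/D-Wave interface.

```python
import numpy as np

def qubo_to_ising(Q, c):
    """Map min over x in {0,1}^n of x^T Q x + c^T x to min over s in {-1,1}^n of
    <h, s> + <s, J s> + offset, via the substitution x = (s + e)/2.  Q is assumed symmetric.
    """
    n = Q.shape[0]
    e = np.ones(n)
    J = Q / 4.0
    # Constant terms: (1/4) e^T Q e + (1/2) c^T e, plus the diagonal of J (since s_i^2 = 1).
    offset = float(e @ (Q @ e)) / 4.0 + float(c @ e) / 2.0 + float(np.trace(J))
    np.fill_diagonal(J, 0.0)
    h = (Q @ e) / 2.0 + c / 2.0
    return h, J, offset

def nmse(A, x, b):
    """Normalized mean square error ||A x - b||_2 / ||b||_2 of a binary compression x."""
    return np.linalg.norm(A @ x - b) / np.linalg.norm(b)
```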

Figure 1: True image, binary representation, and smoothed reconstruction.

The original clown image, the binary image, and the smoothed binary image are shown in Figure 1, with parameter λ = 100.

Initially, we started with a rather large penalty to encourage ‖Ax − b‖ to be as small as possible, at the expense of less sparse results. On average, the binary images had 30-35% nonzero entries in the solution before being blurred. With λ = 100 we obtained the results shown in Table 1 for the different kernels. Note that smaller NMSE values indicate better performance.

Table 1: NMSE results for various smoothing kernels (smaller is better).

    Kernel    Size  Std. dev.  NMSE
    Gaussian  3x3   0.9        0.1720
    Gaussian  5x5   0.9        0.1573
    Gaussian  5x5   1          0.1591
    Gaussian  7x7   1          0.1576
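The blurring operator A used above is a convolution written out as an explicit sparse matrix, with one row per output pixel. One way to build such a matrix for a small kernel is sketched below in Python with SciPy rather than the MATLAB tools used in the project; zero padding at the image boundary is an assumed convention, not something the report specifies.

```python
import numpy as np
from scipy import sparse

def convolution_matrix_2d(kernel, height, width):
    """Return a sparse (height*width) x (height*width) matrix A such that
    A @ image.ravel() applies the kernel to `image` (correlation, which coincides
    with convolution for the symmetric kernels used here), with zero padding
    outside the image.  Kernel dimensions are assumed odd.
    """
    kh, kw = kernel.shape
    oh, ow = kh // 2, kw // 2
    rows, cols, vals = [], [], []
    for r in range(height):
        for c in range(width):
            out_index = r * width + c
            for dr in range(-oh, oh + 1):
                for dc in range(-ow, ow + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < height and 0 <= cc < width:
                        rows.append(out_index)
                        cols.append(rr * width + cc)
                        vals.append(kernel[dr + oh, dc + ow])
    return sparse.csr_matrix((vals, (rows, cols)), shape=(height * width, height * width))

def gaussian_kernel(size, sigma):
    """Normalized size x size Gaussian kernel with standard deviation sigma."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return k / k.sum()
```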

In an attempt to further improve our results, we combined the blurred images Ax in a couple of different ways. The first was a simple averaging combination, where we took the mean of {A_1 x_1, A_2 x_2, ...} and found the difference between that mean and the original image. The other combination was performed by finding the element-wise maximum among the elements of {A_1 x_1, A_2 x_2, ...} and using that as the final value for that entry of Ax. The averaging combination proved to be more effective with a larger penalty, when the results were already quite good, but the combination made little difference. The maximum combination was more effective with the sparser images corresponding to a lower penalty, effecting a significant improvement in error in those cases. We believe that in the sparse case this can be thought of as kernel selection, where the best-shaped kernel for a given pixel region is selected in the binary representation and then blurred out by the appropriate operation.

Combining size 3x3, 5x5, and 7x7 Gaussians with standard deviation 0.9 as well as size 5x5 and 7x7 discs, the max-combination error was 0.1736 and the average-combination error was 0.1454. Combining size 3x3, 5x5, and 7x7 Gaussians with standard deviation 0.8 as well as size 3x3 and 9x9 discs, the max-combination error was 0.1958 and the average-combination error was 0.1444. In general, with a large penalty, the average-combination error was smaller than the max-combination error.
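The two ways of combining blurred reconstructions described above are straightforward to express: stack the blurred images A_i x_i and take either the pixel-wise mean or the pixel-wise maximum. A short Python sketch with illustrative names (the original implementation was in MATLAB):

```python
import numpy as np

def combine_blurred(blurred_images, b, mode="average"):
    """Combine a list of blurred reconstructions {A_1 x_1, A_2 x_2, ...} into a single
    image and report its NMSE against the original image b.

    mode="average" takes the pixel-wise mean; mode="max" takes the pixel-wise maximum.
    """
    stack = np.stack(blurred_images, axis=0)
    combined = stack.mean(axis=0) if mode == "average" else stack.max(axis=0)
    error = np.linalg.norm(combined - b) / np.linalg.norm(b)
    return combined, error
```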

The best result above was the average-combination with error 0.1444, which is visually rather close to the original image and is shown in Figure 2.

Figure 2: Average-combination of kernels, NMSE = 0.1444, λ = 100.

We also used a small penalty to encourage sparseness, making the ‖x‖_1 term more influential in the minimization problem. With penalty λ = 0.7, the results had an average of 10% nonzero entries in the binary solutions. Increasing sparseness further is possible with a lower penalty, but at the expense of a large increase in error.

Combining size 3x3, 5x5, and 7x7 Gaussians with standard deviation 0.9 as well as size 5x5 and 7x7 discs, the max-combination error was 0.5131 and the average-combination error was 0.5945. Combining size 3x3, 5x5, and 7x7 Gaussians with standard deviation 0.8 as well as size 3x3 and 9x9 discs, the max-combination error was 0.5298 and the average-combination error was 0.6024.

In general, with a small penalty, the average-combination error was larger than the max-combination error. The best result above was the max-combination error of 0.5131, shown in Figure 3. However, the large sparseness results in poor image reconstruction.

Figure 3: Max-combination of kernels, NMSE = 0.5131, λ = 0.7.

Overall, this approach is valid as an image compression and reconstruction technique, at least with the larger penalty, since the method converts a grayscale (8-bit) image into a binary (1-bit) image, a reduction in memory storage of 87.5%. The remaining memory required to reconstruct the image to the 86% accuracy found above is at most two or three parameters: the type, size, and possibly standard deviation of the kernel. Thus we have accomplished a large reduction in memory with only a small loss in image accuracy.

Having examined exact NMSE results for two specific penalty values, we now investigate the relationship between the penalty λ on the least squares term in the objective function, the sparsity of the binary images, and the NMSE of the blurred images over a range of penalty values. Sparsity as depicted below is the number of nonzero elements (ones) in the binary image divided by the total number of pixels (64000). As the actual error and sparsity varied little between different types of Gaussian kernels, a 3x3 Gaussian kernel with standard deviation 0.9 was used to produce these results.
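The sweep behind Figures 4-7 amounts to solving the compression problem for a range of penalties and recording, for each λ, the fraction of ones in the binary solution and the NMSE of its blurred version. A hedged outline in Python, where `solve_binary_compression` is a hypothetical stand-in for the SA-based solver (it is not a function from [5]):

```python
import numpy as np

def penalty_sweep(A, b, penalties, solve_binary_compression):
    """For each penalty lam, solve min_x ||x||_1 + lam * ||A x - b||^2 over binary x
    and record (penalty, sparsity, NMSE).  `solve_binary_compression(A, b, lam)` is
    assumed to return a 0/1 vector.
    """
    results = []
    for lam in penalties:
        x = solve_binary_compression(A, b, lam)
        sparsity = np.count_nonzero(x) / x.size                 # fraction of ones in the binary image
        error = np.linalg.norm(A @ x - b) / np.linalg.norm(b)   # NMSE of the blurred image
        results.append((lam, sparsity, error))
    return results
```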

First we look at the relationship between penalty and sparsity, as shown in Figure 4.

Figure 4: Sparsity vs. penalty.

One can see that the sparsity starts quite low and rapidly levels out as the penalty increases. This is unsurprising: with a very low penalty our minimization problem simply minimizes the number of ones in the image with no constraint, while with a higher penalty the least squares term dominates and we effectively ignore the sparsity component of the objective function. Next, we compare the penalty and the NMSE of the objective function, shown in Figure 5.

Again, the results are not surprising. The least squares term in our objective function is effectively a Frobenius norm, so as we increase the penalty to weight the objective function more heavily towards the least squares term, we see a corresponding decrease in the NMSE. For both of the above plots, applying a log function to the penalty reveals roughly inverse relationships, though not exact ones, as shown in Figure 6.

Finally, we look at the relationship between sparsity and error, shown in Figure 7. Unsurprisingly, as sparsity decreases (the percentage of nonzero entries goes up) we see a decrease in NMSE.

Ultimately, the increase in sparsity (lower value of the sparsity measure) does not justify the corresponding increase in error. While 5% sparsity would take a sixth of the memory of 30% sparsity, the tradeoff between about 20% error for the less sparse image and about 90% error for the sparser image is not nearly worth it. By virtue of compressing our image to a binary image, we have already accomplished a significant memory reduction with a minimal increase in error; a further, smaller reduction in memory usage does not justify a massive decrease in the quality of the image.

Figure 5: NMSE vs. penalty.

Figure 6: NMSE vs. log(penalty).

Figure 7: NMSE vs. sparsity.

6 Regression techniques

In this section, we discuss our explorations of other image reconstruction methods, used to compare and contrast with the SA implementations that were developed. In particular, we considered both ordinary least squares and ridge regression (Tikhonov regularization) to reconstruct the image.

6.1 Least squares and ridge regression

Let A ∈ R^{m×n} be the measurement matrix, with m > n and rank(A) = n. Unless the measurements are perfect, the image vector b lies outside the column space of A. Therefore it is in general not possible to find an element x ∈ R^n which gives an exact solution to the overdetermined system

    Ax = b,    (3)

unless the target b is in the range of A.

One can still obtain an approximate solution to (3) by solving the following minimization problem:

    x̂_LS = argmin_{x∈R^n} ‖Ax − b‖_2^2.    (4)

Figure 8: The projection p = Ax̂ is closest to b, so x̂ minimizes E = ‖b − Ax‖_2^2.

The least squares solution is given by x̂_LS := (A^T A)^{-1} A^T b. Note that finding x̂_LS involves inversion of the matrix A^T A. If m > n and the matrix A is ill-conditioned, then A^T A is singular or "nearly singular". Moreover, if x̂_LS has all n components nonzero, then it is not suitable as a sparse vector for explaining the data. To give preference to a particular solution with desirable properties, one can solve the regularized problem

    min_{x∈R^n} ‖Ax − b‖_2^2 + λ‖x‖_2^2,    (5)

where λ ≥ 0 is a fixed balancing parameter. In Figure 9, the solid blue area represents the constraint region corresponding to ‖x‖_2^2, while the red ellipses are the contours of the least squares error function. The ridge regression solution to (5) is given by x̂_Ridge = (A^T A + λI_n)^{-1} A^T b. The minimum eigenvalue of A^T A + λI_n is greater than or equal to λ, which guarantees the invertibility of A^T A + λI_n. If the measurement matrix A is augmented with the n additional rows √λ I_n, and the vector b with n zeros, then the solution of (5) can also be viewed as a least squares solution to the augmented dataset. Therefore (5) is equivalent to solving

    x̂_Ridge = argmin_{x∈R^n} ‖ [A; √λ I_n] x − [b; 0_n] ‖_2^2.    (6)
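Equation (6) says the ridge estimate can be computed with any ordinary least squares routine applied to the augmented matrix [A; √λ I_n] and the padded right-hand side [b; 0_n]. A small Python sketch of both estimators, using a least-squares solve rather than the explicit inverse (numerically preferable but otherwise equivalent to the formulas above):

```python
import numpy as np

def least_squares(A, b):
    """Ordinary least squares estimate, equation (4)."""
    x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x_hat

def ridge(A, b, lam):
    """Ridge (Tikhonov) estimate, computed via the augmented system of equation (6)."""
    n = A.shape[1]
    A_aug = np.vstack([A, np.sqrt(lam) * np.eye(n)])
    b_aug = np.concatenate([b, np.zeros(n)])
    x_hat, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
    return x_hat
```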

Figure 9: Ridge regression estimate in R^2.

6.2 Implementation

Recall that the image obtained from SA has binary entries. Let x* ∈ {0,1}^n be the binary image obtained from SA. Denote by T := {i : x*_i = 1} the support set of x*. We form a truncated matrix A_T from A by keeping the columns of A indexed by T, so that A_T ∈ R^{m×|T|}, where |T| is the cardinality of the set T. We use A_T to solve a "truncated least squares" system:

    x̂_TLS = argmin_{x∈R^{|T|}} ‖A_T x − b‖_2^2.    (7)

Finally, we replace x*_T by x̂_TLS. Next, we use "truncated ridge regression" to solve the minimization problem:

    x̂_TR = argmin_{x∈R^{|T|}} ‖A_T x − b‖_2^2 + λ‖x‖_2^2.    (8)

As before, we replace x*_T by x̂_TR.

Figures 10 and 11 show a comparison of the results from our SA implementation, least squares, and ridge regression. Observe that the SA implementation is quite successful in comparison with these other two methods. The truncated ridge regression is better than the truncated least squares.
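The truncation step keeps only the columns of A indexed by the support of the binary SA solution and re-fits those entries by (7) or (8). A Python sketch under the same assumptions (dense A for simplicity; the ridge case reuses the augmented form of equation (6)):

```python
import numpy as np

def truncated_refit(A, b, x_binary, lam=None):
    """Replace the nonzero entries of a binary SA solution by the solution of the
    truncated least squares problem (7), or of truncated ridge regression (8) when
    a penalty lam is supplied.
    """
    support = np.flatnonzero(x_binary)         # T = { i : x*_i = 1 }
    A_T = A[:, support]                        # keep only the columns indexed by T
    if lam is None:
        coeffs, *_ = np.linalg.lstsq(A_T, b, rcond=None)
    else:
        A_aug = np.vstack([A_T, np.sqrt(lam) * np.eye(A_T.shape[1])])
        b_aug = np.concatenate([b, np.zeros(A_T.shape[1])])
        coeffs, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
    x_refit = np.zeros_like(x_binary, dtype=float)
    x_refit[support] = coeffs                  # overwrite x*_T with the refitted values
    return x_refit
```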

Figure 10: (a) Original image, (b) SA reconstruction.

Figure 11: Comparison between (a) truncated least squares and (b) truncated ridge regression.

7 SPGL1: A Solver for Large-scale Sparse Optimization

In this section we discuss our use of SPGL1, a standard large-scale sparse solver (see reference [3]), and compare its results with our SA results in reconstructing the image.

7.1 Outline of the method and results

Solving the system Ax = b, where A ∈ R^{m×n} with m ≪ n, suffers from ill-posedness. Classic sparse convex optimization problems which attempt to solve the system are:

1. min_x ‖x‖_1 subject to Ax = b. (BP)
2. min_x ‖x‖_1 subject to ‖Ax − b‖_2 ≤ σ. (BP_σ)
3. min_x ‖Ax − b‖_2 subject to ‖x‖_1 ≤ τ. (LS_τ)

Homotopy approaches, basis pursuit denoising (BPDN) as a cone program, BPDN as a linear program, and projected gradient methods are the classic approaches to solving the above problems. If b ∈ R(A) and b ≠ 0, denote by x_τ the optimal solution of (LS_τ). In SPGL1 one considers the single-parameter function

    φ(τ) = ‖r_τ‖_2,  with r_τ := b − Ax_τ,

which gives the optimal value of (LS_τ) for each τ ≥ 0. The method then consists of finding a root of

    φ(τ) = σ.
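This root-finding view can be sketched in a few lines: evaluate φ(τ) by solving the subproblem (LS_τ), then update τ by Newton's method using the derivative φ′(τ) = −‖A^T y_τ‖_∞ with y_τ = r_τ/‖r_τ‖_2, as justified by the theorem given after this outline. The subproblem solver `solve_lasso` below is a hypothetical placeholder (SPGL1 itself uses a spectral projected-gradient method for this step), so the sketch shows only the outer iteration, not the actual SPGL1 code.

```python
import numpy as np

def pareto_root_find(A, b, sigma, solve_lasso, tau0=0.0, tol=1e-6, max_iter=20):
    """Find tau with phi(tau) = sigma, where phi(tau) = ||b - A x_tau||_2 and x_tau
    solves min ||A x - b||_2 subject to ||x||_1 <= tau  (the problem (LS_tau)).

    `solve_lasso(A, b, tau)` is an assumed helper returning x_tau; it is not part of SPGL1's API.
    """
    tau = tau0
    for _ in range(max_iter):
        x_tau = solve_lasso(A, b, tau)
        r_tau = b - A @ x_tau
        phi = np.linalg.norm(r_tau)
        if abs(phi - sigma) <= tol:
            break
        y_tau = r_tau / phi                                # optimal dual direction
        phi_prime = -np.linalg.norm(A.T @ y_tau, np.inf)   # phi'(tau) = -||A^T y_tau||_inf
        tau = tau + (sigma - phi) / phi_prime              # Newton update on phi(tau) = sigma
    return tau, x_tau
```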

In order to derive the dual of (LS_τ), this method solves the equivalent problem

    min_{r,x} ‖r‖_2  subject to  Ax + r = b, ‖x‖_1 ≤ τ.

The dual of the above problem is

    max_{y,λ} min_{r,x} { ‖r‖_2 − y^T(Ax + r − b) + λ(‖x‖_1 − τ) },

for λ ≥ 0. Finally, the dual of (LS_τ) reduces to

    max_{y,λ} b^T y − τλ  subject to  ‖y‖_2 ≤ 1, ‖A^T y‖_∞ ≤ λ.

Theorem. With this setup, the following holds:

1. The function φ is convex and non-increasing.
2. For all τ ∈ (0, τ_BP), φ is continuously differentiable, φ′(τ) = −λ_τ, and the optimal dual variable λ_τ = ‖A^T y_τ‖_∞, where y_τ = r_τ/‖r_τ‖_2.
3. For τ ∈ [0, τ_BP], ‖x_τ‖_1 = τ, and φ is strictly decreasing.

The algorithm: based on the Newton iteration, find

    τ_{k+1} = τ_k + Δτ_k,  where  Δτ_k = (σ − φ(τ_k)) / φ′(τ_k).

In Figure 12 we contrast the results from SA minimization with the SPGL1 results. It is interesting to note that SPGL1 immediately gives a grayscale image, as it optimizes over a continuous range of x-values, while the SA result shown here only uses binary values. The reconstructed image in Figure 1 is a better indication of the good results obtainable with SA.

8 Other attempts

While working on the project, we had the idea that we might be able to use SA to remove systematic blur from an image, such as the simulated motion blur shown in Figure 13. It was an interesting idea, but it is not clear that we obtained useful results, so we simply mention it here as an idea possibly worth pursuing.

Figure 12: Comparison between (a) SA and (b) SPGL1 minimization.

Figure 13: Blurred image.

9 Tuning and Optimizing the Algorithm

A significant concern with a real implementation of the quantum optimizer is that the computing hardware has a limited number of nodes to represent data in the Ising model. For instance, current hardware from D-Wave limits this to about 1000 nodes. The image compression algorithm requires hundreds of thousands of nodes, which is problematic for existing hardware; thus we had to consider methods for breaking the large problem into smaller, computable problems.

In this section we discuss an efficient way to tune and optimize the algorithm. Recall that the original linear system is Ax = b, where A ∈ R^{m×n} and m ≫ n. In our examples, m is very large and solving the system requires a huge amount of memory or compute nodes. However, things are better if one can find a B ∈ R^{k×m}, with k ≪ m, such that for a predefined ε > 0, ‖x̂ − x‖_2 ≤ ε, where Âx̂ = b̂ and Â = BA, b̂ = Bb.

9.1 Optimizing the Algorithm: Reducing Rows

Construct a vector b̃ from b by sorting its entries, so that b̃ = (b_(i))_{i=1}^m ∈ R^m with b_(1) ≥ b_(2) ≥ ··· ≥ b_(m) ≥ 0. For a tolerance 0 < ε < 1, choose b̃(1:k) such that

    ‖b̃(1:k)‖_2 / ‖b‖_2 ≥ ε.    (9)
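Criterion (9) keeps just enough of the largest entries of b to capture a prescribed fraction of its energy; the retained indices then define the row-selection matrix B described next. A Python sketch of one natural reading of this rule, returning the selected row indices rather than forming B explicitly (which is then just a matter of slicing):

```python
import numpy as np

def select_rows(b, eps):
    """Return the indices of the k largest entries of b, where k is the smallest
    integer with ||b_sorted[:k]||_2 / ||b||_2 >= eps  (criterion (9), 0 < eps < 1).
    Entries of b are assumed nonnegative (grayscale values).
    """
    order = np.argsort(b)[::-1]                                # indices sorting b in decreasing order
    cumulative = np.sqrt(np.cumsum(b[order] ** 2)) / np.linalg.norm(b)
    k = int(np.searchsorted(cumulative, eps) + 1)
    return order[:k]

# Row-reduced system: A_hat = A[rows, :] and b_hat = b[rows] play the roles of BA and Bb, e.g.
# rows = select_rows(b, 0.7); A_hat, b_hat = A[rows, :], b[rows]
```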

Let S = {i_1, ..., i_k} ⊆ {1, ..., m} be the set of indices of the entries of b selected by (9). We form B such that B^T = (e_{i_1}^T  e_{i_2}^T  ···  e_{i_k}^T), where e_{i_j} is a 1 × m vector with a 1 in the i_j-th position and 0 elsewhere. To summarize, B acts as an indicator matrix which constructs Â based on (9). We then use SA on Âx̂ = b̂ to reconstruct the image, as shown in Figure 14.

Figure 14: Reduced-row SA reconstruction (number of rows reduced from 64000 to 10843).

Indeed this is a memory-efficient reconstruction. Originally A had 64000 rows; using the indicator matrix B with ε = 0.7 reduced the number of rows to 10843. On the other hand, we also sacrificed the quality of the reconstructed image. For a better reconstruction we target smaller blocks of the image instead of the entire matrix. We divide b into sub-vectors b_i, 1 ≤ i ≤ p, and partition A conformally into blocks A_{ij}, so that ∑_{j=1}^p A_{ij} x_j = b_i for each i. We construct x_recovered = (x̃_j : 1 ≤ j ≤ p), where the x̃_j solve the system ∑_{j=1}^p A_{ij} x_j = b_i for each i. Now we apply the idea of row reduction to each block: using the previous technique, we solve ∑_{j=1}^p Â_{ij} x_j = b̂_i, where Â_{ij} = B_i A_{ij} and b̂_i = B_i b_i for each i, and we obtain the recovered image as x_recovered = (x̂_j : 1 ≤ j ≤ p), where the x̂_j solve ∑_{j=1}^p Â_{ij} x_j = b̂_i for each i. For a predefined ε > 0, we can guarantee ∑_j ‖x_j − x̂_j‖_2 ≤ ε. The result of row-reduced block reconstruction is shown in Figure 15.

Figure 15: Row-reduced block SA reconstruction.

We partitioned the image array into 40 sub-matrices, compressed each block, and reconstructed. One can notice the partition lines in the SA-reconstructed image in Figure 15. To avoid this, we use overlapping block partitions of the image and apply the row-reduced SA reconstruction technique to each block. At the end we merge the reconstructed overlapping blocks and obtain a much better image, as shown in Figure 16.

Figure 16: Overlapping-blocks row-reduced SA reconstruction.
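The blockwise strategy with overlapping blocks can be outlined as follows. This is a simplified sketch that reconstructs image patches directly rather than forming the block matrices A_{ij}; the per-block solver `reconstruct_block` is a hypothetical placeholder for the row-reduced SA step, and merging by averaging the overlapped pixels is our reading of the merge step, which the report does not spell out. The block and overlap sizes are illustrative.

```python
import numpy as np

def overlapping_block_reconstruction(image, reconstruct_block, block=80, overlap=16):
    """Split a 2-D image into overlapping blocks, reconstruct each block independently
    (e.g. with row-reduced SA), and merge the results by averaging overlapped pixels.

    `reconstruct_block(patch)` is an assumed per-block solver returning an array of the
    same shape as `patch`.
    """
    h, w = image.shape
    out = np.zeros((h, w))
    weight = np.zeros((h, w))
    step = block - overlap
    for r0 in range(0, h, step):
        for c0 in range(0, w, step):
            r1, c1 = min(r0 + block, h), min(c0 + block, w)
            patch = image[r0:r1, c0:c1]
            out[r0:r1, c0:c1] += reconstruct_block(patch)
            weight[r0:r1, c0:c1] += 1.0
    return out / weight
```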

10 Conclusion

To summarize, our compression algorithm accomplished a large reduction in memory while maintaining a minimal loss in image accuracy when the penalty was large enough. We obtained the best reconstruction by using an average of kernels. We found that by encouraging sparsity (decreasing the penalty) we lose accuracy. After trying to reconstruct the image with the SPGL1 solver, we found no improvement over a reconstruction using a kernel.

Now that we have developed a working algorithm for image compression, a natural next step is to test this algorithm using a quantum annealer. This is where our optimization methods might come in handy when trying to implement the algorithm on a D-Wave system.

References

[1] Z. Bian, F. Chudak, W. G. Macready, and G. Rose, "The Ising model: teaching an old problem new tricks," D-Wave Systems Technical Report, Aug. 30, 2010. Available from www.dwavesys.com/resources/publications.

[2] E. van den Berg and M. P. Friedlander, "Probing the Pareto frontier for basis pursuit solutions," SIAM J. on Scientific Computing, vol. 31, no. 2, pp. 890–912, Nov. 2008.

[3] E. van den Berg and M. P. Friedlander, "A solver for large-scale sparse reconstruction," http://www.cs.ubc.ca/labs/scl/spgl1, June 2007.

[4] V. Cerny, "Thermodynamical approach to the traveling salesman problem: An efficient simulation algorithm," J. Optimization Theory and Applications, vol. 45, no. 1, pp. 41–51, Jan. 1985.

[5] S. V. Isakov, I. N. Zintchenko, T. F. Rønnow, and M. Troyer, "Optimized simulated annealing for Ising spin glasses," Computer Physics Communications, vol. 192, pp. 265–271, Jul. 2015.
