The Mathematical Foundations Of Policy Gradient Methods

The Mathematical Foundations of Policy Gradient Methods
Sham M. Kakade
University of Washington & Microsoft Research

Reinforcement (interactive) learning (RL):

Markov Decision Processes: a framework for RL
- $S$ states; we start from $s_0$. $A$ actions.
- A policy $\pi$: States $\to$ Actions; a stochastic policy samples $a_t \sim \pi(\cdot \mid s_t)$.
- Dynamics model $P(s' \mid s, a)$, reward function $r(s)$, discount factor $\gamma$.
- We execute $\pi$ to obtain a trajectory: $s_0, a_0, r_0, s_1, a_1, r_1, \ldots$
- Total $\gamma$-discounted reward: $V^\pi(s_0) = E\big[ r(s_0) + \gamma r(s_1) + \gamma^2 r(s_2) + \cdots \big]$, where the distribution of $s_t, a_t$ is induced by $\pi$.
- Standard objective / Goal: find a policy $\pi$ that maximizes our value $V^\pi(s_0)$.
[Sutton & Barto '18]
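As a concrete illustration of this objective, the sketch below builds a tiny synthetic tabular MDP, rolls out a fixed stochastic policy, and Monte Carlo estimates the discounted value (all sizes, rewards, and the policy are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] = distribution over next states
r = rng.uniform(size=(S, A))                 # reward r(s, a)
pi = rng.dirichlet(np.ones(A), size=S)       # a fixed stochastic policy pi[s] over actions

def rollout(s0, horizon=200):
    """Execute pi from s0 and return the (truncated) gamma-discounted reward."""
    s, total, discount = s0, 0.0, 1.0
    for _ in range(horizon):
        a = rng.choice(A, p=pi[s])
        total += discount * r[s, a]
        discount *= gamma
        s = rng.choice(S, p=P[s, a])
    return total

# Monte Carlo estimate of V^pi(s0)
print(np.mean([rollout(0) for _ in range(1000)]))
```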

Dexterous Robotic Hand Manipulation (OpenAI, Oct 15, 2019)
Challenges in RL:
1. Exploration (the environment may be unknown)
2. The credit assignment problem (due to delayed rewards)
3. Large state/action spaces:
   - hand state: joint angles/velocities
   - cube state: configuration
   - actions: forces applied to actuators

Values, State-Action Values, and Advantages
- $V^\pi(s_0) = E\big[ \sum_{t \ge 0} \gamma^t\, r(s_t, a_t) \mid s_0, \pi \big]$
- $Q^\pi(s_0, a_0) = E\big[ \sum_{t \ge 0} \gamma^t\, r(s_t, a_t) \mid s_0, a_0, \pi \big]$
- $A^\pi(s, a) = Q^\pi(s, a) - V^\pi(s)$
- Expectations are with respect to trajectories sampled under $\pi$.
- We have $S$ states and $A$ actions; the effective "horizon" is $1/(1-\gamma)$ time steps.
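For a small tabular MDP with known dynamics, all three quantities can be computed exactly by solving the Bellman equations; a minimal sketch (the random MDP below is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] = distribution over next states
r = rng.uniform(size=(S, A))                 # reward r(s, a)
pi = rng.dirichlet(np.ones(A), size=S)       # stochastic policy pi[s, a]

P_pi = np.einsum("sa,san->sn", pi, P)        # state-to-state transition matrix under pi
r_pi = np.einsum("sa,sa->s", pi, r)          # expected one-step reward under pi

V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)   # V^pi = (I - gamma P_pi)^{-1} r_pi
Q = r + gamma * np.einsum("san,n->sa", P, V)          # Q^pi(s,a) = r(s,a) + gamma E[V^pi(s')]
Adv = Q - V[:, None]                                  # A^pi(s,a) = Q^pi(s,a) - V^pi(s)
```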

The "Tabular" Dynamic Programming approach
- State $s$: (joint angles, cube config, ...); Action $a$: (forces at joints), e.g. (1.2 Newton, 0.1 Newton, ...).
- $Q^\pi(s, a)$: the state-action value, a "one step look-ahead value" using $\pi$.
- The table is the 'bookkeeping' for dynamic programming (with known rewards/dynamics):
  1. Estimate the state-action value $Q^\pi(s, a)$ for every entry in the table.
  2. Update the policy $\pi$ and go to step 1 (see the sketch below).
- Generalization: how can we deal with this infinite table, using sampling/supervised learning?
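A sketch of this tabular loop on a small synthetic MDP with known rewards and dynamics: exact Bellman evaluation stands in for step 1, and greedy improvement for step 2.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))   # known dynamics
r = rng.uniform(size=(S, A))                 # known rewards

def q_values(pi):
    """Step 1: fill in the Q^pi table exactly via the Bellman equations."""
    P_pi = np.einsum("sa,san->sn", pi, P)
    r_pi = np.einsum("sa,sa->s", pi, r)
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return r + gamma * np.einsum("san,n->sa", P, V)

pi = np.full((S, A), 1.0 / A)                # start from the uniform policy
for _ in range(20):
    Q = q_values(pi)                         # step 1: estimate every table entry
    pi = np.eye(A)[Q.argmax(axis=1)]         # step 2: update the policy, go to step 1
```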

This Tutorial: Mathematical Foundations of Policy Gradient Methods
§ Part I: Basics
  A. Derivation and Estimation
  B. Preconditioning and the Natural Policy Gradient
§ Part II: Convergence and Approximation
  A. Convergence: this is a non-convex problem!
  B. Approximation: how to think about the role of deep learning?

Part-1: Basics

State-Action Visitation Measures
- This helps to clean up notation!
- The "occupancy frequency" of being in state $s$ (and action $a$) after following $\pi$ starting in $s_0$:
  $d^\pi_{s_0}(s) = (1-\gamma)\, E\big[ \sum_{t \ge 0} \gamma^t\, \mathbb{1}[s_t = s] \mid s_0, \pi \big]$
- $d^\pi_{s_0}$ is a probability distribution.
- With this notation:
  $V^\pi(s_0) = \frac{1}{1-\gamma}\, E_{s \sim d^\pi_{s_0},\, a \sim \pi(\cdot|s)}\big[ r(s, a) \big]$
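A quick numerical check of this identity on a synthetic tabular MDP: compute $d^\pi_{s_0}$ in closed form and compare $V^\pi(s_0)$ with $\frac{1}{1-\gamma} E_{d^\pi_{s_0},\,\pi}[r]$.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma, s0 = 5, 3, 0.9, 0
P = rng.dirichlet(np.ones(S), size=(S, A))
r = rng.uniform(size=(S, A))
pi = rng.dirichlet(np.ones(A), size=S)

P_pi = np.einsum("sa,san->sn", pi, P)
r_pi = np.einsum("sa,sa->s", pi, r)
V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)

e0 = np.zeros(S); e0[s0] = 1.0
# d(s) = (1 - gamma) * sum_t gamma^t Pr(s_t = s | s0, pi)
d = (1 - gamma) * np.linalg.solve((np.eye(S) - gamma * P_pi).T, e0)

lhs = V[s0]
rhs = (d[:, None] * pi * r).sum() / (1 - gamma)   # 1/(1-gamma) * E_{d, pi}[r(s, a)]
print(abs(lhs - rhs))                             # ~1e-15: the identity holds
```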

Direct Policy Optimization over Stochastic Policies
- $\pi_\theta(a|s)$ is the probability of action $a$ given $s$, parameterized by $\theta$:
  $\pi_\theta(a|s) \propto \exp(f_\theta(s, a))$
- Softmax policy class: $f_\theta(s, a) = \theta_{s,a}$
- Linear policy class: $f_\theta(s, a) = \theta \cdot \phi(s, a)$, where $\phi(s, a) \in \mathbb{R}^d$
- Neural policy class: $f_\theta(s, a)$ is a neural network
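A minimal sketch of the softmax-over-linear-features policy class (the feature dimensions and sizes below are arbitrary); the tabular softmax class is the special case where $\phi(s,a)$ is a one-hot indicator of $(s,a)$.

```python
import numpy as np

def softmax_policy(theta, phi):
    """pi_theta(a|s) proportional to exp(theta . phi(s, a)), phi[s, a] in R^d."""
    logits = phi @ theta                         # shape (S, A)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

S, A, d = 5, 3, 4
rng = np.random.default_rng(0)
phi = rng.normal(size=(S, A, d))                 # features phi(s, a) in R^d
theta = np.zeros(d)
pi = softmax_policy(theta, phi)                  # uniform policy at theta = 0
```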

In practice, policy gradient methods rule
- They are the most effective method for obtaining state-of-the-art results:
  $\theta \leftarrow \theta + \eta\, \nabla_\theta V^{\pi_\theta}(s_0)$
- Why do we like them?
  - They easily deal with large state/action spaces (through the neural net parameterization).
  - We can estimate the gradient using only simulation of our current policy $\pi_\theta$ (the expectation is over the states and actions visited under $\pi_\theta$).
  - They directly optimize the cost function of interest!

Two (equal) expressions for the policy gradient!
- $\nabla_\theta V^{\pi_\theta}(s_0) = \frac{1}{1-\gamma}\, E_{s \sim d^{\pi_\theta}_{s_0},\, a \sim \pi_\theta(\cdot|s)}\big[ Q^{\pi_\theta}(s, a)\, \nabla_\theta \log \pi_\theta(a|s) \big]$
- $\nabla_\theta V^{\pi_\theta}(s_0) = \frac{1}{1-\gamma}\, E_{s \sim d^{\pi_\theta}_{s_0},\, a \sim \pi_\theta(\cdot|s)}\big[ A^{\pi_\theta}(s, a)\, \nabla_\theta \log \pi_\theta(a|s) \big]$
(some shorthand notation above)
- Where do these expressions come from?
- How do we compute this?

Example: an important special case
- Remember the softmax policy class (a "tabular" parameterization): $\pi_\theta(a|s) \propto \exp(\theta_{s,a})$
- Complete class with $SA$ params: one parameter per state-action pair, so it contains the optimal policy.
- Expression for the softmax class:
  $\frac{\partial V^{\pi_\theta}(s_0)}{\partial \theta_{s,a}} = \frac{1}{1-\gamma}\, d^{\pi_\theta}_{s_0}(s)\, \pi_\theta(a|s)\, A^{\pi_\theta}(s, a)$
- Intuition: increase $\theta_{s,a}$ if the 'weighted' advantage is large.
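This formula is easy to sanity-check numerically: on a small synthetic MDP, the analytic expression $\frac{1}{1-\gamma} d^{\pi_\theta}_{s_0}(s)\,\pi_\theta(a|s)\,A^{\pi_\theta}(s,a)$ matches a finite-difference gradient of $V^{\pi_\theta}(s_0)$ (everything below is made up for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma, s0 = 4, 3, 0.9, 0
P = rng.dirichlet(np.ones(S), size=(S, A))
r = rng.uniform(size=(S, A))

def pi_of(theta):
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def value(theta):
    pi = pi_of(theta)
    P_pi = np.einsum("sa,san->sn", pi, P)
    r_pi = np.einsum("sa,sa->s", pi, r)
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)

theta = rng.normal(size=(S, A))
pi = pi_of(theta)
V = value(theta)
Q = r + gamma * np.einsum("san,n->sa", P, V)
Adv = Q - V[:, None]
P_pi = np.einsum("sa,san->sn", pi, P)
e0 = np.zeros(S); e0[s0] = 1.0
d = (1 - gamma) * np.linalg.solve((np.eye(S) - gamma * P_pi).T, e0)

analytic = d[:, None] * pi * Adv / (1 - gamma)   # the formula on this slide

eps, numeric = 1e-5, np.zeros_like(theta)
for s in range(S):
    for a in range(A):
        tp, tm = theta.copy(), theta.copy()
        tp[s, a] += eps
        tm[s, a] -= eps
        numeric[s, a] = (value(tp)[s0] - value(tm)[s0]) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))        # agrees up to numerical error
```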

Part-1A: Derivations and Estimation

General Derivation
$\nabla V^\pi(s_0) = \nabla \sum_{a_0} \pi(a_0|s_0)\, Q^\pi(s_0, a_0)$
$= \sum_{a_0} \nabla \pi(a_0|s_0)\, Q^\pi(s_0, a_0) + \sum_{a_0} \pi(a_0|s_0)\, \nabla Q^\pi(s_0, a_0)$
$= \sum_{a_0} \pi(a_0|s_0)\, \nabla \log \pi(a_0|s_0)\, Q^\pi(s_0, a_0) + \sum_{a_0} \pi(a_0|s_0)\, \nabla\Big( r(s_0, a_0) + \gamma \sum_{s_1} P(s_1|s_0, a_0)\, V^\pi(s_1) \Big)$
$= E\big[ Q^\pi(s_0, a_0)\, \nabla \log \pi(a_0|s_0) \big] + \gamma\, E\big[ \nabla V^\pi(s_1) \big]$
(and the argument recurses on $\nabla V^\pi(s_1)$).

SL vs RL: How do we obtain gradients?
- In supervised learning, how do we compute the gradient of our loss $L(\theta)$?
  $\theta \leftarrow \theta - \eta\, \nabla L(\theta)$
  Hint: can we compute our loss?
- In reinforcement learning, how do we compute the policy gradient $\nabla V^{\pi_\theta}(s_0)$?
  $\theta \leftarrow \theta + \eta\, \nabla V^{\pi_\theta}(s_0)$
  $\nabla V^{\pi_\theta}(s_0) = \frac{1}{1-\gamma}\, E_{s \sim d^{\pi_\theta}_{s_0},\, a \sim \pi_\theta(\cdot|s)}\big[ Q^{\pi_\theta}(s, a)\, \nabla \log \pi_\theta(a|s) \big]$

Monte Carlo Estimation
- Sample a trajectory: execute $\pi_\theta$ to obtain $s_0, a_0, r_0, s_1, a_1, r_1, \ldots$
- Form the reward-to-go estimate $\widehat{Q}(s_t, a_t) = \sum_{t' \ge 0} \gamma^{t'} r(s_{t+t'}, a_{t+t'})$ and the gradient estimate
  $\widehat{\nabla} V^{\pi_\theta}(s_0) = \sum_{t \ge 0} \gamma^t\, \widehat{Q}(s_t, a_t)\, \nabla \log \pi_\theta(a_t|s_t)$
- Lemma [Glynn '90, Williams '92]: this gives an unbiased estimate of the gradient:
  $E\big[ \widehat{\nabla} V^{\pi_\theta}(s_0) \big] = \nabla V^{\pi_\theta}(s_0)$
- This is the "likelihood ratio" method.
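A sketch of this likelihood-ratio (REINFORCE) estimator for a tabular softmax policy on a synthetic MDP; the reward-to-go plays the role of $\widehat{Q}(s_t, a_t)$, and averaging over many trajectories reduces the variance (the MDP, horizon, and sizes below are made up for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma, s0 = 5, 3, 0.9, 0
P = rng.dirichlet(np.ones(S), size=(S, A))
r = rng.uniform(size=(S, A))
theta = np.zeros((S, A))                     # tabular softmax parameters

def pi_of(theta):
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def reinforce_grad(theta, horizon=200):
    """One likelihood-ratio estimate of grad V^theta(s0) from a single trajectory."""
    pi = pi_of(theta)
    s, states, actions, rewards = s0, [], [], []
    for _ in range(horizon):
        a = rng.choice(A, p=pi[s])
        states.append(s); actions.append(a); rewards.append(r[s, a])
        s = rng.choice(S, p=P[s, a])
    grad, reward_to_go = np.zeros_like(theta), 0.0
    for t in reversed(range(horizon)):
        s, a = states[t], actions[t]
        reward_to_go = rewards[t] + gamma * reward_to_go   # Q-hat(s_t, a_t)
        glog = -pi[s].copy(); glog[a] += 1.0               # grad_theta[s,:] log pi(a_t|s_t)
        grad[s] += (gamma ** t) * reward_to_go * glog
    return grad

# Average many single-trajectory estimates to reduce variance.
grad_estimate = np.mean([reinforce_grad(theta) for _ in range(100)], axis=0)
```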

Back to the softmax policy class
- $\pi_\theta(a|s) \propto \exp(\theta_{s,a})$
- Expression for the softmax class:
  $\frac{\partial V^{\pi_\theta}(s_0)}{\partial \theta_{s,a}} = \frac{1}{1-\gamma}\, d^{\pi_\theta}_{s_0}(s)\, \pi_\theta(a|s)\, A^{\pi_\theta}(s, a)$
- What might be making gradient estimation difficult here?
  (hint: when does gradient descent "effectively" stop?)

Part-1B: Preconditioning and the Natural Policy Gradient

A closer look at the Natural Policy Gradient (NPG)
- Practice: (almost) all methods are gradient based, usually variants of the Natural Policy Gradient [K. '01]; TRPO [Schulman et al. '15]; PPO [Schulman et al. '17].
- NPG warps the distance metric to stretch the corners out (using the Fisher information metric), so we move 'more' near the boundaries. The update is:
  $F(\theta) = E_{s \sim d^{\pi_\theta},\, a \sim \pi_\theta(\cdot|s)}\big[ \nabla \log \pi_\theta(a|s)\, \nabla \log \pi_\theta(a|s)^\top \big]$
  $\theta \leftarrow \theta + \eta\, F(\theta)^{-1}\, \nabla V^{\pi_\theta}(s_0)$
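A minimal sketch of this update for the tabular softmax class, using exact gradients and an exactly computed Fisher matrix on a small synthetic MDP; the pseudo-inverse handles the rank deficiency of the tabular Fisher matrix. This is only an illustration of the update rule, not an efficient implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma, s0, eta = 4, 3, 0.9, 0, 0.5
P = rng.dirichlet(np.ones(S), size=(S, A))   # synthetic dynamics
r = rng.uniform(size=(S, A))                 # synthetic rewards
theta = np.zeros((S, A))                     # tabular softmax parameters

def pi_of(theta):
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def exact_quantities(theta):
    """Exact pi, visitation measure d, advantages, and values for the current theta."""
    pi = pi_of(theta)
    P_pi = np.einsum("sa,san->sn", pi, P)
    r_pi = np.einsum("sa,sa->s", pi, r)
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    Q = r + gamma * np.einsum("san,n->sa", P, V)
    e0 = np.zeros(S); e0[s0] = 1.0
    d = (1 - gamma) * np.linalg.solve((np.eye(S) - gamma * P_pi).T, e0)
    return pi, d, Q - V[:, None], V

for _ in range(50):
    pi, d, Adv, V = exact_quantities(theta)
    grad = (d[:, None] * pi * Adv / (1 - gamma)).ravel()        # exact policy gradient
    F = np.zeros((S * A, S * A))                                # Fisher information matrix
    for s in range(S):
        for a in range(A):
            g = np.zeros((S, A)); g[s] = -pi[s]; g[s, a] += 1.0  # grad_theta log pi(a|s)
            F += d[s] * pi[s, a] * np.outer(g.ravel(), g.ravel())
    theta += eta * (np.linalg.pinv(F) @ grad).reshape(S, A)      # natural gradient step

print(V[s0])   # close to the optimal value after a few dozen iterations
```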

TRPO (Trust Region Policy Optimization)
- TRPO [Schulman et al. '15] (related: PPO [Schulman et al. '17]): move while staying "close" in KL to the previous policy:
  $\theta^{(t+1)} = \arg\max_\theta V^{\pi_\theta}(s_0)$ subject to $E_{s \sim d^{(t)}}\big[ KL\big( \pi_\theta(\cdot|s)\, \|\, \pi_{\theta^{(t)}}(\cdot|s) \big) \big] \le \delta$
- NPG and TRPO are first-order equivalent (and have the same practical behavior).

NPG intuition. But first...
- NPG as preconditioning:
  $\theta \leftarrow \theta + \eta\, F(\theta)^{-1}\, \nabla V^{\pi_\theta}(s_0)$
  or, equivalently,
  $\theta \leftarrow \theta + \frac{\eta}{1-\gamma}\, E\big[ \nabla \log \pi_\theta(a|s)\, \nabla \log \pi_\theta(a|s)^\top \big]^{-1} E\big[ \nabla \log \pi_\theta(a|s)\, A^{\pi_\theta}(s, a) \big]$
- What does the following problem remind you of? $E[XX^\top]^{-1} E[XY]$
- What is NPG trying to approximate? (The expression has the form of a least-squares solution, with $\nabla \log \pi_\theta(a|s)$ as the features and $A^{\pi_\theta}(s, a)$ as the target.)

Equivalent Update Rule (for the softmax)
- Take the best linear fit of $Q^{\pi_\theta}$ in "policy space" features: this gives $\widehat{A}^{\pi_\theta}(s, a)$.
- Using the NPG update rule:
  $\theta_{s,a} \leftarrow \theta_{s,a} + \frac{\eta}{1-\gamma}\, A^{\pi_\theta}(s, a)$
- And so an equivalent update rule to NPG is:
  $\pi_\theta(a|s) \leftarrow \pi_\theta(a|s)\, \exp\!\big( \tfrac{\eta}{1-\gamma} A^{\pi_\theta}(s, a) \big) / Z(s)$
- What algorithm does this remind you of?
- Questions: convergence? The general case/approximation?
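This equivalent "soft policy iteration" form is easy to run directly in the tabular case; a sketch with exact advantages on a synthetic MDP (sizes and step size below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma, eta = 5, 3, 0.9, 1.0
P = rng.dirichlet(np.ones(S), size=(S, A))
r = rng.uniform(size=(S, A))
pi = np.full((S, A), 1.0 / A)                   # start from the uniform policy

def v_and_adv(pi):
    P_pi = np.einsum("sa,san->sn", pi, P)
    r_pi = np.einsum("sa,sa->s", pi, r)
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    Q = r + gamma * np.einsum("san,n->sa", P, V)
    return V, Q - V[:, None]

for t in range(30):
    V, Adv = v_and_adv(pi)
    pi = pi * np.exp(eta * Adv / (1 - gamma))   # soft policy iteration step
    pi /= pi.sum(axis=1, keepdims=True)         # per-state normalization Z(s)
print(V[0])                                      # approaches the optimal value
```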

But does gradient descent even work in RL?
(Figure: the supervised learning landscape vs. the reinforcement learning landscape.)
What about approximation? Stay tuned!!

Part-2: Convergence and Approximation

The Optimization Landscape
- Supervised Learning:
  - Gradient descent tends to 'just work' in practice and is not sensitive to initialization.
  - Saddle points are not a problem.
- Reinforcement Learning:
  - Local search depends on initialization in many real problems, due to "very" flat regions.
  - Gradients can be exponentially small in the "horizon".

RL and the vanishing gradient problem!
- Thrun '92: random search does not find the reward quickly.
- With a random initialization, the landscape has "very" flat regions in many real problems (a lack of 'exploration').
- Lemma [Agarwal, Lee, K., Mahajan 2019]: with a random init, all $k$-th order gradients are exponentially small in the horizon $H = 1/(1-\gamma)$, for up to $k \le H/\ln H$ orders.
- This is a landscape/optimization issue (and also a statistical issue if we use a random init).
Prior work: the explore/exploit tradeoff
- (theory) Balancing the explore/exploit tradeoff: [Kearns & Singh '02] show E^3 is a near-optimal algorithm. Sample complexity: [K. '03, Azar '17].
- Model free: [Strehl et al. '06; Dann & Brunskill '15; Szita & Szepesvári '10; Lattimore et al. '14; Jin et al. '18].

Part 2: Understanding the convergence properties of the (NPG) policy gradient methods!
§ A: Convergence: let's look at the tabular/"softmax" case
§ B: Approximation: "linear" policies and neural nets

NPG: back to the "soft" policy iteration interpretation
- Remember the softmax policy class: $\pi_\theta(a|s) \propto \exp(\theta_{s,a})$, which has $S \cdot A$ params.
- At iteration $t$, the NPG update rule
  $\theta^{(t+1)} = \theta^{(t)} + \eta\, F(\theta^{(t)})^{-1}\, \nabla V^{(t)}(s_0)$
  is equivalent to a "soft" (exact) policy iteration update rule:
  $\pi^{(t+1)}(a|s) = \pi^{(t)}(a|s)\, \exp\!\big( \tfrac{\eta}{1-\gamma} A^{(t)}(s, a) \big) / Z_t(s)$
- What happens for this non-convex update rule?

Part-2A: Global Convergence

Provable Global Convergence of NPG
Theorem [Agarwal, Lee, K., Mahajan 2019]: For the softmax policy class, with $\eta \ge (1-\gamma)^2 \log A$, we have after $T$ iterations:
  $V^{(T)}(s_0) \ge V^\star(s_0) - \frac{2}{(1-\gamma)^2\, T}$
- Dimension-free iteration complexity! (No dependence on $S$, $A$.) Also a "FAST RATE"!
- Even though the problem is non-convex, a mirror descent analysis applies. (Analysis idea from [Even-Dar, K., Mansour 2009].)
- What about approximate/sampled gradients and large state spaces?

Notes: Potentials and Progress?

But first, the "Performance Difference Lemma"
Lemma [K. '02]: a characterization of the performance gap between any two policies $\pi$ and $\pi'$:
  $V^\pi(s_0) - V^{\pi'}(s_0) = E\Big[ \sum_{t \ge 0} \gamma^t\, A^{\pi'}(s_t, a_t) \,\Big|\, s_0, \pi \Big] = \frac{1}{1-\gamma}\, E_{s \sim d^\pi_{s_0},\, a \sim \pi(\cdot|s)}\big[ A^{\pi'}(s, a) \big]$
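The lemma can be checked numerically on a small random MDP; the sketch below (all quantities synthetic) compares the two sides of the identity for two arbitrary policies.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma, s0 = 5, 3, 0.9, 0
P = rng.dirichlet(np.ones(S), size=(S, A))
r = rng.uniform(size=(S, A))
pi = rng.dirichlet(np.ones(A), size=S)        # policy pi
pi2 = rng.dirichlet(np.ones(A), size=S)       # comparison policy pi'

def value_adv(p):
    P_p = np.einsum("sa,san->sn", p, P)
    r_p = np.einsum("sa,sa->s", p, r)
    V = np.linalg.solve(np.eye(S) - gamma * P_p, r_p)
    Q = r + gamma * np.einsum("san,n->sa", P, V)
    return V, Q - V[:, None]

def visitation(p):
    P_p = np.einsum("sa,san->sn", p, P)
    e0 = np.zeros(S); e0[s0] = 1.0
    return (1 - gamma) * np.linalg.solve((np.eye(S) - gamma * P_p).T, e0)

V_pi, _ = value_adv(pi)
V_pi2, Adv_pi2 = value_adv(pi2)
d_pi = visitation(pi)

lhs = V_pi[s0] - V_pi2[s0]
rhs = (d_pi[:, None] * pi * Adv_pi2).sum() / (1 - gamma)
print(abs(lhs - rhs))                          # ~1e-15: the lemma holds exactly
```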

Mirror Descent Gives a Proof! (even though it is non-convex)
$E_{s \sim d^\star}\big[ KL(\pi^\star_s \,\|\, \pi^{(t)}_s) - KL(\pi^\star_s \,\|\, \pi^{(t+1)}_s) \big]$
$= E_{s \sim d^\star}\Big[ \sum_a \pi^\star(a|s)\, \log \frac{\pi^{(t+1)}(a|s)}{\pi^{(t)}(a|s)} \Big]$
$= \frac{\eta}{1-\gamma}\, E_{s \sim d^\star}\Big[ \sum_a \pi^\star(a|s)\, A^{(t)}(s, a) \Big] - E_{s \sim d^\star}\big[ \log Z_t(s) \big]$
$= \eta\, \big( V^\star(s_0) - V^{(t)}(s_0) \big) - E_{s \sim d^\star}\big[ \log Z_t(s) \big]$
(using the performance difference lemma, with $d^\star := d^{\pi^\star}_{s_0}$).

Notes: are we making progress?

Re-arranging:
$V^\star(s_0) - V^{(t)}(s_0) = \frac{1}{\eta}\, E_{s \sim d^\star}\big[ KL(\pi^\star_s \,\|\, \pi^{(t)}_s) - KL(\pi^\star_s \,\|\, \pi^{(t+1)}_s) + \log Z_t(s) \big]$

Understanding progress:
$V^\star(s_0) - V^{(T-1)}(s_0) \le \frac{1}{T} \sum_{t=0}^{T-1} \big( V^\star(s_0) - V^{(t)}(s_0) \big)$
(the values $V^{(t)}$ are non-decreasing in $t$)
$= \frac{1}{\eta T}\, E_{s \sim d^\star}\big[ KL(\pi^\star_s \,\|\, \pi^{(0)}_s) - KL(\pi^\star_s \,\|\, \pi^{(T)}_s) \big] + \frac{1}{\eta T} \sum_{t=0}^{T-1} E_{s \sim d^\star}\big[ \log Z_t(s) \big]$
$\le \frac{\log A}{\eta T} + \frac{1}{\eta T} \sum_{t=0}^{T-1} E_{s \sim d^\star}\big[ \log Z_t(s) \big]$

A slow rate proof sketch

The key lemma for the fast rate:
$E_{s \sim \mu}\big[ \log Z_t(s) \big] \le \frac{\eta}{1-\gamma}\, E_{s \sim \mu}\big[ V^{(t+1)}(s) - V^{(t)}(s) \big]$

The fast rate proof!
$V^\star(s_0) - V^{(T-1)}(s_0) \le \frac{\log A}{\eta T} + \frac{1}{\eta T} \sum_{t=0}^{T-1} E_{s \sim d^\star}\big[ \log Z_t(s) \big]$
$\le \frac{\log A}{\eta T} + \frac{1}{(1-\gamma)\, T} \sum_{t=0}^{T-1} \big( V^{(t+1)}(d^\star) - V^{(t)}(d^\star) \big)$
$= \frac{\log A}{\eta T} + \frac{V^{(T)}(d^\star) - V^{(0)}(d^\star)}{(1-\gamma)\, T}$
$\le \frac{\log A}{\eta T} + \frac{1}{(1-\gamma)^2\, T}$
(using that values lie in $[0, 1/(1-\gamma)]$ for rewards in $[0, 1]$).

Part-2B: Approximation (and statistics)

Remember our policy classes:
- $\pi_\theta(a|s)$ is the probability of action $a$ given $s$, parameterized by $\theta$: $\pi_\theta(a|s) \propto \exp(f_\theta(s, a))$
- Softmax policy class: $f_\theta(s, a) = \theta_{s,a}$
- Linear policy class: $f_\theta(s, a) = \theta \cdot \phi(s, a)$, where $\phi(s, a) \in \mathbb{R}^d$
- Neural policy class: $f_\theta(s, a)$ is a neural network

OpenAI: dexterous hand manipulation, not far off?
- Trained with "domain randomization".
- Basically: the measure over start states $s_0 \sim \mu$ was diverse.

Policy search algorithms: exploration and start state-measures
- Optimize $\max_\theta\, E_{s_0 \sim \mu}\big[ V^{\pi_\theta}(s_0) \big]$ over a (diverse) start-state distribution $\mu$.
- Idea: reweighting by a diverse distribution $\mu$ handles the "vanishing gradient" problem (random search alone does not find the reward quickly, Thrun '92).
- There is a sense in which this reweighting is related to a "condition number".
- Related theory: [K. & Langford '02], [K. '03]: Conservative Policy Iteration (CPI) has the strongest provable guarantees, in terms of $\mu$ along with the error of a 'supervised learning' black box.
- Other 'reductions to SL': [Bagnell et al. '04], [Scherer & Geist '14], [Geist et al. '19], etc.; also helpful for imitation learning: [Ross et al. '11], [Ross & Bagnell '14], [Sun et al. '17].

NPG for the linear policy class
- Now: $\pi_\theta(a|s) \propto \exp(\theta \cdot \phi_{s,a})$
- Take the best linear fit in "policy space" features:
  $W^\star = \arg\min_W\, E_{s_0 \sim \mu}\, E_{s,a \sim d^{\pi_\theta}_{s_0}}\big[ \big( W \cdot \phi_{s,a} - A^{\pi_\theta}(s, a) \big)^2 \big]$
  where $\mu$ is our start-state distribution, hopefully with "coverage".
- Define $\widehat{A}^{\pi_\theta}(s, a) = W^\star \cdot \phi_{s,a}$; the NPG update is then equivalent to:
  $\pi_\theta(a|s) \leftarrow \pi_\theta(a|s)\, \exp\!\big( \tfrac{\eta}{1-\gamma}\, \widehat{A}^{\pi_\theta}(s, a) \big) / Z(s)$
- This is like a soft "approximate" policy iteration step.
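Following the recipe on this slide, here is a sketch of NPG with a linear policy class on a synthetic MDP: the advantage is fit by weighted least squares under the visitation measure (exact advantages stand in for the sampled regression targets), and the fitted $\widehat{A}$ drives the exponential update via $\theta \leftarrow \theta + \frac{\eta}{1-\gamma} W$. All sizes and features below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma, dim, eta, s0 = 6, 3, 0.9, 4, 0.5, 0
P = rng.dirichlet(np.ones(S), size=(S, A))
r = rng.uniform(size=(S, A))
phi = rng.normal(size=(S, A, dim))            # features phi(s, a)
theta = np.zeros(dim)

def pi_of(theta):
    logits = phi @ theta
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def adv_and_d(pi):
    P_pi = np.einsum("sa,san->sn", pi, P)
    r_pi = np.einsum("sa,sa->s", pi, r)
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    Q = r + gamma * np.einsum("san,n->sa", P, V)
    e0 = np.zeros(S); e0[s0] = 1.0
    d = (1 - gamma) * np.linalg.solve((np.eye(S) - gamma * P_pi).T, e0)
    return Q - V[:, None], d

for _ in range(50):
    pi = pi_of(theta)
    Adv, d = adv_and_d(pi)
    w = (d[:, None] * pi).ravel()                       # weights d(s) * pi(a|s)
    X = phi.reshape(S * A, dim) * np.sqrt(w)[:, None]   # weighted least squares fit of A^pi
    y = Adv.ravel() * np.sqrt(w)
    W, *_ = np.linalg.lstsq(X, y, rcond=None)           # regression error may be nonzero
    theta += eta / (1 - gamma) * W                      # equivalent to the exponential update
```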

Sample Based NPG, linear case
- Sample trajectories: at iteration $t$, start at $s_0 \sim \mu$, then follow $\pi^{(t)}$.
- Now do regression on this sampled data:
  $\widehat{W} = \arg\min_W\, \widehat{E}_{s,a}\big[ \big( W \cdot \phi_{s,a} - \widehat{A}^{(t)}(s, a) \big)^2 \big]$
- Define $\widehat{A}(s, a) = \widehat{W} \cdot \phi_{s,a}$.
- And so an equivalent update rule to NPG is:
  $\pi^{(t+1)}(a|s) = \pi^{(t)}(a|s)\, \exp\!\big( \tfrac{\eta}{1-\gamma}\, \widehat{A}(s, a) \big) / Z_t(s)$

Guarantees: NPG for linear policy classes (realizability)
- Suppose that $A^\pi(s, a)$ is a linear function in $\phi_{s,a}$.
- Supervised learning error: suppose we have bounded regression error, say due to sampling:
  $E\big[ \big( \widehat{A}(s, a) - \widehat{W} \cdot \phi(s, a) \big)^2 \big] \le \varepsilon$
- Relative condition number (comparing the optimal policy's state-action measure $d^\star$ to the measure induced by starting from $s_0 \sim \mu$, call it $\nu$):
  $\kappa = \max_x \frac{ x^\top E_{s,a \sim d^\star}\big[ \phi_{s,a}\, \phi_{s,a}^\top \big]\, x }{ x^\top E_{s,a \sim \nu}\big[ \phi_{s,a}\, \phi_{s,a}^\top \big]\, x }$
- Theorem [Agarwal, Lee, K., Mahajan 2019]: $A$: # actions, $H$: horizon. After $T$ iterations, for all $s_0$, the NPG algorithm satisfies:
  $V^\star(s_0) - V^{(T)}(s_0) \le H\, \sqrt{\frac{2 \log A}{T}} + H\, \sqrt{\kappa\, \varepsilon}$
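To make the relative condition number concrete, here is a small illustrative computation (the features and the two measures below are synthetic placeholders, not from any real MDP): $\kappa$ is the largest generalized eigenvalue of the pair of feature second-moment matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, dim = 40, 5                               # n = number of (s, a) pairs, dim = feature dim
phi = rng.normal(size=(n, dim))              # features phi(s, a), flattened over (s, a)
d_star = rng.dirichlet(np.ones(n))           # comparison (optimal) state-action measure
nu = rng.dirichlet(np.ones(n))               # measure induced by the start-state distribution

sigma_star = np.einsum("i,ij,ik->jk", d_star, phi, phi)   # E_{d*}[phi phi^T]
sigma_nu = np.einsum("i,ij,ik->jk", nu, phi, phi)         # E_{nu}[phi phi^T]

# kappa = max_x (x^T Sigma_{d*} x) / (x^T Sigma_nu x) = largest generalized eigenvalue
kappa = np.max(np.real(np.linalg.eigvals(np.linalg.solve(sigma_nu, sigma_star))))
print(kappa)
```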

Sample Based NPG, neural case
- Now: $\pi_\theta(a|s) \propto \exp(f_\theta(s, a))$
- Sampling: at iteration $t$, sample $s_0 \sim \mu$ and follow $\pi^{(t)}$.
- Supervised learning/regression (onto the tangent features $\nabla_\theta f_\theta$):
  $\widehat{W} = \arg\min_W\, \widehat{E}_{s,a}\big[ \big( W \cdot \nabla_\theta f_\theta(s, a) - \widehat{A}^{(t)}(s, a) \big)^2 \big]$
- Define $\widehat{A}(s, a) = \widehat{W} \cdot \nabla_\theta f_\theta(s, a)$.
- The NPG update is:
  $\pi^{(t+1)}(a|s) = \pi^{(t)}(a|s)\, \exp\!\big( \tfrac{\eta}{1-\gamma}\, \widehat{A}(s, a) \big) / Z_t(s)$

Guarantees: NPG for neural policy classes (realizability)
- Suppose that $A^\pi(s, a)$ is a linear function in $\nabla_\theta f_\theta(s, a)$.
- Supervised learning error: suppose we have bounded regression error, say due to sampling:
  $E\big[ \big( \widehat{A}(s, a) - \widehat{W} \cdot \nabla_\theta f_\theta(s, a) \big)^2 \big] \le \varepsilon$
- Relative condition number (comparing the optimal policy's state-action measure $d^\star$ to the measure induced by starting from $s_0 \sim \mu$, call it $\nu$):
  $\kappa = \max_x \frac{ x^\top E_{s,a \sim d^\star}\big[ \nabla f_\theta(s,a)\, \nabla f_\theta(s,a)^\top \big]\, x }{ x^\top E_{s,a \sim \nu}\big[ \nabla f_\theta(s,a)\, \nabla f_\theta(s,a)^\top \big]\, x }$
- Theorem [Agarwal, Lee, K., Mahajan 2019]: $A$: # actions, $H$: horizon. After $T$ iterations, for all $s_0$, the NPG algorithm satisfies:
  $V^\star(s_0) - V^{(T)}(s_0) \le H\, \sqrt{\frac{2 \log A}{T}} + H\, \sqrt{\kappa\, \varepsilon}$
- Related: NTK TRPO analysis [Liu et al. '19].

Thank you!
- Today: mathematical foundations of policy gradient methods.
- With "coverage", policy gradients have the strongest theoretical guarantees and are practically effective!
- New directions / not discussed: design of good exploratory distributions $\mu$; relations to transfer learning and "distribution shift".
RL is a very relevant area, both now and in the future! With some basics, please participate!

Some details for the fast rate!
$V^{(t+1)}(\mu) - V^{(t)}(\mu) = \frac{1}{1-\gamma}\, E_{s \sim d^{(t+1)}_\mu}\Big[ \sum_a \pi^{(t+1)}(a|s)\, A^{(t)}(s, a) \Big]$
$= \frac{1}{\eta}\, E_{s \sim d^{(t+1)}_\mu}\Big[ \sum_a \pi^{(t+1)}(a|s)\, \log \frac{\pi^{(t+1)}(a|s)\, Z_t(s)}{\pi^{(t)}(a|s)} \Big]$
$= \frac{1}{\eta}\, E_{s \sim d^{(t+1)}_\mu}\big[ KL(\pi^{(t+1)}_s \,\|\, \pi^{(t)}_s) \big] + \frac{1}{\eta}\, E_{s \sim d^{(t+1)}_\mu}\big[ \log Z_t(s) \big]$
$\ge \frac{1}{\eta}\, E_{s \sim d^{(t+1)}_\mu}\big[ \log Z_t(s) \big] \ge \frac{1-\gamma}{\eta}\, E_{s \sim \mu}\big[ \log Z_t(s) \big]$
(using $\log Z_t(s) \ge 0$ and $d^{(t+1)}_\mu \ge (1-\gamma)\, \mu$).

