Errata
Below is a list of errata that can confuse understanding (simple typos are not listed). Most of these errata were corrected by either the 3rd or the 4th printing of the book.

Chapter 1: Introduction

Chapter 2: Foundations
 p.18, eq (2.3), and p. 19 line 5: should be P instead of p
 p. 30 Near the bottom – after the ≈ in the numerator dydx’ should be just dy
 p. 34 boundary should be defined here; it is defined on p. 149
 p. 37 End of 2nd paragraph – “C, D, F, G, C” should be C, D, G, F, C.
 p. 41, ex. 2.17: K=|Val(X)|
 p. 41, ex. 2.19a: I_P(X;Y | Z) = H_P(X | Z) – H_P(X | Y,Z)
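
The identity in the ex. 2.19a correction can be sanity-checked numerically. The following is only a sketch: the joint distribution over three binary variables is made up for illustration, and the helper names (`marginal`, `entropy`, `cond_entropy`) are hypothetical. It computes I(X;Y | Z) directly from its definition and compares it with H(X | Z) – H(X | Y,Z).

```python
import itertools
import math

# Hypothetical joint P(x, y, z) over binary variables; the weights are arbitrary.
weights = [3, 1, 2, 2, 1, 4, 2, 1]
total = sum(weights)
joint = {xyz: w / total
         for xyz, w in zip(itertools.product((0, 1), repeat=3), weights)}

def marginal(keep):
    """Marginalize the joint onto the variable positions listed in `keep`."""
    out = {}
    for xyz, p in joint.items():
        key = tuple(xyz[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

def entropy(dist):
    """Shannon entropy (in bits) of a distribution given as {outcome: prob}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def cond_entropy(target_and_given, given):
    """H(A | B) = H(A, B) - H(B), with variables named by position."""
    return entropy(marginal(target_and_given)) - entropy(marginal(given))

# Definition: I(X;Y|Z) = sum P(x,y,z) log [ P(x,y,z) P(z) / (P(x,z) P(y,z)) ]
p_z, p_xz, p_yz = marginal((2,)), marginal((0, 2)), marginal((1, 2))
cmi_def = sum(
    p * math.log2(p * p_z[(z,)] / (p_xz[(x, z)] * p_yz[(y, z)]))
    for (x, y, z), p in joint.items() if p > 0
)

# Identity from the erratum: I(X;Y|Z) = H(X|Z) - H(X|Y,Z)
cmi_identity = cond_entropy((0, 2), (2,)) - cond_entropy((0, 1, 2), (1, 2))
```

The two quantities agree up to floating-point error for any joint distribution, not just the one chosen here.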

Chapter 3: The Bayesian Network Representation
 p. 50 X1,…Xk – the “k” should be n (twice on the page)
 p. 53, Fig. 3.4 The table for grade – left hand column – 3rd and 4th rows should start with i_1, not i_0; the last entry for the Letter table should be g_3
 Last paragraph, “in a hard class may only be 30 percent” – should be 50 percent.
 p. 58 Figure 3.B.1 (b) – the two bubbles labeled G_Harry and B_Harry should be Clancy, not Harry.
 p. 61 Last paragraph next to last line – “equation (3.14)” should be 3.13.
 p. 75 Line 1 of algorithm should end in “Z into A”
 p. 85 1st line after the algorithm – should be “Witnesses(Xi, Xj, H, d)”.
 p. 85 5th line after the algorithm – the superscript on the 2nd Nb should be H.
 p. 85 5th line after the algorithm, equation should read U \subseteq Nb^H_{X_i} – \{ X_j \}
 p. 85 7th line after the algorithm, equation should read U \subseteq Nb^H_{X_j} – \{ X_i\}
 p. 93, line 1: multiple I-maps > multiple minimal I-maps

Chapter 4: Undirected Graphical Models
 page 105, 3rd line from end: P(b1) ~ .732 and P(b0) = .268.
 p. 111, Dfn 4.6: tilde{P}[u] should be P_{Phi}[u]
 p. 120 First line – “let X = “, should be let X’ =
 p. 124, Dfn. 4.14: a feature is a function from Val(D) to R.
 p. 125, Example 4.10: epsilon(A_1,A_2)=3 (not 3) when A_1=A_2
 p. 127 Box 4.D 3rd line, should be “to each X_i a label in the space”.
 p. 128, Fig. 4.10: all numbers except for the +1 and the 0 should be negative
 p. 129, paragraph 4: the empty clique (corresponding to a constant) is also part of the canonical parameterization
 p. 130, Fig. 4.11: the canonical parameterization shown is wrong. The correct parameterization has e_8(d1) = 0; e_3(c1,d1)=9.21 and 0 otherwise; e_4(a1,d1)=9.21 and 0 otherwise. The other entries are correct.
 p. 130, Theorem 4.7: The definition of P(\xi) has an unnecessary negation sign
 p. 133 Next to last line – instead of “adding any of the original features” should be “any of the other original features”
 p. 134 Section 4.5.1 next to last line should start with I(G).
 p. 135 3rd line – figure 4.6a shows the moralized graph of the extended student network of figure 9.8.
 p. 143 5th line from bottom should start with definition 4.3, not 4.4.
 p. 150 last line starts with “between A and E”, should be C instead of A.
 p. 151 Def 4.24 defines the global not local independencies

Chapter 5: Local Probabilistic Models

p. 165, after Dfn. 5.3: selector variables > multiplexer variable
 p. 171 BOX 5.B 5 lines from the bottom – “and the latter might be…”, should be former not latter.
 p. 171 Example 5.14 last line, should be a^1, not a^0.
 p. 173 last line before alg 5.2 – last words “tree Tc” should be rule set R[c].
 p. 173 line 5 of alg 5.2 – dsep(X;YZ,C) should be dsep(X;YZ)
 p. 174 first line CSIsep(G,P,c,…) the P should not be in the list.
 p. 180, line 8: X_i=true should be X_j=true
 p. 182, line 5: naive Bayes should be naive Markov
 p. 190 Def 5.15 last full line of text – the final subscript on a should be k, not k+1.

Chapter 6: Template-Based Representations
 p. 201 The two equations for P(X(0:T)) and for P(X(0),x(1)…) should each have a multiplier on the RHS of P(X(0)).
 p. 207 4th line should start with s_k rather than s.
 p. 211, ex. 6.7: should be P(X’ | X,V)=N(X + V\Delta; \sigma^2_X) and P(V’ | V) = N(V; \sigma^2_V)
 p. 219, Fig. 6.7b: there should be no arrows from D(C_i) to I(s_j,C_i)
 p. 221, 222 – final argument to B() should be U_{i_l}
 p. 229 2 lines above ex. 6.19 – O^k[A] should be O^k[α(a)].
 p. 229 Last line of def 6.15– γ is confusingly used in two different ways.
 p. 244, ex. 6.3a – state t > state s_i
 p. 245, ex. 6.6, line 1: K should be L. Each sensor reading is obtained from a single object.

Chapter 7: Gaussian Network Models
 p. 248, eq above (7.2): 2x^T u should be 2x^T \mu
 p. 250 2 lines above lemma 7.1 – Σ_{XY} should be Σ_{YX} and its size is m x n.
 p. 252, Theorem 7.4, in definition of beta, Sigma_{Y X} should be Sigma_{X Y}
 p. 252: to match example 7.1, p(X_2 | X_1) = N(0.5X_1 – 3.5; 4), and p(X_3 | X_2) = N(X_2 + 1; 3)
 p 256 – the first equation is inconsistent with Dfn. 7.2; need to consider each i separately rather than the summation
 p. 255–256 – The fact that a GMRF is attractive does not, by itself, imply that it’s positive definite
 p. 259, ex. 7.7: scope of sqrt in definition of rho_{i,j} should be both variances; also the conditional covariance is Cov_p[X_i;X_j | Z]
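
A small counterexample illustrates the p. 255–256 remark. This sketch assumes that "attractive" means every off-diagonal entry of the information matrix J is non-positive (the usual definition in the GMRF literature; check it against the book's own definition). The matrix and the function names are hypothetical.

```python
import math

def eigenvalues_sym_2x2(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]] in closed form."""
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean - radius, mean + radius

def is_attractive(a, b, d):
    # Assumed definition: the off-diagonal entry of J is non-positive.
    return b <= 0

def is_positive_definite(a, b, d):
    # A symmetric matrix is positive definite iff all eigenvalues are > 0.
    return min(eigenvalues_sym_2x2(a, b, d)) > 0

# Hypothetical information matrix J = [[1, -2], [-2, 1]].
J = (1.0, -2.0, 1.0)
low, high = eigenvalues_sym_2x2(*J)

# For contrast, J2 = [[2, -1], [-1, 2]] is attractive and positive definite.
J2 = (2.0, -1.0, 2.0)
```

Here J has eigenvalues -1 and 3, so it is attractive under the assumed definition yet does not define a valid Gaussian, whereas J2 does.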

Chapter 8: The Exponential Family
 p. 264 – η is a vector, η1 and η2 are scalars.
 p. 269 – next to last line, “we discussed in chapter 2” should be “we discuss in appendix A.1.1”.
 p. 277 Prop 8.3 – arg max should be arg min.
 p. 279 Figure 8.3 – the label on the arc from P should be E_P[tau(X)]; also, both expectations should have an argument of tau(X), not sigma(X).
 p. 279 Ex. 8.15 – the rhs of the first equation should be E_{Q_{(mu,sigma^2)}}
 p. 283, Ex. 8.1 – formula should be exp(theta) theta^k

Chapter 9: Variable Elimination
 p. 288 4th line – “partial MAP” should be “marginal MAP”
 p. 295 last paragraph – first line – “requires two multiplications” should be four multiplications. Also, the number of total multiplications is therefore 18 (not 12).
 p. 301 step 5 – last factor – phi_G should be phi_L
 p. 302, table 9.2: line 6 should not include tau_5(D,J); that factor should be included in the elimination in line 7
 p. 305 third bullet should end with “for all y_i’ ≠ y_i”
 p. 306 9.4.1 2nd paragraph, 3rd line – factor Ψ should be factor Ψ_i. The next line should start with “is the scope of Ψ_i minus X_i”. Also in the last paragraph before 9.4.2 – 2nd line, factor Ψ should be factor Ψ_i.
 p. 307 def 9.4 after the equation, “appearing in one” should be “appearing in any”
 p. 308 Figure 9.10 – the caption lists (b), (c), and (d), should be (a), (b), and (c)
 p. 308 middle of page – the sentence “For clarity, the figure still contains variables and edges that are removed, but marked with dashed lines” is not correct.
 p. 310 the list of cliques in the middle of the page should also include C5 = {G, S, I}
 p. 311 5th line from bottom – C_i should be C_k
 p. 312 example 9.3 – 5th line from bottom should end with S, J, G, L, I, D, H, C. Also, we first eliminate C then H,D (not H and then C,D).
 p. 318 ex 9.4 – 4th line from the bottom ends with “factor influences”, should be “component influences”
 p. 318 section 9.5.2 – 6th line the cost of the computation is proportional to Val(U)
 p. 319 6th line from bottom – phi_X(Y,G) should be phi_X^+(Y,G)
 p. 320 Table 9.5 – the factors phi_S, phi_H, and phi_J should not have a superscript
 p. 322 def 9.7 – vars should be calligraphic X
 p. 332 alg 9.6 line 2 should read if Scope[c’] ⊆ Scope[c]; line 3 should read Select Y ∈ Scope[c’] – Scope[c]
 p. 333 alg 9.7 2nd parameter should be Y, not Z; line 8 should end in p_k}; lines 20–26 should be unindented one level
 p. 334 ex 9.15 – 3rd line should say “no longer depends on C”.
 p. 334, fig. 9.16: (a) is conditioning on a^0; (b) is conditioning on a^1.

Chapter 10: Clique Trees
 p. 346 def 10.1 – 3rd line “cluster C” should be “cluster C_i”; factor tau_3 should be message tau_3
 p. 350 Figure 10.3 – the box in (b) between cluster 3 and 5, 2nd line, should start with ∑J,L
 p. 351 2 lines above the equation in the middle of the page – “and we eliminate G” should be “and we eliminate J”
 p. 353 alg 10.1 – procedure initializeCliques, line 2 should end with ϕj
 p. 355 next to last paragraph, 2nd to last line – 2nd sentence should be “in the second of three executions”
 p. 355 4th line from bottom should be “whenever a message is sent between two cliques in the same direction“
 p. 356 Figure 10.5 (a,b) box between cliques 3 and 5, 2nd line should start with ∑J,L
 p. 359, Alg 10.A.1: need to add line 9 setting Assignment[l] to 0
 p. 362, Fig. 10.6: table headings should marginalize (sum) rather than max
 p. 368 Thm 10.5 – end of third line should be Ci, Cj
 p. 371 alg 10.4 – line 9 ends in SumProductVariable Elimination(…), should be SumProductVE(…)
 p. 371, Alg. 10.4: line 4 should be V_{T’}{r}

Chapter 11: Inference as Optimization
 p. 383, next to last paragraph: T is a clique tree, not just a cluster tree
 p. 383: (see Chapter 2) > (see Appendix A.1.3.3)
 p. 390: Nb_i 1 to (Nb_i – 1 )
 p. 390: 11.2.2 first para, last line, switch lefthand and righthand.
 p. 393: Caption of fig 11.3, {B,D} > {B,E}
 p. 407, last eq: missing \psi_i before \Pi
 p. 408, first eq: missing \psi_i before \Pi
 p. 417: “RegionGraphOptimize” to “CGraphOptimize”
 p. 424, RHS of Eq 11.33: psi_r(c_r) should be log psi_r(c_r); missing term – kappa_r
 p. 427, 3rd equation from bottom “\delta_{1 \to 4} =”: remove \delta_{2 \to 4}
 p. 427, last sentence: add “\delta_{2 \to 4} and” before “\delta_{6 \to 7}”
 p. 433, 3rd sentence from end: change “A_{2,1}” to “A_{1,2}”
 p. 436, last equation: change \psi_3 to \psi_2
 p. 440, 3rd equation: remove summation
 p. 444, 6th line from last: k > k-1
 p. 449: proposition 11.2 > theorem 11.2
 p. 451, equation right after (11.52): ln(u_\phi) > ln (\phi(u_\phi))
 p. 455, Alg 11.7, line 7: add U_\phi to argument of ln
 p. 457: (last few lines) fixed confusion about rows and columns.
 p. 460, paragraphs 2 and 3: in the parallel update strategy, each of the queries on the clique tree has different evidence, and so it’s not that easy for the parallel update strategy to achieve significant savings over the sequential update strategy
 p. 461, Eq (11.62): remove ln on lhs
 p. 463, Last equation, last line: B > D in first term
 p. 466 last line: A_1,3 > A_1,2, and last one to A_1,4
 p. 473 2nd eq: \tilde{\phi}_4(b,\lambda’) > \tilde{\phi}_4(c,\lambda’)
 p. 479, ex 11.14, gradient of J: ln psi_i(c_r) should be kappa_r ln psi_r(c_r); 1 on RHS should be kappa_r

Chapter 12: Particle-Based Approximate Inference
 p. 488 Same as corrections for p. 53
 p. 492 ex 12.2 next to last line – “50 percent” should be “30 percent”
 p. 495 Next to last paragraph, “discussed in chapter 2” should be “discuss in Appendix A.2”
 p. 499 The bubble for Intelligence should have a double circle
 p. 499 last line above prop 12.2 should have mutilated network B_{Z=z}, not B_{E=e}
 p. 500 first paragraph of 12.2.3.2 last sentence should be query z, not query y.
 p. 502 6 lines from bottom of page – should have M > γu^k / P_B(z)
 p. 508 ex 12.6 3rd line – X^{(t)} should be X^{(0)}
 p. 513 First set of equations – need to change x to x” (four times) and C_j to D_j (everywhere); also, in the last equation i needs to be i”
 p. 516, paragraph 2: reversibility does imply that pi is a stationary distribution of the chain, but without regularity doesn’t guarantee convergence to that stationary distribution
 p. 518 2nd paragraph starting with As for Gibbs sampling – 3rd line – Y_i should be U_i
 p. 521, Thm 12.6: missing sqrt(M) on the LHS of the first equation; also, in Eq. 12.27, f(X) should be f(X[m])
 p. 523 first equation 1/T should be 1/M; fourth equation 1/T1 should be 1/M1
 p. 524 first equation, there should be no negative sign inside the exp
 p. 530 4th line from bottom of page – B_{X_p=x_p} should be B_{X_p=x_p, E_p=e_p}
 p. 533 2nd paragraph, first line – M should be m
 p. 538 ex 12.20 – the second equation should have 0.244 ≤ P (i^0,l^1) ≤ (1 − 0.582) = 0.418
 p. 590, example 13.17: MAP assignment is X_1=1, X_2=1, X_3=1, X_4=0
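
The p. 516 point (reversibility gives a stationary distribution but, without regularity, not convergence) can be seen on the smallest possible example. The sketch below uses a hypothetical two-state chain that deterministically flips state; the chain and the helper `step` are illustrative and not taken from the book.

```python
# Two-state chain that always moves to the other state: reversible but periodic.
T = [[0.0, 1.0],
     [1.0, 0.0]]
pi = [0.5, 0.5]

def step(dist, T):
    """One application of the transition model: dist'(j) = sum_i dist(i) T[i][j]."""
    n = len(dist)
    return [sum(dist[i] * T[i][j] for i in range(n)) for j in range(n)]

# Detailed balance: pi(i) T(i -> j) == pi(j) T(j -> i) for all i, j.
reversible = all(
    abs(pi[i] * T[i][j] - pi[j] * T[j][i]) < 1e-12
    for i in range(2) for j in range(2)
)

# pi is stationary under T ...
pi_is_stationary = step(pi, T) == pi

# ... but a chain started in state 0 oscillates and never approaches pi.
dist = [1.0, 0.0]
trajectory = [dist]
for _ in range(5):
    dist = step(dist, T)
    trajectory.append(dist)
```

The trajectory alternates between [1, 0] and [0, 1] forever, so the chain has the uniform distribution as a stationary distribution without ever converging to it from this start state.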

Chapter 13: MAP Inference
 p. 554 first equation after the = max should be argmax
 p. 555 3^{rd} line – max_{b} P(b | a^{1}) = 0.9 should be max_{b} P(b | a^{0}) = 0.9
 p. 557 Procedure TracebackMAP – first comment ends with eliminated before should be eliminated after
 p. 560 2^{nd} equation first factor after the = should be ϕ_{D}(D)
 p. 566: the traceback procedure for clique trees is essentially the same as in Algorithm 13.1
 p. 564 last line above 13.3.2 – ends with “sum-product messages.”, should be max-product messages.
 p. 573 alg 13.3 – lines 4,5 are redundant (identical to 1,2)
 p. 578 last line of first paragraph after ex 13.15 should end with q(x^{1}_{2}) = 1, q(x^{2}_{2}) = 1 or q(x^{3}_{2}) = 1
 p. 579 first line – example 5.16 should be example 5.15.
 p. 580 first equation – q(x^{j}_{R}) should be q(x^{j}_{r})
 p. 584 end of 3^{rd} equation – )^{ T}] should be )]^{T}
 p. 585 first equation: on second term )^{1/T}]^{T }should be )^{T}]^{1/T}; each a_{j}(t) should be a_{j}(T)
 p. 588 2^{nd} line above 13.6 – max should be argmax
 p. 593 proc Alpha-Expansion line 4 – first parameter to Alphaexpand should be ε
 p. 596 n and m should both be k

Chapter 14: Inference in Hybrid Networks
 p. 611, eq (14.5): g’ = g + 1/2 (log|2 \pi K_{YY}^{-1}| + h_Y^T K_{YY}^{-1} h_Y)
 p. 612 lines 9,12 – should be delta_{j>i}; line 10, should be Y = C_j – S_{i,j}
 p. 612 last line – “equation (14.4)” should be equation (14.6)
 p. 616 last equation – rhs has to include a factor p(D1) before the dX1 at the end of the equation
 p. 617, 4th equation: Gaussian has mean 1.5, variance 0.7^2
 p. 625 last equation should be β_2(b^1)
 p. 626 3rd from the last line in first paragraph in 14.3.4 – Y ⊂ Gamma should be X ⊂ Gamma
 p. 628 2nd line, 6th line – “CTreeBUTwoPass” should be CTreeBUCalibrate
 p. 632 first equation – you have “E_p[Z^2] – E_p[Z]^2” should be E_p[X^2] – E_p[X]^2
 p. 634 4th line above last equation – (bZ) should be (Z)
 p. 640 ex 14.18 – 2nd paragraph – 7th line – “linearization” should be “linearize”
 p. 643 6th line – unnecessary period before “q(x) ≥ p(x)”
 p. 643 7 lines from bottom of page – “subtracting E[X]” should be “subtracting E[X]^2”
 p. 644 next to last equation x_i,’’ should be x_i’’,
 p. 659, example 15.3, 2nd paragraph: the induced width is 4, the max clique size is 5.
 p. 660, fig. 15.3: max clique size in (b) is 4 and in (c) is 3.
 p. 685, paragraph 5: for the case k=1, the assignment to A^{(t)},…A^{(t(k2))} is taken to be a unique “vacuous” assignment

Chapter 15: Inference in Temporal Models
 p. 653 last line – “at time t” should be at time t+1
 p. 655 last paragraph above 15.2.3 – 3rd from last line “S(t+1) represents” should be “S(t+1) in the backward pass represents”
 p. 656 2nd paragraph – next to last line  o(1:t) should be o(1:t+1)
 p. 659 figure 15.2 (a) the first clique should be labeled with P(F’ | F,W)
 p. 660 the B’,C’ clique in 15.3b,c is unnecessary
 p. 663 next to last equation, Z should be Z(t)
 p. 670 alg 15.4 – line 8 should instantiate only the current state to the return value of LW2TBN (not the entire trajectory); line 10 should be unindented
 p. 672, sec. 15.3.3.4 2nd line: offspring are K_m^{(t)}
 p. 674 4th line from bottom of page – q(t-1) should be q(t)

Chapter 16: Learning Graphical Models: Overview
 p. 707, Evaluate, line 4: “l/M” > “loss/M”
 p. 707, CrossValidation, line 7: “l/K” > “loss/K”
 p. 710: next to last para, last line P*(Y) > P*(X)
 p. 713: next to last para, “cluster variable” > “class variable”
 p. 714, Figure 16.1 right: there should be an edge from X_3 to Y_2
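
The two p. 707 corrections both concern averaging: Evaluate returns the accumulated loss divided by the number of instances M, and CrossValidation returns the accumulated held-out loss divided by the number of folds K. The sketch below is an illustration of that structure, not the book's pseudocode: the function signatures, the interleaved fold partition, and the toy learner are all assumptions.

```python
def evaluate(model, loss_fn, data):
    """Average loss of `model` over the instances in `data` (the loss/M step)."""
    total = 0.0
    for instance in data:
        total += loss_fn(instance, model)
    return total / len(data)

def cross_validation(learn, loss_fn, data, k):
    """Average held-out loss over k folds (the loss/K step)."""
    folds = [data[i::k] for i in range(k)]  # simple interleaved partition
    total = 0.0
    for i, held_out in enumerate(folds):
        # Train on every fold except the held-out one.
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        total += evaluate(learn(train), loss_fn, held_out)
    return total / k

# Toy usage: the "model" is a sample mean and the loss is squared error.
def learn_mean(train):
    return sum(train) / len(train)

def squared_error(x, mean):
    return (x - mean) ** 2

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
cv_loss = cross_validation(learn_mean, squared_error, data, k=3)
```

Returning the raw sum instead of the average would make the result depend on the data set size and on the number of folds, which is what the corrected lines avoid.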

Chapter 17: Parameter Estimation
 p. 728: last equation “\beta_0” > “M\beta_0”
 p. 732: second line of the equation in the proof “\sum m” > “\sum_m”
 p. 732, Eq (17.9) “P(X | \theta)” > “P(X : \theta)”
 p. 745, 2nd equation \theta_Y to \theta_{Y | X} (four times)
 p. 746: Removed subsection 17.4.2.1. The case of tree-CPDs is handled in section 17.5.2
 p. 755–756 and 760–762: \theta^1 > \theta^k (multiple occurrences)
 p. 756, middle paragraph, 4th line: “i \in \V^k” > “X_i \in \V^k”.
 p. 759, Eq (17.20): “b Val” > “b \in Val”
 p. 770, Theorem 17.2: calligraphic M to italic M
 p. 770, last sentence: \tilde{P} > \hat{P}
 p. 775: end of proof of Thm 17.4 “k” > “Pa_i^G”

Chapter 18: Structure Learning in Bayesian Networks
 p. 784: “P(Y = H) = 0.6” > “P(Y = H) = 0.4”
 p. 787, 3rd para: “BuildSkeleton” > “BuildPMAPSkeleton”
 p. 788, last para: “P(x)P(y)” > “\hat{P}(x)\hat{P}(y)”
 p. 789, eq. 18.2: all counts in both numerator and denominator need to be divided by M
 p. 795, after Eq (18.6): “second term” > “first term” (twice)
 p. 796, above 1st equation: “As we saw in the previous section,” > “Using the chain rule”
 p. 798, 1st equation and following line: M[1]^m > M^m[1], add “=H” on lhs
 p. 800,805,808,809,810,826: Pa_i > Pa_{X_i}
 p. 803: Proof of Thm. 18.2, line starting with “where \Delta = “, I_P > I_{P^*}
 p. 806, last Eq: “x_i, pa_X_i” > “y, x^i”
 p. 809, 2nd para before 18.4.2: “structure equivalence” > “score equivalence”
 p. 810, Pro 18.4: U > U_i in the argument of \arg\max
 p. 811, end of 2nd para: “at d” > “at most d”
 p. 811, 3rd para: (n-1 choose 2) > (n-1 choose 1)
 p. 818, 1st para: “figure 18.1” > “algorithm 18.1”
 p. 819, end of 2nd para: “involve X_k” > “involve Y”.
 p. 827, 1st equation: “\le k” > “\le d”
 p. 828, 3rd para: “Equation 18.16” > “Proposition 18.7”
 p. 829, 1st eq: G_d > G_k
 p. 830,831,833, figure 18.8, 18.9, 18.10, the three panels should be labeled (a), (b), (c)
 p. 832, 2nd eq, denominator: “T(G > <‘)” > “T(< > <‘)”
 p. 832, 2nd to last para: “figure 18.8” > “figure 18.9”
 p. 839, 3rd to last line: “section 16.A” > “box 16.A”.
 p. 843, end of Ex. 18.3: [2] > [b]

Chapter 19: Partially Observed Data
 p. 852, 4th line: “\theta_X” > “\theta”
 p. 852, last equation: x^0_1 > x^0 (twice)
 p. 854, line 6 from the top: \phi_{O_x_2 | x_2^1} and \phi_{O_x_2 | x_2^0} should be \phi_{O_x_2 | x_1^1} and \phi_{O_x_2 | x_1^0} instead.
 p. 854, 3rd para, 4th line: x^1_2 > x^1_1
 p. 854, display eq: x^1_2 > x^1_1, x^0_2 > x^0_1
 p. 896. 1st eq: lhs – “G(i,j)” > “g(i,j)”
 p. 854, fifth paragraph: \theta_X_1 (1 – \phi_{O_x_1 | x_1^1}) should be \theta_X_1 (1 – \phi_{O_x_2 | x_1^1}).
 p. 857, 2nd eq: 10>9, 12>13
 p. 860: kikehood > likelihood
 p. 863, 2nd eq to last: “<E>” > “<O>”
 p. 864, last eq: l(P_\theta:D) > l(\theta : D)
 p. 867, 1st para, 4th line: x[m] > o[m]
 p. 871, 2nd eq and last eq: “e<Y>” > “e[m]<Y>” and “e<C>” > “e[C]<Y>”
 p. 871, middle eq: Q(<a1,c0>) > Q'(<a1,c0>)
 p. 878, 2nd para from end: “and o[1]” > “and o[2]”
 p. 879, last eq: \log Z(\theta) > M\log Z(\theta), A(..) > \log A(…)
 p. 880, 1st eq: A(..) > \log A(…)
 p. 881, end of proof: “implies the first” > “implies the second”
 p. 882, last sentence before Thm 19.5: theorem 19.5 > theorem 19.3
 p. 885, caption of 19.B.1: “training likelihood” > “training log-likelihood”
 p. 886, fig 19.B.2: fixed label of y-axis of (a) to read “# of distinct local maxima”, added “(in 25 runs)” to description of (a) in caption.
 p. 901, display eq: Pa_i > Pa_{X_i}, Pa_j > Pa_{X_j}
 p. 903, 2nd full para: pa_i > pa_{X_i}
 p. 910, after last eq: \theta_{x_i}u_i > \theta_{x_i | u_i} (twice)
 p. 910, 2nd to last para: “Hessian A” > “Hessian C”
 p. 911, after 1st eq: “p(o[m] | \theta)” > “p(o[m] | \theta, G)”
 p. 913, 3rd eq: Dim[\theta_G] > Dim[G] (twice)
 p. 917, 2nd para: “bounded degree” > “bounded indegree”
 p. 918, Fig 19_10: Missing list of instances (mentioned as (b) in caption). The numbers mentioned as (c) in the caption appear as labels on the arrows.
 p. 921, 3rd paragraph: “modification G to G0” > “modification o to G”.
 p. 921, 2nd to last eq: score > score_BIC
 p. 922, Alg 19.3: removed extra “G^t” subscript (twice, line 4, 5)
 p. 922, middle first paragraph: should read “we wish to estimate the delta-score: $Score_{BIC}(o(G) : D) - Score_{BIC}(G : D)$. The theorem tells us that if $o(G)$ satisfies $Score_{BIC}(o(G) : D^*_{G_0,\theta_0}) > Score_{BIC}(G_0 : D^*_{G_0,\theta_0})$, then it is necessarily better than our original graph $G_0$. However, it does not follow that if $\hat{\delta}_{D^*_{G_0,\theta_0}}(G : o) > 0$, then $o(G)$ is necessarily better than $G$.”
 p. 923: “FamScore(X, Y” > “FamScore(X | Y” (four times)
 p. 925, 2nd para: “k+1 or more” > “k+1 or fewer”
 p. 928, 1st eq: “I_Q” > “I_{\hat{Q}_{\theta_0}}”
 p. 929, 3rd para: (19.9) > (19.10)
 p. 929, last sentence: “The compute” > “To compute”
 p. 930, 6th to last line: \sum_k \beta_k > \sum_k \lambda_k
 p. 935, Ex 19.1 (a): p > \theta

Chapter 20: Learning Undirected Models
 p. 945, first equation – ln Z should be ln Z(θ), axes in Fig. 20.1 are ln phi’s, not phi’s.
 p. 946, example 20.1 – figure 3.1c should be figure 3.1a
 p. 947, middle of page – 0 ≥ α ≥ 1 should be 0 ≤ α ≤ 1
 p. 949, next to last paragraph – “optimum of the pseudolikelihood” should be “optimum of the likelihood”
 p. 953, next to last paragraph, 3^{rd} line from the end – you have “activity is running, and then (perhaps) transitions to walking” should be activity is walking, and then (perhaps) transitions to running
 p. 957, middle of page – “From theorem 8.1” should be “from proposition 8.1”
 p. 965 middle of page – “subject to … = 2, the 2 should be 5/3.
 p. 967, last line – lhs – D) should be θ)
 p. 971 f_{i}(x_{j},u_{j}) should be f_{i}(x_{j},x_{j}), f_{i}(x’_{j},u_{j}) should be f_{i}(x’_{j},x_{j})
 p. 972, next to last equation – the summation over x_{j} should be over x_{j}
 p. 973, equation near bottom of page – the RHS needs a multiplier of 1/M; the second summation should be over s, not j
 p. 978, 2^{nd} paragraph, 3^{rd} from last line: “a small margin” should be “a smaller margin”.
 p. 980, 2 paragraphs above ex. 20.4, 3^{rd} line: the probability that at least one of the independence tests fails is at most Σ_{k=0}^{d*} \binom{n-2}{k} ε
 p. 983, eq. (20.28): max should be argmax
 p. 983, 3rd to last eq: sign of the Dim[M]/2 term should be +
 p. 986, Alg 20.1 – line 13 should be indented right under line 12
 p. 990, last 3 equations – the multiplier 1/2β should be 1/β
 p. 991, first equation: subscript L_{1} should be L
 p. 991, first equation – the multiplier 1/2β should be 1/β
 p. 991 L-BFGS paragraph: θ_{k} should be θ_{i}, f_{k} should be f_{i}
 p. 992, last line above 20.7.5 – θ should be Θ
 p. 993: the gains computed in the two equations on this page are not bounds on the objective but bounds on the change in the objective
 p. 994, 2^{nd} and 3^{rd }and next to last equations – θ_{k}’ should be θ_{k}
 p. 994, last equation – ends in denominator σ^{2} should be 2σ^{2}
 p. 995, last line above Prop 20.7 – f_{k }should be f_{i}
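
The p. 980 correction reads as a union bound over independence tests. As a rough illustration, assume n variables, candidate conditioning sets of size at most d* drawn from the other n − 2 variables, and a failure probability of at most ε for each individual test. The function name and the numbers below are made up for this sketch.

```python
import math

def failure_bound(n, d_star, eps):
    """Union bound: (number of conditioning sets of size <= d_star) * eps."""
    num_tests = sum(math.comb(n - 2, k) for k in range(d_star + 1))
    return num_tests * eps

# With n = 10 and d* = 2 there are C(8,0) + C(8,1) + C(8,2) = 37 candidate sets.
bound = failure_bound(n=10, d_star=2, eps=0.001)
```

Because the count of candidate sets grows quickly with d*, the per-test ε has to shrink accordingly to keep the overall bound small.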

Chapter 21: Causality
 p. 1016 Figure 21.2 – right hand side – text has “P(c^{1} | d^{1})” should be P(c^{1} | do(d^{1}))
 p. 1017 Next to last line – “if z = z” should be “if Z = z”
 p. 1018 4^{th} line – middle of line – the subscript G_{do(Z)} should be G_{bar{Z}}
 p. 1019 next to the last line – S and G are interchanged. Should have “new decision parent Ghat for the node G. The node Ghat is d-separated from S on this graph.”
 p. 1024 6th line after the thick black line – should be “the Smoking model in 21.A.2a”
 p. 1029 2^{nd} line after Definition 21.3 – exogenous should be endogenous
 p. 1030 Figure 21.5 – “always taker” was accidentally omitted
 p. 1032 2^{nd} equation – rhs – you have v_{(1,1>0,i’)} should be v_{(1,0>1,i’)}
 p. 1036 1^{st} line of text: “never recover” should be “never well”
 p. 1037 Box 21.C – 2 lines above the equation – “low given” should be “high given”.
 p. 1038 In the 3 final equations and the associated text, T = a^{1} should be T = t^{1}
 p. 1038 here “never well” means “always ulcer”
 p. 1039 – 3^{rd} paragraph of 21.7 – 5^{th} line – “mechanism variable” should be “response variable”.
 p. 1041 Example 21.26 – 4^{th} line – I > D should be I > G
 p. 1043 3^{rd} paragraph, 4^{th} line and 1st equation – do(Z := Z) should be do(Z := z)
 p. 1044 Equations in middle of page: 3^{rd} line – do(Y = y^{1}) should be do(X = x^{1}); 4th line – M[x^{1} y^{1}] should be M[x^{1} y^{0}] ; 6th line M[x^{1} y^{1}] should be M[x^{0} y^{0}]
 p. 1051 4^{th} line “parents y” should be “parents Y“
 p. 1051 3^{rd} from last paragraph – 3^{rd} line – “eliminates the set” should be “limits the set”

Chapter 22: Utilities and Decisions
 p. 1064 5^{th} line: U($2,000,000) – U($0) should be U($1,000,000) – U($0)
 p. 1067 2^{nd} complete paragraph – last line –“and if s <” should be “and if s >”
 p. 1070 last equation – the rhs of each inequality should be subscript 2 instead of subscript 1 (twice)
 p. 1073 example 22.9 – last line – “dependent” should be “independent”
 p. 1074 next to last line – “since U =” should be “thus U =”
 p. 1075 Lemma 22.1 – 1^{st} line – P(V) should be P(v)
 p. 1076, Dfn 22.12 – the sets Z_i do not have to cover V
 p. 1079 last line before 22.5 – should be “around $6000”

Chapter 23: Structured Decision Problems
 p. 1084 Figure 23.1 – The order of the branches from M-labeled variables is inverted: m^{0} is the leftmost branch and m^{2} the rightmost
 p. 1084 1^{st} full paragraph – 2^{nd} line – the an agent tnode is labeled v
 p. 1085 1^{st} full paragraph, 3^{rd} line – founding the company is f^{1}
 p. 1086 Alg. 23.1 – line 11 – the subscript should be succ(v,σ(v)); line 12 should be unindented
 p. 1086 Section 23.2, 1^{st} paragraph, the utility values appear in four subtrees, for the three values of the S variable and one for the case where the survey is not performed
 p. 1087–1088: We initially used outcomes to refer to joint assignments in Val(Χ ∪ D) but then expanded to also include the (deterministic) utility variables.
 p. 1092 Example 23.6 right after the equation – δ_{F}(c^{0},s^{0}) should be δ_{F}(c^{1},s^{0}), δ_{F}(c^{0},s^{1}), should be δ_{F}(c^{1},s^{1}), δ_{F}(c^{0},s^{2}), should be δ_{F}(c^{1},s^{2}), and δ_{F} c^{0},S should be δ_{F}(c^{0},S)
 p. 1094 Figure 23.4 – same as Figure 23.1; also, the labels s^{2} and s^{0} are flipped
 p. 1095 Lemma 23.1 – 1^{st} line X_{1},…,X_{i1},D_{1},…D_{ i1} should be X_{1},…,X_{ i1},D_{1},…D_{ i1}
 p. 1096 Example 23.10 – First line after the equation – “we have simplified” should be “we can simplify”
 p. 1101 Example 23.15 – V_{2} should be V_{1} throughout; also in the last line mu_2^1 should be mu_1^2
 p. 1102 2^{nd} bullet – “over factor” should be “over scope”
 p. 1104 – First line of text after first set of equations – “utility over V_{2} “ should be “utility over V_{1}“
 p. 1105 2^{nd} equation in middle of page – δ_{w,d} should be δ_{D}(w)
 p. 1106 Equation 23.10 – the whole equation has to be multiplied by Σ_{VεU}V
 p. 1106 Equation 23.11 – rhs – max should be arg max
 p. 1111 – all occurrences of (u | d,w) should be (v | d,w)
 p. 1111 line after eq. (23.13) – subscript I[σ] should be I[σ’]
 p. 1115 3^{rd} line – Pa(S_{3}) = {S_{2} should be Pa(D_{3}) = {H_{2}
 p. 1115 First full paragraph – 4 lines from end of paragraph – D and D’ should be D_{1} and D_{2}
 p. 1120 Proposition 23.6 – 3^{rd} line – ends with “as a parent of I” should be “as a parent of D”
 p. 1122 Scenario2, 2^{nd} line – we have “P(S_{1}) as above and P(S_{2}) =” should be “P(S_{2}) as above and P(S_{1}) =”
 p. 1128 Exercise 23.1 equation – rhs – max should be arg max

Appendix
 p. 1149 – Eq. A.4 is actually satisfiable. Adding the clause \neg q_1 would make it unsatisfiable