4 points by Brajeshwar 11 hours ago | 1 comment
  • jll29 10 hours ago
    Careful:

    - "entropy" is information; "information", therefore, already is surprise; thus, it's dangerous to re-define "surprise" as -log P(x), which is already part of the definition of surprise, as that leads to ambiguity and circularity;

    - KL divergence is relative entropy (added surprise by a second distribution, given a first, so _relative_ surprise);

    - I would caution about terms like "expected surprise" for the same reason as I object to "dry water"...
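
A minimal numerical sketch of the quantities under discussion (function names and the example distributions are mine, not from the thread): surprisal is defined per outcome, entropy is its p-weighted average, and KL divergence is the *extra* surprise from using the wrong distribution.

```python
import math

def surprisal(p_x):
    """Surprisal of a single outcome with probability p_x: -log2 p(x), in bits."""
    return -math.log2(p_x)

def entropy(p):
    """Shannon entropy: the p-weighted average surprisal over all outcomes."""
    return sum(px * surprisal(px) for px in p if px > 0)

def kl_divergence(p, q):
    """Relative entropy D(p || q): the added surprise incurred by
    describing outcomes drawn from p as if they came from q."""
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.25, 0.25]
q = [1/3, 1/3, 1/3]

print(entropy(p))           # 1.5 bits
print(kl_divergence(p, q))  # > 0: q adds surprise relative to p
print(kl_divergence(p, p))  # 0: no added surprise relative to itself
```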

    • pizza 3 hours ago
      OP is correct; surprisal is outcome-dependent and entropy is distribution-dependent

      - entropy is E_p[informativeness of measuring outcome x]

      - take n outcomes; a distribution over them lives on the simplex Δ^(n-1). You can lift this to R^n via the log-odds map p_k -> x_k = log p_k -- now x ∈ R^n can describe a histogram with n-1 degrees of freedom

      - in log-odds space, measurement is literally a linear functional from the vector space of log probabilities onto the index of the outcome k.

      - imo surprisal of some p(x) is best understood as "the length of a pointer", entropy "the rarity-weighted average length of a pointer", and collision entropy "how specific you would have to be to describe witnessing a specific outcome"

      and in the same way, you might get by calling a single molecule of water dry
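
The pointer-length picture above can be checked numerically. This sketch (the example distribution is mine) lifts p into log coordinates, reads off each outcome's surprisal as a coordinate, and compares Shannon entropy with collision entropy:

```python
import math

p = [0.5, 0.25, 0.125, 0.125]

# Lift the distribution from the simplex into R^n via x_k = log2 p_k.
log_coords = [math.log2(pk) for pk in p]

# "Measuring outcome k" reads off one coordinate (a linear functional);
# the surprisal of outcome k is that coordinate, negated: the pointer length.
surprisals = [-x for x in log_coords]

# Shannon entropy: the rarity-weighted average pointer length.
shannon = sum(pk * s for pk, s in zip(p, surprisals))

# Collision entropy (Renyi entropy of order 2): -log2 of the probability
# that two independent draws collide on the same outcome.
collision = -math.log2(sum(pk ** 2 for pk in p))

print(shannon)    # 1.75 bits
print(collision)  # smaller than Shannon entropy, as it weights common outcomes more
```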

    • nchagnet 7 hours ago
      Hi, author here! Thanks for the feedback, as I mentioned this is also to clarify things for myself so this helps a lot.

      Regarding your points:

      - I'm not sure I get your meaning here. My understanding is that for a random variable X, the surprise is defined at the outcome level, I(x) = -log p(x), while the entropy is essentially just its average value, -sum_x p(x) log p(x). So to me it does look like entropy is expected surprise, no? I do agree, though, that by being the _expected_ surprise, entropy is itself a measure of surprise.

      - I very much agree with that, which is why I used _excess_ surprise (maybe _relative_ is a better choice, but the intent is the same).

      - That one I'm also confused about. It gets back to my first point: to me, surprise (or information) is always defined at the outcome level first, so taking a moment of it is not tautological, it's meaningful, no?
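
The "entropy is expected surprise" reading can also be checked empirically: sample outcomes from p, average their outcome-level surprisals, and the mean converges to the entropy. A small Monte Carlo sketch (the distribution, seed, and sample size are illustrative):

```python
import math
import random

random.seed(0)

outcomes = ["a", "b", "c"]
p = {"a": 0.5, "b": 0.25, "c": 0.25}

def I(x):
    """Outcome-level surprise: I(x) = -log2 p(x)."""
    return -math.log2(p[x])

# Analytic entropy: the exact expectation of I(X) under p.
H = sum(px * -math.log2(px) for px in p.values())

# Empirical average surprisal over many sampled outcomes.
samples = random.choices(outcomes, weights=[p[x] for x in outcomes], k=100_000)
H_hat = sum(I(x) for x in samples) / len(samples)

print(H)      # 1.5 bits exactly for this p
print(H_hat)  # the empirical mean, close to H
```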