3 points by sebg 3 hours ago | 1 comment
  • billconan 2 hours ago
    I do not understand.

    How is this different from building smaller transformer layers, where each layer just denoises less?