I need to transport multiple very large files over an unstable and untrusted network, and the file contents are output as a data stream. I wanted to use OpenSSL for streaming authenticated encryption, but they purposefully don't support that and are preachy about it.

Well, it turns out that XZ has checksumming built in! It even supports different algorithms (CRC32, CRC64, and SHA256). The checksum is stored in the same file, so it is computed before encryption and ends up inside the encrypted data, and the decompression tool verifies it automatically. I'm already using XZ to compress before encrypting, so this is super convenient. Also, it seems XZ now supports threaded decompression, which it didn't before. Thanks, XZ devs!
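
A minimal sketch of this setup using Python's standard `lzma` module, which writes the same .xz container as the `xz` command-line tool (the thread uses the CLI; using Python here is an assumption for illustration). It compresses a stream chunk by chunk with a chosen check type and decompresses it the same way; a corrupted stream or a failed check surfaces as `lzma.LZMAError`. The chunk size and function names are hypothetical.

```python
import io
import lzma
from typing import BinaryIO, Iterable, Iterator

CHUNK_SIZE = 1 << 20  # 1 MiB per read; an arbitrary illustrative value


def xz_compress_stream(src: BinaryIO, check: int = lzma.CHECK_SHA256) -> Iterator[bytes]:
    """Compress a byte stream piece by piece into the .xz container format.

    `check` selects the integrity check xz embeds in its output
    (lzma.CHECK_CRC32, lzma.CHECK_CRC64 or lzma.CHECK_SHA256), so the check
    travels inside whatever encryption is applied to these bytes later.
    """
    comp = lzma.LZMACompressor(format=lzma.FORMAT_XZ, check=check)
    while chunk := src.read(CHUNK_SIZE):
        out = comp.compress(chunk)
        if out:  # the compressor may buffer input and return b""
            yield out
    yield comp.flush()  # remaining compressed data, the check, index and footer


def xz_decompress_stream(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Decompress piece by piece; corruption or a failed check raises lzma.LZMAError."""
    dec = lzma.LZMADecompressor(format=lzma.FORMAT_XZ)
    for chunk in chunks:
        yield dec.decompress(chunk)
    if not dec.eof:
        raise lzma.LZMAError("stream ended before the xz end-of-stream marker")


if __name__ == "__main__":
    data = b"example payload " * 100_000
    compressed = b"".join(xz_compress_stream(io.BytesIO(data)))
    restored = b"".join(xz_decompress_stream([compressed]))
    print(restored == data)
```

Because xz stores each block's check after that block's compressed data, a streaming decoder hands out bytes before it has verified them, so the output is best treated as provisional until the stream ends without an error.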

  • ReversalHatchery@beehaw.org · 1 year ago

    untrusted network

    What stops the network operator from modifying the data and the checksum? Do you transfer the checksum out of band?
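
A minimal sketch of the concern, assuming the checksum travels in the clear next to the data (neither out of band nor inside the encryption): whoever can edit the data can simply recompute the checksum. SHA-256 stands in for any unkeyed checksum; the function names and the message are hypothetical.

```python
import hashlib
from typing import Tuple


def send(data: bytes) -> Tuple[bytes, str]:
    """Sender: transmit the data together with its SHA-256 digest, unencrypted."""
    return data, hashlib.sha256(data).hexdigest()


def receiver_accepts(data: bytes, digest: str) -> bool:
    """Receiver: accept the data if it matches the digest that came with it."""
    return hashlib.sha256(data).hexdigest() == digest


def tamper(data: bytes, _old_digest: str) -> Tuple[bytes, str]:
    """On-path attacker: edit the data, then recompute the digest (no secret needed)."""
    forged = data.replace(b"alice", b"mallory")
    return forged, hashlib.sha256(forged).hexdigest()


original, digest = send(b"pay alice 10 coins")
forged, forged_digest = tamper(original, digest)
print(receiver_accepts(forged, forged_digest))  # the forgery is accepted
```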

    • 𝒍𝒆𝒎𝒂𝒏𝒏@lemmy.one · 1 year ago

      Wouldn't be possible afaict: the encryption masks the xz archive, which contains the checksum metadata. If the data is modified, decryption and extraction will simply fail.
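
A small sketch of that failure mode using Python's `lzma` module; the payload, the helper names, and the flipped position are arbitrary choices for illustration. Flipping one bit in an .xz archive makes `lzma.decompress` raise `lzma.LZMAError` instead of returning altered data. In the real pipeline the flip would hit the ciphertext and decryption would run first; this only models the xz layer.

```python
import lzma


def flip_bit(blob: bytes, index: int) -> bytes:
    """Return a copy of blob with the lowest bit of one byte flipped (simulated tampering)."""
    tampered = bytearray(blob)
    tampered[index] ^= 0x01
    return bytes(tampered)


def extraction_succeeds(blob: bytes) -> bool:
    """True if the .xz data decompresses and passes xz's integrity checks."""
    try:
        lzma.decompress(blob, format=lzma.FORMAT_XZ)
    except lzma.LZMAError:
        return False
    return True


payload = b"some large file contents " * 4096
archive = lzma.compress(payload, format=lzma.FORMAT_XZ, check=lzma.CHECK_SHA256)

print(extraction_succeeds(archive))                               # untouched archive
print(extraction_succeeds(flip_bit(archive, len(archive) // 2)))  # one bit flipped
```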

      The data will be undecipherable to a MITM anyway since it's encrypted. The only real risk imo would be someone modifying the encrypted data in transit to attempt a zero-day targeting the decryption process… chances of which are probably really low lol

    • JuxtaposedJaguar@lemmy.ml (OP) · 1 year ago

      I'm not a cryptographer (so maybe this is wrong), but my understanding is that although it's possible to modify the ciphertext, how those changes affect the plaintext is very difficult (or impossible) to predict. That can still be an attack vector if the attacker knows the structure of the plaintext (or just wants to break something), but since the checksum is also encrypted, the chance that the original file and the checksum could both be kept consistent after a ciphertext modification is basically zero.

      • version_unsorted@lemmy.ml · 1 year ago

        A checksum and a digital signature aren't the same thing. If you have a data block and a checksum of that block, the block can be modified and a new checksum computed to match the modifications. What you'd want instead of a checksum is a digital signature made with an asymmetric key. The data block could still be modified, but its signature can't be recomputed without the key used to sign it, and that key is not part of the transfer.
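
Python's standard library has no asymmetric signature API, so this sketch swaps in HMAC-SHA256, a symmetric message authentication code. Like a signature, the tag cannot be recomputed without a secret key; unlike one, both ends share the same key rather than a private/public key pair. The key, message, and helper names are illustrative.

```python
import hashlib
import hmac
import secrets


def make_tag(key: bytes, data: bytes) -> bytes:
    """HMAC-SHA256 over the data; producing it requires the secret key."""
    return hmac.new(key, data, hashlib.sha256).digest()


def verify(key: bytes, data: bytes, tag: bytes) -> bool:
    """Constant-time comparison of the expected tag with the received one."""
    return hmac.compare_digest(make_tag(key, data), tag)


key = secrets.token_bytes(32)  # shared out of band, never sent with the data
data = b"pay alice 10 coins"
tag = make_tag(key, data)

forged = b"pay mallory 10 coins"
print(verify(key, data, tag))                                # genuine message
print(verify(key, forged, tag))                              # edited data, old tag
print(verify(key, forged, hashlib.sha256(forged).digest()))  # unkeyed hash as tag
```

The attacker can still edit the data, but without the key there is no way to produce a tag the receiver will accept.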

        • JuxtaposedJaguar@lemmy.ml (OP) · 1 year ago

          The data block would be modified but the signature of that block can’t be recomputed without the key used to sign it

          Isn't that also true of an encrypted checksum, though? For some plaintext block q there is a checksum r, but the attacker can only see and modify the encrypted q (Q) and the encrypted r (R). How any change to Q would modify q (or any change to R would modify r) can't be known without the encryption key, yet the attacker would need to know exactly that to keep q and r consistent.
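
The q/r/Q/R model can be played out in code, with one caveat: whether edits to Q change q unpredictably depends on the cipher mode. With a stream cipher or a CTR-style mode, the ciphertext is plaintext XOR keystream, so flipping a bit of Q flips the same bit of q. CRC32 is also affine over XOR: for equal-length inputs, crc(a ⊕ b) = crc(a) ⊕ crc(b) ⊕ crc(zeros). The sketch below uses a hypothetical toy keystream (SHA-256 in counter mode, not a real cipher) and shows an attacker who knows where a field sits, but not the key, editing q and fixing up R to match. An unkeyed SHA-256 check would not allow that fix-up without knowing all of q, and xz's compression layer is left out of this model.

```python
import hashlib
import struct
import zlib
from typing import Optional


def toy_keystream(key: bytes, length: int) -> bytes:
    """HYPOTHETICAL toy keystream: SHA-256(key || counter) blocks. Not a real cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def seal(key: bytes, q: bytes) -> bytes:
    """Append r = CRC32(q), then encrypt q || r by XORing with the keystream, giving Q || R."""
    plain = q + struct.pack("<I", zlib.crc32(q))
    return xor(plain, toy_keystream(key, len(plain)))


def unseal(key: bytes, sealed: bytes) -> Optional[bytes]:
    """Decrypt and return q if its CRC32 matches r, else None."""
    plain = xor(sealed, toy_keystream(key, len(sealed)))
    q, (r,) = plain[:-4], struct.unpack("<I", plain[-4:])
    return q if zlib.crc32(q) == r else None


def forge(sealed: bytes, delta: bytes) -> bytes:
    """Attacker without the key: XOR `delta` into Q and the matching correction into R."""
    n = len(sealed) - 4
    if len(delta) != n:
        raise ValueError("delta must cover the whole data part")
    # crc32(q ^ delta) == crc32(q) ^ crc32(delta) ^ crc32(n zero bytes)
    fix = zlib.crc32(delta) ^ zlib.crc32(bytes(n))
    return xor(sealed, delta + struct.pack("<I", fix))


key = b"shared secret the attacker never sees"
sealed = seal(key, b"pay alice 10 coins")
# Hypothetical attacker knowledge: bytes 10..11 hold the amount "10".
delta = bytes(10) + xor(b"10", b"99") + bytes(6)
print(unseal(key, forge(sealed, delta)))
```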

          • version_unsorted@lemmy.ml · 1 year ago

            Possibly the source of any confusion here is when the encryption and the compression take place? Maybe some more details about how you are using xz and encryption would help.

            As far as I can tell, xz doesn’t do anything with signatures or encryption, but it does perform checksums like you stated, which is very cool and I’m glad you shared this.

            Edit: I am re-reading your post above. You are compressing with xz, then encrypting, got it. So yes, if any part of the payload is tampered with, then it would be detected by the decryption, depending on the algorithm, or by the decompression because of the checksums like you said. Sorry for the confusion! You’ve got it all straight lol.