But for sharing files with other people, ZIP is still king. Even 7z or RAR is niche. Everyone can open a ZIP file, and they don't really care if the file is a few MBs bigger.
You can use ZSTD with ZIP files too! It's compression method 93 (see https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT which is the official ZIP file specification).
Which reveals that "everyone can open a ZIP file" is a lie. Sure, everyone can open a ZIP file, as long as that file uses only a limited subset of the ZIP format features. Which is why formats which use ZIP as a base (Java JAR files, OpenDocument files, new Office files) standardize such a subset; but for general-purpose ZIP files, there's no such standard.
(I have encountered such ZIP files in the wild; "unzip" can't decompress them, though p7zip worked for these particular ZIP files.)
Support for which was added in 2020:
> On 15 June 2020, Zstandard was implemented in version 6.3.8 of the zip file format with codec number 93, deprecating the previous codec number of 20 as it was implemented in version 6.3.7, released on 1 June.[36][37]
* https://en.wikipedia.org/wiki/Zstd#Usage
So I'm not sure how widely deployed it would be.
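For what it's worth, you can check which method a given archive actually uses without extracting it. A minimal sketch with Python's stdlib zipfile (the filename is a placeholder; the stdlib can list entries even for methods it can't decompress):

    import zipfile

    # Method IDs from PKWARE's APPNOTE: 0 = stored, 8 = deflate,
    # 12 = bzip2, 14 = LZMA, 93 = Zstandard.
    METHODS = {0: "stored", 8: "deflate", 12: "bzip2", 14: "lzma", 93: "zstd"}

    with zipfile.ZipFile("example.zip") as zf:  # placeholder path
        for info in zf.infolist():
            label = METHODS.get(info.compress_type, "unknown")
            print(f"{info.filename}: method {info.compress_type} ({label})")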
You probably mean the "unzip" command, for which https://infozip.sourceforge.net/UnZip.html lists 6.0 as the latest release, from 20 April 2009. Relevant to this discussion, that release added support for 64-bit file sizes, the bzip2 compression method, and UTF-8 filenames.
The "zip" command is listed at https://infozip.sourceforge.net/Zip.html with 3.0 as the latest, released on 7 July 2008. That release likewise added support for 64-bit file sizes, the bzip2 compression method, and UTF-8 filenames.
It would be great if both (or at least unzip) were updated to also support LZMA/XZ/ZSTD as compression methods, but given that there have been no new releases for over fifteen years, I'm not too hopeful.
Why? xz supports xz, zstd supports zstd. Why should unzip support xz or rar or gz?
If you can't open it, well... then stop using '90s WinZip.
I would also expect people to be able to decode h265 in an mp4 file.
Your proposal seems, to put it bluntly, absurd. You would have MP4 frozen on h264 for ETERNITY, and then invent a new format as a replacement? Or would you just say "god has bestowed upon the world h264, and it shall be the LAST CODEC EVER!"?
Get with the program. Things change; you cannot expect to be forwards compatible forever. Sometimes people have to switch to newer versions of software.
If your customer is stuck in the 90s because his 90s technology works perfectly fine and he has no intention of fixing things that are not broken, then deliver stuff that is compatible with 90s technology. He will be happy, he will continue to work with you, and you will make money.
If your customer is using the latest technologies and values size efficiency, then use the latest codecs.
I usually default to being conservative, because those who are up to date usually don't have a problem with bigger files, while those who are not are going to have a problem with recent formats. Maybe overly so, but that's my experience working with big companies with decades-long lifecycles.
Your job is not to lecture your customer, unless he asked for it. And if he asked for it, he probably expects better arguments than "update your software, idiot". Your job is to deliver what works for him. Now, of course, it is your right to be picky and leave money on the table; I will be happy to go after you and take it.
Professionally, I can definitely support old stuff. It most often costs extra.
Conservative doesn't have to mean stuck. I am not recommending we send h266 to everyone now, but h265 is well supported, as is AV1.
LZMA in zip has been widely supported for many years at this point. I am going to be choosing my "sane defaults", and if someone has a problem with that, they can simply do what they need to do to open it, or provide a damn good reason for me to go out of my way.
How about software developers learn to keep software working on old OSes and old hardware?
Installing new software has a real time and hassle cost, and how much time are you actually saving over the long run? It depends on your usage patterns.
Right, and when were these "final updates" made? Are you suggesting Windows 95 and 98 still see ACTIVE security support?
I know what you mean, I’m not being pedantic, but I just realized it’s been 19 years. I wonder when we’ll start calling them “Office files”.
Probably around the same time the save icon becomes something other than a 3 1/2" floppy disk.
In many (most?) cases, it's possible to get better compression and higher quality if you're willing to spend the CPU cycles on it, meaning that YouTube could both reduce their encoding load and increase quality at the same time, and content creators could put out better quality videos that maintain better detail.
It would certainly take longer to upload the multiple versions of everything, and it would definitely take longer to encode, but it would also ease YouTube's burden and produce a better result.
Ah well, a guy can dream.
So you could upload a crazy-high-bitrate file to them for a 20-minute video, which I suspect would be close to "raw" quality.
I don't know how many corners YouTube cuts on encoding, though.
I suspect most of the problem is people exporting 4K at a "web" bitrate preset (15 Mbit/s?), which is actually going to get murdered on the second encode, more so than any corner-cutting in encoding quality on YouTube's side?
Mostly it seems nutty that, after all these years, they’re still updating the zip spec instead of moving on to a newer format.
Some things are used for interoperability, and switching to a newer incompatible thing loses all of its value.
That makes them useful for transferring an entire set of files that someone will want all or none of, e.g. source code, but terrible for a set of files that someone might want to access arbitrary files from.
They are not regarded kindly.
You're assuming things because things are already done insecurely. You can authenticate the self-extractor as well as the extracted content. The user gets a nice message "This is a 7zip self-extracting archive sent to you by Bob containing the files below".
As an incident responder, I've seen far more regular archives than self-extracting archives used to social-engineer users, because self-extracting is not "content executing". It is better for social engineering to have users establish trust in the payload first by manually opening the archive; if something "weird" like self-extraction happens first, it might feel less trustworthy.
Oh, and by the way, things like PyInstaller or Electron apps are already self-extracting and self-executing archives. So are JAR files and Android APKs.
However, once extracted, JAR files do contain executable code, and that is a security issue. The Java model pays attention to security, but if code can do something, it can do something bad. And if it can't do anything, it's not very useful, is it?
People also tend to care about how much time they spend on compression for each incremental percent of compression performance, and zstd tends to sit on the Pareto frontier there (at least among open-source algorithms).
Unfortunately for the hoster, they either have to eat the cost of the added bandwidth from a larger file or have people complain about slow decompression.
My use cases are usually source code, SQL dumps and log files.
Sometimes xz gave marginally better results, but the difference was well below 1%.
raw size: 9612344 B
zstd --ultra -22 --long=31 => 376181 B (3.91% original, 4.088s compress, 0.013s decompress)
xz -z -9 xml => 353700 B (3.68% original, 0.729s compress, 0.032s decompress)
zstd -17 --long=31 could match the compression time of xz, but the size is bigger (405602 B, 4.22% original)
If you compare only the compressed size (not to the original size), .zst would be about 6-15% larger than .xz
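For anyone wanting to reproduce this kind of comparison, a rough harness along these lines works (the input path is a placeholder, the flags mirror the ones above as examples, and both tools are assumed to be on PATH):

    import os, subprocess, time

    SRC = "dump.sql"  # placeholder input file
    TOOLS = {
        "zstd --ultra -22 --long=31": (["zstd", "--ultra", "-22", "--long=31", "-f", SRC], SRC + ".zst"),
        "xz -9":                      (["xz", "-z", "-9", "-k", "-f", SRC], SRC + ".xz"),
    }

    orig = os.path.getsize(SRC)
    for name, (cmd, out) in TOOLS.items():
        t0 = time.perf_counter()
        subprocess.run(cmd, check=True)   # compress; both tools keep the original here
        dt = time.perf_counter() - t0
        size = os.path.getsize(out)
        print(f"{name}: {size} B ({100 * size / orig:.2f}% original, {dt:.3f}s)")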
https://www.hydrogen18.com/blog/apk-the-strangest-format.htm...
I was running "zstd --ultra --threads=0", which I assumed was asking it for the absolute maximum.
I redid your experiments with rust-wasm-1.83.0-r0.apk:
                   size       perc    c.time  d.time
uncompressed:      290072064  -       -       -
gzipped original:  105255109  36.29%  -       -
bzip2 -9:          107099379  36.92%  21.1s   11.0s
bzip3 -b511:       73539847   25.35%  28.9s   32.0s
xz --extreme -9:   71010672   24.48%  142.0s  3.1s
lzip -9:           70964413   24.46%  173.5s  5.3s
zstd --ultra -22:  48288499   16.64%  155.6s  0.4s
It's pretty clear zstd blows everything else out of the water by a huge margin. And even though compressing with zstd is slightly slower than xz in this case (by less than 10%), decompression is nearly 8x as fast, and you can probably tweak the compression level to make zstd both faster and better than xz.

uncompressed:     1512662084
xz --extreme -9:  508431572  12:47
zstd --ultra -21: 508432560  12:44

(-22 ran out of memory.) So at least for me, zstd was identical to xz almost to the byte and the second.

If the email data is mostly text with markup (like HTML/XML), you might want to try bzip3 too.
It's also possible that a large part of your email is actually already-compressed binary data (like PDFs and images) possibly encoded in base-64. In that case it's likely that all tools are pretty good at compressing the text and headers, but can do little to compress the attachments, which would explain why the results you get are so close.
bzip3 -b511: 580771424 8:51
I suspect your theory about compressed attachments is correct, although bzip3 isn't doing very well compared to the rest.

Overall I'm still slightly biased towards using zstd as a default, in that I believe:
1. zstd will almost always be among fastest formats for decompression, which is obviously nice-to-have everything else being equal.
2. zstd can achieve a very high compression ratio, depending on tuning; rarely will zstd significantly underperform the next best option.
Overall this is a pretty good case for using zstd by default, even if in some cases it's not noticeably better than other formats. In your case, xz seems to be just as good.

It's not going anywhere anytime soon.
The more likely thing to eat into its relevance is that Windows now has built-in basic support for zipping/unzipping (EDIT: and other formats), which relegates 7-Zip to more niche uses.
Agreed. The only thing I think it has been missing is PAR support. I think they should consider incorporating one of the par2cmdline forks and porting that code to Windows as well, so that it has recovery options similar to WinRAR's. It's not used by everyone, but in my opinion that would eliminate the remaining use cases for WinRAR.
That allows it to be a default that 'just works' for most people without installing anything extra.
The vast majority of users don't care about the extra performance or functionality of a tool like 7-zip. They just need a way to open and send files and the Windows built-in tool is 'good enough' for them.
I agree that 7-zip is better, but most users simply do not care.
That is enough to bite into 7-Zip's share of users.
That makes it 'good enough' for the vast majority of people, even if it's not as fast or fully-featured as 7-Zip.
On Linux bsdtar/libarchive gives a similar experience: "tar xf file" works on most things.
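Python's stdlib has a smaller version of the same trick, if that's your environment: tarfile's default read mode sniffs the compression (gzip/bzip2/xz, though not zstd as of 3.12). A sketch, with a placeholder path:

    import tarfile

    # mode "r" is really "r:*": the compression is detected from the stream
    with tarfile.open("release.tar.xz") as tf:   # placeholder; .tar.gz/.tar.bz2 work too
        tf.extractall("out", filter="data")      # "data" filter (3.12+) rejects unsafe members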
Hence why PeaZip is so popular, and J-Zip used to be before it was stuffed with adware.
Desktop software usability peaked sometime in the late '90s/early 2000s. There's a reason 7-Zip still looks like it's ~2004.
Instead you chose to make a useless snarky comment. Be better.
Granted, Windows 11 has started doing the same for its zip and 7z compressors.
Same trick goes for opening archives or executables (Installers) as archives.
Since you're asking, the answer is no. 7-Zip has an efficient and elegant UI.
The author updates code in the GitHub repo... by drag-and-drop file uploads. https://github.com/peazip/PeaZip/commits/sources/
I don't know what your use case is, but it seems to be quite a niche.
It includes the above patches as well as few QoL features.
Even on the latest Windows 11, it takes minutes to do what 7-Zip does in seconds.
Goes to show how well all those leetcode interviews turn out.
On anything else: either zstd directly, or tar.
Because of that, transitioning a software thread to another processor group is a manual process that has to be managed by user space.
The technical decision Microsoft made initially worked well for over two decades. I don’t think it was lame; I believe it was a solid choice back then.
Server systems with that many CPUs were available since at least the late '90s. Server systems with >10 CPUs were already available in the mid-'90s. By the early-to-mid '90s it was pretty obvious that count was only going to increase, and that the 64-CPU limit was going to be a problem down the line.
That said, development of NT started in 1988, and it may have been less obvious then.
(Now, NT for Sparc never actually became a thing, but it was certainly on Microsoft's radar at one point)
Though MS ported NT to a number of systems (mips, alpha, ppc) it wasn’t able to play in the very big leagues until later.
I agree it was a reasonable choice at the time. Few were getting mileage out of that many CPUs back then.
And x86 arguably didn't ship >64 hardware thread systems until then because NT didn't support it.
If that were the case the above system wouldn't have needed 8 sockets. With NUMA systems the app needs to be scheduling group aware anyways. The difference here really appears when you have a single socket with more than 64 hardware threads, which took until ~2019 for x86.
The only difference with Windows is that a single processor group cannot contain more than 64 cores. This is why 7-Zip needed to add processor group support: even though a 96-core Threadripper presents as a single NUMA node, the software has to request assignment to 2x48 processor groups, the same as if it were 2 NUMA nodes with 48 cores each, because of the KAFFINITY limitation.
Examples of common NUMA-aware Linux applications are SAP HANA and Oracle RDBMS. On multi-socket systems it can often be helpful to run postgres and such via https://linux.die.net/man/8/numactl too, even if you're not quite at the scale where you need full NUMA awareness in the DB. You generally also want hypervisors to pass the correct NUMA topologies to guests. E.g., if you have a KVM guest with 80 cores assigned on a 2x64 Epyc host, you want to set the guest topology to something like 2x40 cores, or it'll run like crap, because the guest sees it can schedule one way while reality is another.
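If you only need the CPU half of what numactl does, Linux exposes it directly to programs; a minimal sketch, assuming node 0 owns CPUs 0-63 (query the real topology in practice, and note this does not bind memory the way numactl --membind does):

    import os

    node0_cpus = set(range(64))          # assumption: CPUs 0-63 live on NUMA node 0
    os.sched_setaffinity(0, node0_cpus)  # pid 0 = the calling process
    print(sorted(os.sched_getaffinity(0)))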
I absolutely stand by the claim that Intel and AMD didn't pursue high-core-count systems until that point because they were so focused on single-core perf, in part because Windows didn't support high core counts. The end of Dennard scaling forced their hand, and forced Microsoft's processor group hack.
Single core performance is really important for client computing.
5.0 (1999) - NUMA scheduling
6.1 (2009) - Processor Groups to have the KAFFINITY limit be per NUMA node
Xeon E7-8800 (2011) - An x86 system exceeding 64 total cores is possible (10x8 -> requires Processor Groups)
Epyc 9004 (2022) - KAFFINITY has created an artificial limit for x86 where you need to split groups more granular than NUMA
If x86 had actually hit a KAFFINITY wall, then the E7-8800 event would have occurred years before processor groups, because >8-core CPUs are desirable regardless of whether you can stick 8 in a single box.
The story is really a bit reverse from the claim: NT in the 90s supported architectures which could scale past the KAFFINITY limit. NT in the late 2000s supported scaling x86 but it wouldn't have mattered until the 2010s. Ultimately KAFFINITY wasn't an annoyance until the 2020s.
Windows didn’t run on these other systems, why would Microsoft care about them?
> x86 arguably didn't ship >64 hardware thread systems until then because NT didn't support it
For publicly accessible web servers, Linux overtook Windows around 2005. Then in 2006 Amazon launched EC2, and the industry started that massive transition to the clouds. Linux is better suited for clouds, due to OS licensing and other reasons.
Because it was clear that high core count, single system image platforms were a viable server architecture, and NT was vying for the entire server space, intending to kill off the vendor Unices.
Linux wasn't the only OS. Solaris and AIX were NT's competitors too back then, and supported higher core counts.
That doesn't mean every platform was, or would have been, profitable. Once x86 became "good enough" to run your mail or web server, it doomed the other architectures (and often their OSes), as the cost of x86 was vastly lower than the Alphas, PowerPCs, and so on.
> Applications that do not call any functions that use processor affinity masks or processor numbers will operate correctly on all systems, regardless of the number of processors.
I suspect the limitation 7zip encountered was in how it checked how many logical processors a system has, to determine how many threads to spawn. GetActiveProcessorCount can tell you how many logical processors are on the system if you pass ALL_PROCESSOR_GROUPS, but that API was only added in Windows 7 (that said, that was more than 15 years ago, they probably could've found a moment to add and test a conditional call to it).
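For the curious, that API is easy to poke at even from a script. A minimal ctypes sketch (Windows 7+ only):

    import ctypes

    ALL_PROCESSOR_GROUPS = 0xFFFF
    kernel32 = ctypes.WinDLL("kernel32")

    # Counts every logical processor, not just the calling thread's group
    total = kernel32.GetActiveProcessorCount(ALL_PROCESSOR_GROUPS)
    groups = kernel32.GetActiveProcessorGroupCount()
    print(f"{total} logical processors across {groups} group(s)")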
"If there are more than one processor group in Windows (on systems with more than 64 cpu threads), 7-Zip distributes running CPU threads across different processor groups."
The OS does not do that for you under Windows. Other OSs handle that many cores differently.
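Concretely, "distributing threads across groups" means each worker thread pins itself to a group with SetThreadGroupAffinity. A rough ctypes sketch of the underlying call (not 7-Zip's actual code; error handling minimal):

    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)

    class GROUP_AFFINITY(ctypes.Structure):
        _fields_ = [("Mask", ctypes.c_size_t),   # KAFFINITY bitmask: hence the 64 limit
                    ("Group", wintypes.WORD),
                    ("Reserved", wintypes.WORD * 3)]

    def pin_current_thread_to_group(group: int) -> None:
        count = kernel32.GetActiveProcessorCount(group)  # CPUs in this group
        ga = GROUP_AFFINITY(Mask=(1 << count) - 1, Group=group)
        if not kernel32.SetThreadGroupAffinity(kernel32.GetCurrentThread(),
                                               ctypes.byref(ga), None):
            raise ctypes.WinError(ctypes.get_last_error())

    # e.g. worker i on a machine with n groups: pin_current_thread_to_group(i % n)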
> more than 15 years ago, they probably could've found a moment to add and test a conditional call to it
I suspect it hasn't been much of an issue until recently. Any single block of data worth spinning up that many threads to compress is going to be very large, and you don't want to split it into chunks that are too small, or you lose some of the benefit of the dynamic compression dictionary (sharing that between threads would add a lot of inter-thread coordination, killing any performance gain even if the threads are running local enough on the CPU to share cache). Compression is not an inherently parallelizable task, at least not “embarrassingly” so like some processes.
Even when you do have something to compress that would in theory benefit from more than 64 separate tasks, unless it is all in RAM (or on an incredibly quick, low-latency drive/array), the process is likely to be IO-starved long before it is compute-starved when you have that much compute resource to hand.
Recent improvements in storage options & CPUs (and the bandwidth between them) have presumably pushed the occurrences of this being worthwhile (outside of artificial tests) from “practically zero” to “near zero, but it happens”, hence the change has been made.
Note that two or more 7-zip instances working on different data could always use more than 64 threads between them, if enough cores to make that useful were available.
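To make the dictionary trade-off concrete, here's a toy sketch of chunked parallel compression: every chunk gets a fresh compressor, so redundancy that spans chunks is lost, which is exactly why tiny chunks hurt the ratio (lzma is just a stand-in for any dictionary coder; on Windows, call it under an if __name__ == "__main__" guard):

    import lzma
    from concurrent.futures import ProcessPoolExecutor

    CHUNK = 4 << 20  # 4 MiB per chunk; smaller chunks parallelize more, compress worse

    def compress_chunk(data: bytes) -> bytes:
        return lzma.compress(data)  # independent stream, fresh dictionary

    def parallel_compress(data: bytes) -> list[bytes]:
        chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
        with ProcessPoolExecutor() as pool:  # one worker per core by default
            return list(pool.map(compress_chunk, chunks))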
The referenced text suggests applications will "work", but that isn't really explicit.
> starting with Windows 11 and Windows Server 2022 the OS has changed to make processes and their threads span all processors in the system, across all processor groups, by default.
> Each process is assigned a primary group at creation, and by default all of its threads' primary group is the same. Each thread's ideal processor is in the thread's primary group, so threads will preferentially be scheduled to processors on their primary group, but they are able to be scheduled to processors on any other group.
The difference is just that processes will be assigned a processor group more or less randomly by default, so they'll be balanced at the process level, but not the thread level. Not super helpful for a lot of software on Windows, which has historically preferred threads to processes for concurrency.
This explicitly says the feature is automatic and programs will not need to manually adjust their affinity.
No it won't.
That's literally why 7-zip is announcing completion of that manual work.
It also needed to change if you want optimal scheduling, and it needed to change if you want it to be able to use all those cores on something that isn't Windows 11.
But for just the basic functionality of using all the cores: >Starting with Windows 11 and Windows Server 2022, on a system with more than 64 processors, process and thread affinities span all processors in the system, across all processor groups, by default
That's documentation for a single process messing with its affinity. They're not writing that because they wrote a function to put different processes on different groups. A single process will span groups by default.
Modern AMD CPUs literally consist of core groups on chiplets. It is better for an OS to make decisions about / expose APIs for cores that are physically so far from each other that moving data back and forth over RAM, the system bus, or the interconnect carries significant time penalties.
An ugly limitation on an API that initially looks superior to Linux equivalents.
TortoiseGit (and TortoiseSVN) are similarly convenient. Right click a folder with an SVN repo checked out, and select "SVN update". Right-click an empty space, and select "SVN checkout". SVN was the main distribution method for some modding communities before things like Steam Workshop and Github, specifically because TortoiseSVN made it so convenient. Checkout into your addons folder, and periodically update. What could be simpler?
You can just use qemu-img (from qemu-utils, even under Termux) to convert your qcow2 partitions to IMG, and 7-Zip can read IMG files.
Try it yourself and see: you can extract files from your emulated Windows.
I guess they buffer the compressed stream in RAM before writing to the zip. If they want to keep their zip stable (always the same output given the same input), they also need to keep it in RAM a bit longer than necessary.
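"Stable" in the byte-identical sense also requires pinning the entry metadata, not just buffering. A sketch with Python's zipfile (deflate assumed; output is then reproducible up to the zlib version):

    import zipfile

    def write_stable_zip(out_path: str, files: dict[str, bytes]) -> None:
        with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
            for name in sorted(files):  # fixed entry order
                # fixed timestamp (DOS epoch), so reruns produce identical bytes
                info = zipfile.ZipInfo(name, date_time=(1980, 1, 1, 0, 0, 0))
                zf.writestr(info, files[name])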
Posting this link to hn has consumed more human potential than the thing it is describing will save up to the end of time.
The rest of this comment, though gratuitously snarky, has a point, but I don't think claiming that 7-Zip is irrelevant as a standalone statement is even remotely coherent.
Betamax was better, too.
Half the internet is tar.xz. tar.zst will probably replace that.
It was smaller, had better picture AND audio quality, and a better head to tape ratio.
It was also proprietary and stored less, which were 2 causes of it losing to VHS.
.zstd may END UP being a standard, or it may not, but sheer technical merit is not enough to guarantee it.
Betamax was updated repeatedly, as was VHS. At no point was contemporaneous consumer Betamax able to match VHS for quality with the same recording times.
This is a persistent myth propagated primarily by people who have probably never seen Betamax in use.
Often people compare a professional version of Betamax to consumer versions of VHS as well.