That is incorrect: Windows never adopted the LP64 model. Only pointers were widened to 64 bits, whereas long remained 32-bit. The long datatype should be avoided in cross-platform code.
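For the record, a minimal sketch that makes the difference visible (the printed sizes are just what LP64 vs LLP64 specify; nothing here is platform magic):

    #include <stdio.h>

    int main(void) {
        /* 64-bit Windows (LLP64): long is 4 bytes, pointers are 8.
           64-bit Linux/macOS (LP64): long is 8 bytes, pointers are 8. */
        printf("sizeof(long)=%zu sizeof(void*)=%zu\n",
               sizeof(long), sizeof(void *));
        return 0;
    }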
uint64_t is a bit verbose; many typedef it to u64.
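E.g. (u64/u32 are just the common shorthand convention, not standard names):

    #include <stdint.h>

    typedef uint64_t u64;  /* widely used shorthand; not in the standard */
    typedef uint32_t u32;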
I always understood the native types to be the "probably most efficient" choice, for when you don't actually care about the width. For example, you'd choose int for a loop index variable which is unlikely to hit width constraints because it's the "probably most efficient" choice. If you're forced to choose a width, you might choose a width that is less efficient for the architecture.
Is that understanding correct? Historically or currently?
Either way, I think I now agree that unspecified widths are an anti-feature. There's value in having explicitly specified limits on loop index variables. When you write "for (int32_t i = 0; ...)", it causes you to think a bit: "hey, can this overflow?" And now your overflow analysis holds on all arches, because you thought about the width that is actually in use (32 bits, in this case). It keeps program behavior consistent and easier to reason about, across all arches.
That's my thinking, but I'd be interested to hear other perspectives.
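To make that concrete, a minimal sketch (the function and its bound are made up for illustration):

    #include <stddef.h>
    #include <stdint.h>

    void fill(uint8_t *buf, size_t n) {
        /* Writing int32_t forces the question: can n exceed INT32_MAX
           here? If it can, the type must change -- and that analysis
           now holds on every architecture, not just this one. */
        for (int32_t i = 0; (size_t)i < n; i++)
            buf[i] = 0;
    }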
Remember that the C standard came about many years after the language was already in use. The C abstract machine wasn't explicitly designed for portability and performance; it was documenting an existing system.
C compilers being performant and portable is partly due to luck but mostly due to hard work by very smart people.
Last time I looked, clang's analysis and optimization code was more than a quarter of a million lines, as an example.
C being imperative is probably a lens for understanding why the kinds of optimization you are talking about are opportunistic and not intrinsic.
Another lens is to consider that the PDP-11 had flat memory, but NUMA, L2 and L3 caches, and deep pipelines make the compiler far more complicated despite maintaining that simple model in the abstract machine.
Ironically, FORTRAN was written on IBM machines that had decrementing index registers.
While the base-one indexing is usually explained as simply a choice of lowest value, in the historic context it is better conceptualized as a limit index.
That more closely matches what you are describing above. If you look at the most recent C++ versions adding ranges, that is closer to both FORTRAN and the above, IMHO.
https://en.cppreference.com/w/cpp/ranges
That history is complicated, because Dennis Ritchie's work in college was on what he called 'loop programming', what we would call the structured paradigm today.
That does have the concept that any loop whose iteration count you know will always halt, but being imperative, C doesn't really enforce that, although any individual compiler may.
C compilers are reasonably effective at optimization, but that is in spite of the limits of the C abstract machine, not because of them.
As shown above, all it takes is one powerful actor like MS making a decision, one that probably was justified at the time, to introduce side effects across all platforms.
Often it is safe to assume that the compiler will make good decisions, other times you have to force it to make the correct decision.
But using the default types is more a choice to value portability than a performance decision, IMHO.
for loop
Executes a loop.
Used as a shorter equivalent of while loop.
https://en.cppreference.com/w/c/language/for
I am amazed at how good compilers are today.
There is also the difference between portability meaning that it compiles vs. meaning that the precision and behavior are similar across platforms.
long would be more portable in the sense of successful compilation, but may cause side effects.
I shouldn't have switched meanings in the above reply context.
This itself is a platform-specific property, and is thus non-portable (not in the sense that your code won't run, but in the sense that it might be worse for performance than just using a known small integer when you can).
int32_t on Windows and int64_t on Unixes can't both be the "probably most efficient" choice on the same machine.
Besides, struct bloating is a perfectly fine C optimization that your compiler can do at any time to get the most efficient implementation without that "probably" part. It almost never does, though, because it's a shitty operation and because CPUs that handle 64 bits perfectly but fumble around with 32 bits are a historic oddity only.
In another 20 years, when we have 128-bit PCs, it will be comforting to know that we'll still be hamstrung on 32-bit integers because of a design choice made in the 1990s.
You can go look up how 32-bit protected mode got hacked on top of the 16-bit segmented virtual memory that the 286 introduced. The Global Descriptor Table is still with us in 64-bit long mode.
So it's not PAE that is particularly hacky; it's a broader thing with x86.
In x86-64 long mode and i386 32-bit mode, pointers are really 64- and 32-bit, respectively; I would not call this a hack.
x86 and its history is full of things that look hacky, and might be, but are often there for backward compatibility. If your x86 PC still boots in BIOS mode, it comes up in 16-bit real mode [2], ready to run DOS. It then moves through the decades into protected mode and lastly (for x64 systems) long mode.
[1] https://learn.microsoft.com/en-us/windows/win32/memory/addre... [2] https://wiki.osdev.org/Real_Mode
Sure, that doesn't change pointer sizes, but it would have reduced the impact of the different 64-bit data models, like Unix LP64 vs Windows LLP64.
(1) DX: typing "int" feels more natural and less clunky than choosing some arbitrary size.
(2) Perf: if you don't care about the size, you might as well use the native size, which is supposed to be faster.
In Java, people do use the hardware-independent 4 byte ints and 8 byte longs. I guess (1) matters more, or that people think that the JVM will figure out the perf issue and that it'll be possible to micro-optimize if a profile pointed out an issue.
You always care about the size (or should), especially if you're writing C or C++. Though it is often reasonable that 32767 is a sufficient limit and you're guaranteed at least that with int.
Of course you need to think about it. In C, but also in many languages (not Python, though, which magically switches to bigint when needed). In Java, the wrong int type won't cause UB: overflow wraps with defined semantics, and misuse surfaces as (unchecked) exceptions.
If you care about this, you figure out exactly how much you need and always use the smallest type that meets that criterion.
There have been architectures in the past where the "native" size was in practice faster than the smaller types, but those architectures are now long dead. On all modern architectures, none of the instructions for smaller data types are ever slower than the native size, and while using them doesn't directly win you cycles of execution time in the cpu (because they are no faster either), it wins you better cache utilization. As a rule, the fastest data type is a byte.
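A small illustration of the cache-utilization point (the array names are arbitrary; the sizes are exact):

    #include <stdint.h>

    /* Same element count, a quarter of the memory: the uint8_t array
       packs four times as many elements into each cache line. */
    static uint8_t small_flags[1 << 20];  /* 1 MiB */
    static int32_t large_flags[1 << 20];  /* 4 MiB */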
There is no reason to ever use "int", other than inertia.
> There is no reason to ever use "int", other than inertia.
...and DX, as I said.
I don't think this is a reasonable take. Beyond ABI requirements and how developers use int over short, there are indeed requirements where the size of an integer value matters a lot, especially as this has a direct impact on data size and vectorization. To frame your analysis, I would recommend you take a peek at the not-so-recent push for hardware support for IEEE 754 half-precision float/float16 types.
I don't see the relation to fp16; I don't think anyone is pushing for `float` to refer to fp16 (or fp64 for that matter) anywhere. `long double` is already bad enough.
I think you got it backwards. There are platform-specific ints because different processors have different word sizes. Programming languages then adjust their definitions to these word sizes because they are handled naturally by specific processors.
So differences in word sizes exist between processors; either programming languages support them, or they don't. There are also specific needs to handle specific int sizes regardless of CPU architecture; again, either programming languages support them, or they don't.
And you end with "platform-specific integer widths" because programming languages do the right thing and support them.
Furthermore, I argue that word size is not really something that makes sense to even expose at the language level; the whole concept of word size is somewhat questionable. CPUs operate on all sorts of things that can have different sizes, and trying to reduce that to a single "word size" is futile.
However, it was also conventional wisdom to use int by default to match the architecture’s “natural” word size, and maybe add a preprocessor check when you needed it to be 32-bit.
Another consideration is that the built-in types have to be used with the C standard library functions to some extent.
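For example, strtol is defined in terms of long, printf's %ld expects a long, and the fixed-width types need the <inttypes.h> macros (a minimal sketch, standard C only):

    #include <inttypes.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        long n = strtol("1234", NULL, 10);  /* the API hands you a long */
        printf("%ld\n", n);

        int64_t m = 1234;
        printf("%" PRId64 "\n", m);  /* fixed-width needs the PRI* macros */
        return 0;
    }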
Unfortunately I've read articles where quite-more-respected-than-me people said, in a nutshell, "no, x32 does not make a difference", which is contrary to my experience, but I could only provide numbers where the reply was "those are your numbers in your case, not mine".
Amazon Linux kernel did not support x32 calls the last time I tried, so you can't provide images for more compact lambdas.
Back when the VPSes you could rent had 256 MB of RAM, or sometimes even 128 MB, it was common knowledge that using a 32-bit distro would have a huge impact on your memory usage.
Maybe you are reading those opinions wrong, and what they are really saying is "it's not pointers filling 4GB of RAM, the pointer size makes no difference on modern machines"? Because I can agree with that one.
It was a far hotter topic back when.
After so many years, the only article I could locate (but unfortunately not one I did comment on) that I remember is this one:
https://flameeyes.blog/2012/06/19/is-x32-for-me-short-answer...
There are other commenters though that mention the cache pressure and performance difference.
Then again, I also thought segmentation/segment registers might be useful for bounds checking, as in Multics and in the original version of Google Native Client.
A practical use: bit fields can be convenient, e.g. having 32-bit indexes with the high bit used for the color in a red-black tree. And if the tree nodes require dynamic-sized items, these could live in separate 32-bit-addressable memory pools.
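A rough sketch of that packing (names and layout are illustrative, not from any particular implementation):

    #include <stdint.h>

    /* Node pool addressed by 32-bit indexes; the red/black bit rides
       in the top bit of the parent index. */
    #define RB_RED_BIT  0x80000000u
    #define RB_IDX_MASK 0x7FFFFFFFu

    struct rb_node {
        uint32_t left;              /* index into the node pool */
        uint32_t right;
        uint32_t parent_and_color;  /* bit 31 = color, bits 0..30 = parent */
    };

    static inline uint32_t rb_parent(const struct rb_node *n) {
        return n->parent_and_color & RB_IDX_MASK;
    }

    static inline int rb_is_red(const struct rb_node *n) {
        return (n->parent_and_color & RB_RED_BIT) != 0;
    }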
ooh, found a link to a UNIX Open Group white paper on that discussion and the reasoning why LP64 should be/was chosen:
https://unix.org/version2/whatsnew/lp64_wp.html
And per Raymond Chen, why Windows picked LLP64: https://devblogs.microsoft.com/oldnewthing/20050131-00/?p=36... and https://web.archive.org/web/20060618233104/http://msdn.micro...
For some history of why ILP32 was picked for the 1970s 16-to-32-bit transition of C + Unix System V (Windows 3.1 and Mac OS were LP32), see John Mashey's 2006 ACM piece, particularly the "Early Days" section: https://queue.acm.org/detail.cfm?id=1165766
No peanut gallery comments from OS/400 guys about 128-bit pointers/object handles/single store address space in the mid-1990s please! That's not the same thing and you know it! (j/k. i'll stop now)
> PDP-11s still employed (efficient) 16-bit int most of the time, but could use 32-bit long as needed. The 32-bitters used 32-bit int most of the time, which was more efficient, but could express 16-bit via short. Data structures used to communicate among machines avoided int.
Oh, interesting. So "short" meant 16-bit portable integer, "long" meant 32-bit portable integer, and "int" meant fast non-portable integer.
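That reading matches the standard's minimum ranges, which guarantee widths without fixing them (a minimal check; the limits below are what C promises):

    #include <limits.h>

    /* C guarantees minimum ranges, not exact widths:
       short and int cover at least 16 bits, long at least 32. */
    _Static_assert(SHRT_MAX >= 32767, "short: at least a 16-bit range");
    _Static_assert(INT_MAX  >= 32767, "int: at least a 16-bit range");
    _Static_assert(LONG_MAX >= 2147483647L, "long: at least a 32-bit range");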
Just like the instruction pointer which implicitly increments as code executes, there are some dedicated data-pointer registers. There's a dedicated ALU for advancing/incrementing, so you can have interesting access patterns for your data.
Rather than loops needing to load data, compute, store data, and loop, you can just compute and loop. The SSRs give the cores a DSP-like level of performance. So so so neat. Ship it!
(Also, what was the name of the x86 architecture some Linux distros were shipping with 32-bit instructions & address space, but using the new x86-64 registers?)
One lesson is to check and not allow, i.e. not ignore, the unused address bits, like DEC (but where is DEC now, btw). Tbh, look at C: how many features are not disallowed but just undefined in that standard? Hence I wonder whether there is some reason we have to accept human fallibility and deal with it.
Anyway, it's easy to comment in hindsight. What I think is more important is when it's said that something cannot be done, hence there is no way forward but a totally new architecture. Say x86 cannot do 64-bit … it ended up confusing me why we have AMD64 in an Intel CPU …
What we need is the way Apple somehow forces migration: except for the Apple II, we went from Mac OS … 9 … X … macOS, along with hardware changes …
The IBM mainframe is still running, and 24-bit addressing is a feature, not a bug or mistake, from a marketing point of view.
Alas, my SmartOS test system is gone, or I would show you.
smartos$ uname -a
SunOS smartos 5.11 joyent_20240701T205528Z i86pc i386 i86pc Solaris
Core system stuff:
smartos$ file /usr/bin/ls
/usr/bin/ls: ELF 32-bit LSB executable, Intel 80386, version 1 (Solaris), dynamically linked, interpreter /usr/lib/ld.so.1, not stripped
smartos$ file /bin/sh
/bin/sh: symbolic link to ksh93
smartos$ file /bin/ksh93
/bin/ksh93: ELF 64-bit LSB executable, x86-64, version 1 (Solaris), dynamically linked, interpreter /usr/lib/amd64/ld.so.1, not stripped
And then the pkgsrc stuff:
smartos$ which ls
/opt/local/bin/ls
smartos$ file /opt/local/bin/ls
/opt/local/bin/ls: symbolic link to /opt/local/bin/gls
smartos$ file /opt/local/bin/gls
/opt/local/bin/gls: ELF 64-bit LSB executable, x86-64, version 1 (Solaris), dynamically linked, interpreter /usr/lib/amd64/ld.so.1, not stripped
$ uname -a
SunOS bob 5.11 11.4.68.164.2 sun4v sparc sun4v kernel-zone
$ pwd
/usr/bin
$ file * | grep 32-bit | wc
46 1008 8890
$ file * | grep 64-bit | wc
1036 21449 185122
I think Solaris is famous for doing away with static linking.
Edit: A Solaris 10 amd64 box is all 32-bit in /usr/bin.
There are endless actual pictures of processors from both eras. Using real images here would have been as fast, possibly faster, than the process of writing a prompt and generating this image.
That's the baby-mode stuff, dude. You have missed the forest for the trees if the only thing you can engage with is the simplest, most low-effort use case for AI.
There are so many other things that can be done with AI. The immediate, obvious stuff would be anything to do with video-to-video AI generation.
I.e., imagine someone shoots a video the normal way, with all the creative input that involves. And then you take that video and change things in it, using AI.
I don't know, say you see the video and realize that you need to add an additional light source, and you want all the shadows to autocorrect. You could use AI to do that.
That's just a random/intermediate use case off the top of my head, one that involves a lot more creative input than just "prompt in, video out".
I am sure there could be a lot more crazy use cases, but you aren't going to be able to see them, because you are instinctively losing your mind by only talking about the dumbest and easiest use case of prompt engineering.
When all you have is a hammer...
I see someone else commented that it's probably due to copyright/licensing. I agree there too. That's a shame. So, because of usage policies, we end up with AI-generated pictures that aren't real, aren't accurate, and are usually off-putting in some way. Great.
Click "tools" then "usage rights", pick "creative commons", pick an image.
Now search for "core cpu" and pick a second image.
Yeah that sure was hard and time consuming!
I assume PNG/etc image file formats have internal tag-like data structures.
Browsers could display the image as usual, and show the "origin" tag data alongside HTML alt tag data.
Of course people could null out, or overwrite the origin data, but this seems like a reasonable default practice.
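They do: PNG stores metadata in chunks, and a tEXt chunk is just keyword\0text wrapped in a length/type/CRC frame. A minimal sketch of appending such an "origin" tag (the keyword is hypothetical, and a real tool would use libpng):

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* CRC-32 as PNG uses it (polynomial 0xEDB88320, bitwise variant). */
    static uint32_t crc32_png(const uint8_t *buf, size_t len) {
        uint32_t crc = 0xFFFFFFFFu;
        for (size_t i = 0; i < len; i++) {
            crc ^= buf[i];
            for (int k = 0; k < 8; k++)
                crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
        return crc ^ 0xFFFFFFFFu;
    }

    static void put_be32(FILE *f, uint32_t v) {
        fputc((int)((v >> 24) & 0xFF), f);
        fputc((int)((v >> 16) & 0xFF), f);
        fputc((int)((v >> 8) & 0xFF), f);
        fputc((int)(v & 0xFF), f);
    }

    /* Append one tEXt chunk: 4-byte big-endian length, "tEXt",
       keyword\0text, then a CRC over type + data. */
    static void write_text_chunk(FILE *f, const char *kw, const char *txt) {
        uint8_t data[512];
        size_t klen = strlen(kw), tlen = strlen(txt);
        size_t dlen = klen + 1 + tlen;
        if (4 + dlen > sizeof data) return;     /* sketch: keep it short */
        memcpy(data, "tEXt", 4);                /* type first, for the CRC */
        memcpy(data + 4, kw, klen);
        data[4 + klen] = 0;
        memcpy(data + 5 + klen, txt, tlen);
        put_be32(f, (uint32_t)dlen);            /* length counts data only */
        fwrite(data, 1, 4 + dlen, f);           /* type + data */
        put_be32(f, crc32_png(data, 4 + dlen)); /* CRC over type + data */
    }

Usage would be something like write_text_chunk(f, "origin", "model-x, prompt-y"), inserted before the IEND chunk.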
I must be getting old.