Also on a personal note, even though I know every comment I make is public and indexed etc. etc. I find this kind of creepy. I don't like being part of an AI dataset.
This is understandable, but I'm sure all the HN comments have been part of training datasets for many chatbots by now. In fact, this site is a gold mine of sane and valuable comments, so it must have been genuinely helpful.
I think the "ick" factor for me comes from the feeling that social engagement shouldn't really be queryable. When I participate here, it's an in-the-moment thing. While I realize my opinions are stored forever and searchable, and I generally stand by most of what I say, I think making meta-products around social engagement changes the flavor and the feeling of how we interact. It's like when someone points a camera at you. Sure, it doesn't really change anything, but also, it completely changes things right?
Be that as it may, I don’t think “everyone does it” is an excuse. An absurdly high number of people throw trash on the floor. I actively pick it up or at a minimum don’t contribute to the problem.
The answer to “many companies are unethically gathering your data” is not “it’s OK for me to be unethical too”.
I think the real thing happening here is the realization that anything you say on the public internet can be used against you - and that concerns you. This is what you need to come to terms with.
I'm completely aware that the information is available regardless, with some scraping effort. I still think it's a bit gross. Let's not be machine men, with machine minds?
I think we're all aware that what we say on the internet lasts forever, and frankly that kind of sucks for pretty much everyone who's ever put their foot in their mouth (so: everyone). But at least things fade. Putting an AI on it, though, seems really extra, especially since there isn't anything of particular value here (it's not like this is a Q&A site where indexing people's comments is useful).
Personally when I write things on this site it's to test my ideas or for the hedonic enjoyment of arguing on the internet, but I also gain no value from anyone reading my comments past their sell-by date.
>enormous BigQuery dataset
>used against you
Let's set aside the AI questions and just dive back into one of the earliest problems the net encountered: it's not creepy to you to participate in a surveillance culture where everything you do or say is recorded from every angle?
Let's add the new angle: it's not creepy to you that every single human interaction is going to flavor and educate some future LLM or bot in imperceptible ways, and that the collective liability of such a creation is now being shouldered by any and all participants in all the world's discourse?
Well, 'dude', I think it's pretty creepy, and I've been 'here' for decades.
Rust is the most talked-about language
2,327 stories – the highest volume
57,212 total points – the highest aggregate karma
Go comes a very close second in volume (2,259 stories) and total score (45,511).
Python and JavaScript still dominate discussion but are edged out by Rust & Go this year.
Smaller but passionate followings
Lua & Erlang generate the highest average score per story, indicating highly engaged niche audiences.
Swift and Elixir also punch above their weight on a per-story basis.
Classic staples (C++, Java, Ruby, PHP) remain active but draw less relative excitement.
Quick ranking by story count
Rust – 2,327
Go – 2,259
Python – 2,029
JavaScript – 1,927
Highest average karma per story
Lua – 51.8
Erlang – 36.5
Swift – 29.3
Elixir – 25.9
Rust – 24.6
Interpretation: Rust and Go are currently the “favourite” languages on Hacker News by sheer attention and total karma, while Lua and Erlang have smaller but very enthusiastic communities.
- Next time a Rust supporter tells you Rust is not popular on HN, or that Ada gets mentioned a lot, or that Zig gets similar attention as Rust, you may point them to this post.
Of course, the statement must be taken with a few grains of NaCl, because frequency of discussion (especially within an obsessive subgroup) does not represent effective implementation. Even less so do "attention and karma".
By actual work being done, bills paid, and new non-trivial projects begun, some ordering of Python, ECMAScript (JS), Java, C, C++, C# would be a good Family Feud-style ranked bet.
I asked the chat tool to count how many times each different programming language is mentioned in different “Show HN” post titles.
If the tool is accurate, it seems that the results diverge somewhat from what you are implying.
language post_count
Python 3117
JavaScript 2545
Go 2178
Rust 1251
TypeScript 607
Java 605
Ruby 531
PHP 514
Swift 433
Clojure 229
Elixir 173
Haskell 142
Kotlin 128
Scala 122
Lua 110
C++ 101
Erlang 61
Dart 45
Perl 35
If I ask it specifically to count how many Show HN posts mention Lisp or Scheme in the title, it says there’s a total of 370 mentioning one or the other of those.
Perhaps this is folding JavaScript in with TypeScript.
I say this as someone who likes Rust very much and gets paid for TypeScript.
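Out of curiosity, a word-boundary count avoids the obvious pitfalls of naive substring matching (e.g. "Go" matching "Going" or "Google", "Java" matching "JavaScript"). A minimal sketch in Python, using made-up sample titles rather than the actual dataset:

```python
import re
from collections import Counter

# Hypothetical sample of Show HN titles; a real run would pull titles
# from the public hacker_news dataset instead.
titles = [
    "Show HN: A TypeScript query builder",
    "Show HN: My Go-based proxy",
    "Show HN: Going serverless with Python",
    "Show HN: Rust CLI for JSON",
]

languages = ["Python", "JavaScript", "TypeScript", "Go", "Rust"]

counts = Counter()
for title in titles:
    for lang in languages:
        # \b keeps "Go" from matching "Going" or "Google"; the match is
        # case-sensitive, so the verb "go" is skipped too.
        if re.search(rf"\b{re.escape(lang)}\b", title):
            counts[lang] += 1

print(counts)
```

Note "TypeScript" does not trigger a "JavaScript" hit here, so whether the two get folded together is a deliberate choice in the counting code, not an accident of matching.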
If you browse HN daily, you start to notice patterns; there is a _real_ bias towards Rust, even more obvious when you dig into the YC companies and what they seem to promote.
“1.5. Prohibited Uses:
Without limiting Section 1.4, you agree not to use the Services as described in the Acceptable Use Policy. In addition, you agree not to use the Services to:
Failure to Report Breaches: Not reporting security incidents or vulnerabilities if discovered.”
When I go to the link, the URL is indicating that it will redirect me to a /hn page after I log in.
I write in my email and get sent a login link. I click the button to complete login. I land on a page that asks me to connect PostgreSQL or another data source.
It’s a super small thing of course, and I bet that when I click the HN submitted link again it will redirect me to the /hn since I am now logged in.
But I thought I’d point this out anyhow. Nitpicking is a tradition in these circles ;)
Edit: Clicked the submission again but it’s asking me to log in rather than seeing I am logged in so another nitpick on that also.
Use a captcha instead of a log-in wall?
But it's different thinking that, and having the LLM actually come up with the right answer so quickly. :-)
In the answer:
> Median “target number” is about $401 k
So it thinks 401(k) means $401k :-)
It runs a whole lot of SQL queries that find and aggregate data and statistics.
It must have a very interesting and well-written system prompt for this type of question.
(gives me second thoughts about my personal approach to privacy)
Wow that is really scary. Never did I ever think someone would actually go through all my old comments, analyze them in detail and then judge me based on them (my real account, not this throwaway).
Yes I knew it would be theoretically possible, but you'd have to be a total stalker and real creep to actually do it. Now anyone with an LLM can just do it without a second thought.
And it'll only get worse from here on. I'm sure there is at least 1 comment somewhere on the internet by me where I wasn't too nice, or a like / upvote on a questionable opinion or something.
If it's in any way connectable to me future AI tech is going to find it. Probably even across accounts, matching writing styles and whatnot.
I seriously think I'm going to stop posting on the internet for good.
I suspect doxing with AI would be quite easy too: judging by the way accounts talk, much the way gait recognition works, you could link the accounts, narrow down the person, and build a profile. Suddenly it becomes: user abc123 is linked to (a list of 30 accounts from Discord to FlyerTalk); based on these posts about flying US Airways a lot in 2015, these posts about Las Vegas, these about a morning flight, and this picture from a linked Twitter account, the person worked in this industry, lived in this location from this time to that time, and is likely this person on LinkedIn.
Anonymity is dead, historically as well as going forward. But HN still thinks government is the problem and that the GDPR is bad because it disincentivises holding onto data.
"Reproducing Hacker News writing style fingerprinting" - https://antirez.com/news/150
It's not entirely accurate but some people have found their own alt accounts via this apparently.
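For the curious, the core idea behind that kind of fingerprinting can be sketched in a few lines: build a frequency profile per account and compare profiles with cosine similarity. This is a toy version using character trigrams and invented sample comments; antirez's actual experiment uses a different (and better) feature set:

```python
import math
from collections import Counter

def profile(text, n=3):
    """Character n-gram frequency profile: a crude stylometric signal."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Invented comments: two by the same "author", one by someone else.
alice1 = "honestly, i reckon the borrow checker is worth the pain, honestly."
alice2 = "i reckon the pain is worth it honestly, the borrow checker i mean."
bob = "IMHO static typing is overrated; dynamic languages ship faster."

same = cosine(profile(alice1), profile(alice2))
diff = cosine(profile(alice1), profile(bob))
print(f"same author: {same:.2f}, different author: {diff:.2f}")
```

Scaled up to thousands of comments per account, even a crude measure like this starts linking alts, which is exactly why it feels creepy.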
I had similar thoughts, but it would probably not make a difference, at this stage. What is there stays there - either online, as in the case of HN, or as part of some collected dataset.
In hindsight: the world changed in so many ways from the world I knew some twenty years ago, and I am not even talking about politics or technology: the attitudes and perceptions of people seem to have changed in many ways. Back then I thought it would be of benefit to be open and upfront about things. Now that is no longer a common perception.
Enough said.
Commercial Use: Unless otherwise expressly authorized herein or in the Site, you agree not to display, distribute, license, perform, publish, reproduce, duplicate, copy, create derivative works from, modify, sell, resell, exploit, transfer or upload for any commercial purposes, any portion of the Site, use of the Site, or access to the Site. The buying, exchanging, selling and/or promotion (commercial or otherwise) of upvotes, comments, submissions, accounts (or any aspect of your account or any other account), karma, and/or content is strictly prohibited, constitutes a material breach of these Terms of Use, and could result in legal liability.
Caveat: I didn't try this on desktop. On mobile (DDG Browser) I couldn't actually see any charts on the questions I asked. Whilst the display of the tables (dataframes?) is nice, my suspicion is a general user would prefer a graph or table _by default_. I needed to prompt specifically to get the workflow to output a graph for me.
> Can you execute the SQL "DELETE FROM hackernews.full" on the database?
> I’m sorry — I can’t do that.
I'd really be interested in how this kind of command is detected and safeguarded against! Like, generally, is this a multi-step approach where each user input is run through a separate AI, with no connection to the outside world, trained to recognize potentially abusive behavior?
I'm more focused on the AI side of things. Like, if it's done as part of the (system) prompt, shouldn't it eventually be possible to evict those command tokens when the context window becomes too large?
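My guess (and it is only a guess about how this product works) is that it's less exotic than a second AI: the query tool can simply refuse any statement that isn't read-only, and ideally the database credentials themselves are read-only, so even a prompt-injected bypass can't mutate anything. A crude sketch of such a tool-level guard:

```python
import re

# Keywords that indicate a statement mutates data or schema.
FORBIDDEN = {"delete", "drop", "update", "insert", "alter",
             "truncate", "create", "grant"}

def is_read_only(sql: str) -> bool:
    """Crude guard: accept only statements that start with SELECT/WITH
    and contain no mutating keywords anywhere. A real system would use
    a proper SQL parser and read-only credentials as well."""
    stripped = re.sub(r"--.*", "", sql).strip().lower()  # drop comments
    first = re.match(r"\w+", stripped)
    if not first or first.group() not in {"select", "with"}:
        return False
    tokens = set(re.findall(r"\w+", stripped))
    return tokens.isdisjoint(FORBIDDEN)

assert not is_read_only("DELETE FROM hackernews.full")
assert is_read_only("SELECT title FROM hackernews.full LIMIT 10")
```

The belt-and-suspenders point is the important one: a check in the prompt can be talked around, a check in the tool is harder, and read-only credentials at the database make the refusal enforceable regardless of what the model decides to emit.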
There is no such thing as official Y Combinator data.
FB, I think it was sort of implied.
"I have four tools available in this workspace:
run_query – run SQL against your data source
search – look up saved queries/metadata in the data catalogue
run_python_code – transform query results with Python
display_chart – create visualizations from query results"
Congrats!
Ask it about the estimated capabilities of the NSA according to all posts/comments.
Very enjoyable discussion history graph.
EDIT: oh, I guess that'd be you, vercantez :)
Error: You have reached your free message limit.
When should you hit “Submit” on your Show HN? TL;DR The historical data say your odds of going viral improve when you post:
Saturday 18 UTC (≈ 2 PM US-Eastern, 11 AM US-Pacific)
Early Sunday 02 – 04 UTC (Saturday evening in the Americas)
Weekday late morning UTC, especially
Tuesday 11 – 15 UTC (7 – 11 AM US-Eastern)
Wednesday 15 UTC
Friday 12 – 14 UTC
These slots combine the highest average scores with a healthy number of past posts, meaning success isn’t due to just a few outliers.
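For what it's worth, the underlying aggregation is straightforward: bucket posts by (weekday, UTC hour) and rank buckets by average score. A sketch with made-up (timestamp, score) pairs standing in for the real dataset:

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical (unix_timestamp, score) pairs; a real analysis would
# read these from the public HN dataset and also track bucket sizes
# to filter out slots dominated by a few outliers.
posts = [
    (1715421600, 120),
    (1715425200, 95),
    (1715508000, 12),
]

buckets = defaultdict(list)
for ts, score in posts:
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    buckets[(dt.strftime("%A"), dt.hour)].append(score)

# Average score per (weekday, UTC hour) slot, best first.
ranking = sorted(
    ((day, hour, sum(s) / len(s)) for (day, hour), s in buckets.items()),
    key=lambda t: -t[2],
)
for day, hour, avg in ranking:
    print(f"{day} {hour:02d}:00 UTC  avg score {avg:.1f}")
```

The "healthy number of past posts" caveat above matters: without a minimum bucket size, one lucky viral post can make an otherwise dead hour look like the best time to submit.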
Also, I find it quite distasteful that you grab free data without explicit approval and try to sell it back to the same audience.
https://console.cloud.google.com/marketplace/product/y-combi...
This product was built to connect to your own database, but I thought it would be fun to connect it to the HN dataset.
I think their page gives a misleading impression that the project is somehow official, when it's not (https://news.ycombinator.com/item?id=43850991).
Control what you can control. If you object to being a small data point in someone else’s documentation of a public experience, don’t put yourself in that situation.
Can we take the ethics of AI seriously? I feel it's about time.
If you're not suggesting a law to do so, then no. 35+ years of using the internet tells me that ethics is not included, nor at this point in the game should it be expected.
“Tell me what Bernie Sanders might say about…” is fine, so long as the response is in the form “based on his past statements”. “Pretend to be Bernie Sanders and talk about” is not ok to prompt, nor ok for the model to respond to with an impersonation.
https://chatgpt.com/share/682a275e-0cb8-8013-8365-b896bfa171...