One of the things I miss most about the pandemic is how all of these institutions opened up to the world. Lately they have been not only shutting down newer course offerings but also making old videos private. Even MIT OCW falls apart once you get into some advanced graduate courses.
I understand that universities should prioritize their alumni, but there's essentially zero marginal cost to making the underlying material (especially lectures!) available on the internet, and it delivers immense value to the world.
Those moments are the best part of classroom education: a super knowledgeable person spends a few weeks helping you get to the point where you can finally understand something cool, and you can sense their excitement to tell you about it. I still remember learning Gauss-Bonnet, Stokes' theorem, and the central limit theorem. I think optimism under uncertainty falls in that group.
I personally don't like this, because it makes a place more exclusive through legal moats, not genuine prestige. If you're a professor, it also makes your work less known, not more. IMO the only beneficiaries are those who paid a lot to be there, lecturers who don't want to adapt, and university admins.
No, it's because they don't want people to find out they've been reusing the same slide deck since 2004
(I mean, I have no idea how Coursera/edX/etc are doing behind the scenes, but it doesn't seem like people talk about them the way they used to ~10 years ago.)
I agree it's hard, but I think it's because initially the lecturers were involved in the online community, which can be tiring and unrewarding even if you don't have other obligations.
I think the courses should have purely standalone material that lecturers can publish, earn extra money, and refresh the content when it makes sense. Maybe platform moderators could help with some questions or grading, but it's even easier to have chatbot support for that nowadays. Also, platforms really need to improve.
So, I think the problem with MOOCs has been the execution, not the concept itself.
On the flip side, that'd require many professors and other participants in universities to rethink the role of a university degree, which proves to be much more difficult.
If that seems unlikely, remember that image generation didn't take off till diffusion models, and GPTs didn't take off till RLHF. If you've been around long enough it'll seem obvious that this isn't the final step. The challenge for you is to find the one that's better.
RL excels at learning control problems. It is mathematically guaranteed to provide an optimal solution for the state and controls you provide it, given enough runtime. For some problems (playing computer games), that runtime is surprisingly short.
There is a reason self-driving cars use RL, and don't use GPTs.
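To make the control-problem framing concrete, here is a minimal tabular Q-learning loop on a toy task. This is a hedged sketch: FrozenLake via Gymnasium and the hyperparameters are just illustrative choices, nothing to do with driving.

    import numpy as np
    import gymnasium as gym

    # Minimal tabular Q-learning on a toy control task (FrozenLake is just an illustrative choice).
    env = gym.make("FrozenLake-v1", is_slippery=False)
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    alpha, gamma, eps = 0.1, 0.99, 0.1   # learning rate, discount factor, exploration rate

    for episode in range(5000):
        state, _ = env.reset()
        done = False
        while not done:
            # epsilon-greedy: mostly exploit the current value estimates, sometimes explore
            action = env.action_space.sample() if np.random.rand() < eps else int(Q[state].argmax())
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            # one-step temporal-difference update toward the Bellman target
            target = reward + gamma * Q[next_state].max() * (not terminated)
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state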
Some part of it, but I would argue with a lot of guardrails in place and not as commonly as you think. I don't think the majority of the planner/control stack out there in SDCs is RL-based. I also don't think any production SDCs are RL-based.
Apparently AI sets the best time, even better than the pros. It is really useful when it comes to controlled-environment optimization.
Control theory and reinforcement learning are different ways of looking at the same problem. They traditionally and culturally focussed on different aspects.
They also force exploration as a part of the algorithm.
They can be used for synthetic data generation once the reward model is good enough.
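A hedged sketch of that idea, best-of-n filtering with a reward model (policy.sample and reward_model.score are purely hypothetical stand-ins, not a real API):

    # Illustrative best-of-n filtering with a reward model.
    # `policy.sample` and `reward_model.score` are hypothetical stand-ins, not a real API.
    def generate_synthetic_data(prompts, policy, reward_model, n=8, threshold=0.8):
        kept = []
        for prompt in prompts:
            candidates = [policy.sample(prompt) for _ in range(n)]            # draw n completions
            scored = [(reward_model.score(prompt, c), c) for c in candidates]
            best_score, best = max(scored, key=lambda t: t[0])                # keep the highest-reward one
            if best_score >= threshold:                                       # discard if even the best is weak
                kept.append((prompt, best))
        return kept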
SSL creates all the connections and RL learns to walk the paths
The spring course is on YouTube: https://m.youtube.com/playlist?list=PLoROMvodv4rN4wG6Nk6sNpT...
Take, for example, a typical binary classifier with a BCE loss. Suppose I wanted to shoehorn RL onto this: how would I do that?
Or, for example, the House Value problem (given a set of features about a house for sale, predict its expected sale value). How would I slap RL onto that?
I guess my confusion comes from how the losses are hooked up. I know about traditional losses (BCE, RMSE, etc.), but how do you bring an RL loss into these problems?
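For concreteness, here is my naive guess at what "shoehorning" would even look like: sample the class as an "action", give reward 1 when it matches the label, and weight the log-probability by that reward instead of using BCE. A rough PyTorch sketch of that guess, in case someone can tell me whether I'm off base:

    import torch

    # REINFORCE-style loss for a binary classifier -- a sketch of my guess, not a recommendation.
    def reinforce_loss(logits, labels):
        probs = torch.sigmoid(logits)                        # P(class = 1), as in the BCE setup
        dist = torch.distributions.Bernoulli(probs=probs)
        actions = dist.sample()                              # the "action": a sampled prediction
        rewards = (actions == labels).float()                # reward 1 if it matches the label, else 0
        baseline = rewards.mean()                            # crude baseline to reduce variance
        return -(dist.log_prob(actions) * (rewards - baseline)).mean()

    # versus the usual supervised loss:
    # bce = torch.nn.functional.binary_cross_entropy_with_logits(logits, labels)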
For the house value problem, you can quantify how far the prediction is from the true value, there are lots of regression models with proven methods of adjusting the model parameters (e.g. gradient descent), and the feature space comprises mostly monotone, weakly interacting features like quality of neighborhood schools and square footage. It's a "traditional" problem and can be solved as well as possible by the traditional methods we know and love. RL is unnecessary, might require more data than you have, and might produce an inferior result.
In contrast, for a sequential decision problem like playing go, the binary won-lost signal doesn't tell us much about how well or poorly the game was played, it's not clear how to improve the strategy, and there are a large number of moves at each turn with no evident ranking. In this setting RL is a difficult but possible approach.
RL is nice in that it handles messy cases where you don't have per-example labels.
How do you build a learned chess playing bot? Essentially the state of the art is to find a clever way of turning the problem of playing chess into a sequence of supervised learning problems.
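Concretely, in AlphaZero-style pipelines the "supervised learning problems" are: fit the policy head to the move distribution the search produced, and fit the value head to the eventual game result. A hedged sketch of that supervised step, assuming self-play has already produced the data:

    import torch
    import torch.nn.functional as F

    # The supervised step in an AlphaZero-style pipeline, heavily simplified.
    # Assumes self-play has already produced `positions` (board tensors),
    # `search_policies` (move distributions from the search), and `outcomes` (+1/-1 game results),
    # and that `net` returns (policy_logits, value) for a batch of positions.
    def supervised_step(net, optimizer, positions, search_policies, outcomes):
        policy_logits, value = net(positions)
        # fit the policy head to the search's move distribution (cross-entropy with soft targets)
        policy_loss = -(search_policies * F.log_softmax(policy_logits, dim=-1)).sum(dim=-1).mean()
        # fit the value head to the eventual game result
        value_loss = F.mse_loss(value.squeeze(-1), outcomes)
        loss = policy_loss + value_loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()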
Let's say I do have a problem in that setting; say the chess problem, where I have a chess board with the positions of the pieces, plus some features like turn number, my color, time left on the clock, etc.
Would I train a DNN with these features? Are there some libraries where I can try out some toy problems?
I guess coming from a classical ML background I am quite clueless about RL but want to learn more. I tried reading the Sutton and Barto book, but got lost in the terminology. I'm a more hands-on person.
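On libraries: Gymnasium (the maintained fork of OpenAI Gym) gives you toy environments, and Stable-Baselines3 gives you reference implementations of the standard algorithms, so you can get hands-on before working through Sutton & Barto. A minimal sketch, assuming both packages are installed (exact API details vary by version):

    import gymnasium as gym
    from stable_baselines3 import DQN

    # Toy control problem, off-the-shelf algorithm.
    env = gym.make("CartPole-v1")
    model = DQN("MlpPolicy", env, verbose=1)     # small MLP, standard DQN hyperparameters
    model.learn(total_timesteps=50_000)

    # roll out the learned policy for one episode
    obs, _ = env.reset()
    done = False
    while not done:
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, _ = env.step(int(action))
        done = terminated or truncated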
I don't really see why you would want to use it for binary classification or continuous predictive modeling. The reason it excels in game play and operational control is that you need to make decisions now that constrain possible decisions in the future, but you cannot know the outcome until that future comes, and you cannot attribute causality to the outcome even when you learn what it is. This isn't "hot dog/not a hot dog", which generally has an unambiguously correct answer and where the classification itself is directly either correct or incorrect. In RL, a decision made early in a game probably leads causally to a particular outcome somewhere down the line, but the exact extent to which any single action contributes is unknown and probably unknowable in many cases.
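One way to see the credit-assignment point concretely: in the simplest setups, every action in the episode just gets credited with a discounted version of the final outcome, because there is nothing better to attribute. A small sketch:

    # Discounted returns from a single terminal reward (illustrative).
    # With only a win/loss signal at the end, every earlier action gets credited with
    # a discounted version of that one outcome -- the crudest form of credit assignment.
    def discounted_returns(rewards, gamma=0.99):
        returns, g = [], 0.0
        for r in reversed(rewards):          # work backwards from the end of the episode
            g = r + gamma * g
            returns.append(g)
        return list(reversed(returns))

    # e.g. a 5-move game with no intermediate reward that ends in a win (+1):
    print(discounted_returns([0, 0, 0, 0, 1]))   # ~[0.961, 0.970, 0.980, 0.990, 1.0]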
I've already studied a lot of deep learning.
Please confirm whether these resources are good, or suggest your own:
Sutton & Barto - Reinforcement Learning: An Introduction
Kevin Patrick Murphy - Reinforcement Learning: An Overview, https://arxiv.org/abs/2412.05265
Sebastian Raschka (upcoming book)
...