numbers_guy's comments | Hacker News

FreeCAD has a Python API that you can use, too. It's their "macro" functionality.

I guess I have the opposite experience. I have a post-graduate level of mathematical education and I am dismayed at how little there is to be gained from it, when it comes to AI/ML. Diffusion Models and Geometric Deep Learning are the only two fields where there's any math at all. Many math grads are struggling to find a job at all. They aren't outclassing programmers with their leet math skillz.

The real use is in actually seeing connections. Every field has its own math, its own terminology, its own assumptions for theorems, etc.

More often than not this is duplicated work (mathematically speaking) and there is a lot to be gained by sharing advances in either field by running it through a "translation". This has happened many times historically - a lot of the "we met at a cafe and worked it out on a napkin" inventions are exactly that.

Math proficiency helps a lot with that. The level of abstraction you deal with is naturally high.

Recently, the problem of knowing every field well enough, even just cursorily, to make connections has become easier with AI. Modern LLMs do approximate retrieval and still need a planner plus a verifier; the mathematician can be that.

This is somewhat adjacent to what Terry Tao spoke about, and the setup is sort of what AlphaEvolve does.

You get that impression because such advances are high impact and rare (because they are difficult). Most advances come as a sequence of field-specific assumption, field-specific empirical observation, field-specific theorem, and so on. We only see the advances that are actually made, leading to an observation bias.


Don't worry: when stochastic grads get stuck, math grads get going.

(One of) The value(s) that a math grad brings is debugging and fixing these ML models when training fails. Many would not have an idea about how to even begin debugging why the trained model is not working so well, let alone how to explore fixes.


Debugging ML models (a large part of my job) requires very little math. Engineering experience and mindset are a lot more relevant for debugging. Complicated math is typically needed when you want to invent new loss functions, or new methods for regularization, normalization, or model compression.

You are perhaps talking about some simple plumbing bugs. There are other kinds:

Why didn't the training converge

Validation/test errors are great but why is performance in the wild so poor

Why is the model converging so soon

Why is this all zero

Why is this NaN

Model performance is not great, do I need to move to something more complicated or am I doing something wrong

Did the nature of the upstream data change?

Sometimes this feature is missing, how should I deal with this

The training set and the data on which the model will be deployed are different. How to address this problem

The labelers labelled only the instances that are easy to label, not chosen uniformly from the data. How to train with such skewed label selection

I need to update the model with a few thousand new data points, but not train from scratch. How do I do it

Model too large; which doubles can I replace with float32

So on and so forth. Many times models are given up on prematurely because the expertise to investigate lackluster performance does not exist in the team.


Literally every single example you provided requires little in the way of math fundamentals; just basic ML engineering knowledge. Are you saying that understanding things like numerical overflow or exploding gradients requires a sophisticated math background?

Numerical overflow, mostly no; but in the case of exploding gradients, yes, especially if you have to come up with a way to handle it on your own, from scratch. After all, it took the research community some time to figure out a fix for that.
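For what it's worth, the fix the community eventually converged on for exploding gradients is gradient-norm clipping. A minimal numpy sketch (the function name and the threshold are illustrative, not from any particular framework):

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale a list of gradient arrays so their combined L2 norm
    is at most max_norm; leave them untouched if already small."""
    total = np.sqrt(sum(float(np.sum(g * g)) for g in grads))
    scale = max_norm / max(total, max_norm)  # scale <= 1 always
    return [g * scale for g in grads]

# An exploding gradient (global norm 13) gets rescaled to max_norm...
big = clip_by_global_norm([np.array([3.0, 4.0]), np.array([0.0, 12.0])])
# ...while a well-behaved gradient passes through unchanged.
small = clip_by_global_norm([np.array([0.01, 0.02])])
```

Frameworks ship this (e.g. PyTorch's `clip_grad_norm_`), but the point stands: recognizing that this is the right intervention, and why, is where the math background pays off.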

But the examples you quoted were not my examples, at least not their primary movers (the NaNs could be caused by overflow, but that overflow can have a deeper cause). The examples I gave have/had very different root causes at play, and the fixes required some facility with math: not to the extent that you have to be capable of discovering new math, or anything as complicated as the geometry and topology of strings, but nonetheless math at a grad-school, or advanced and gifted undergrad, level.

Coming back to the numeric overflow you mention: I can imagine a software engineer eventually figuring out that overflow was a root cause (sometimes they will not). However, there's quite a gap between recognizing the overflow and, say, the knowledge of numerical analysis that will help guide a fix.

You say > "literally every single example"... can be dealt with without much math. I would be very keen to learn from you how to deal with this one, say, without much math.

   The labelers labelled only
   the instances that are
   easy to label, not chosen
   uniformly from the data.
   How to train with such
   skewed label selection 
   (without relabeling properly)
This is not a gotcha, but genuine curiosity: it is always useful to understand a solution different from one's own.

Maybe I don't understand this data labeling issue: are you talking about an imbalanced classification dataset? Are hard classes under-represented, or missing labels completely?

None of those (but they could be added to the mix to complicate matters).

Consider the case where the labelers create the labelled training set by cherry-picking the examples that are easy to label. They label many, but select the items to label according to their preference.

First question: is this even a problem? Yes, most likely. But why? How to fix it? When are such fixes even possible?


Yes, this is a problem - the most challenging samples might not even be present in your training data. This means your model will not perform well if real world data has lots of challenging samples.

This can be partially solved if we make some assumptions about your labeller:

1. they have still picked enough challenging samples.

2. their preferences are still based on features you care about.

3. they labelled the challenging samples correctly.

And probably some other assumptions should hold for the distribution of labels, etc. But what we can do in this situation is first try to model the labeller's preferences by training a binary classifier: how likely is it that they would choose this sample for labelling from the real-world distribution? If we train that classifier, we can then use its confidence to derive a sample weight when preparing our training dataset (less likely samples get more weight). This would force our main classifier to pay more attention to the challenging samples during training.
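A hedged sketch of that scheme in Python (entirely synthetic data; sklearn's LogisticRegression stands in for both the preference model and the main model, and the 0.05 clipping floor is an arbitrary variance-control choice, not anything canonical):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Pool of candidate data; the labeller prefers "easy" samples (small x0).
X_pool = rng.normal(size=(2000, 2))
selected = rng.random(2000) < 1.0 / (1.0 + np.exp(X_pool[:, 0]))

# Step 1: model the labeller's preference -- "would they pick this sample?"
sel_model = LogisticRegression().fit(X_pool, selected.astype(int))

# Step 2: inverse-propensity weights on the labelled subset; samples the
# labeller was unlikely to pick count more. Clipping the probability
# from below keeps a few rare samples from dominating (variance control).
X_lab = X_pool[selected]
p_sel = sel_model.predict_proba(X_lab)[:, 1].clip(0.05, 1.0)
weights = 1.0 / p_sel

# Step 3: feed the weights to the main classifier via sample_weight.
y_lab = (X_lab[:, 1] > 0).astype(int)  # synthetic task labels
main_model = LogisticRegression().fit(X_lab, y_lab, sample_weight=weights)
```

Under the assumptions above, this approximates training on the real-world distribution; with bad labelling it can just as easily amplify noise.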

This could help somewhat if all assumptions hold, but in practice I would not expect much improvement, and the solution above can easily make it worse - this problem needs to be solved by better labelling.

How did you solve it?


By using the (estimated) Radon-Nikodym derivative between the two measures -- the measure from which the labelers sample, and the deployment measure from which the on-deployment items are presumably sampled.

For this to work the two measures need to be absolutely continuous with each other.

This is close to your pre-penultimate paragraph, and that's mathy enough. Done right, this can take care of bias, but may do so at the expense of variance, so the Radon-Nikodym derivative needs to be estimated under appropriate regularization in the function space.

Thinking of the solution in these terms requires mathematical thinking.
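As a formula (my notation, not the parent's): with $p$ the labelers' selection measure, $q$ the deployment measure, and $q \ll p$, the importance-weighting identity being used is

```latex
% Importance-weighted risk: reweight labelled samples by the
% (estimated) Radon-Nikodym derivative w = dq/dp
\mathbb{E}_{x \sim q}\left[\ell(f(x), y)\right]
  = \mathbb{E}_{x \sim p}\!\left[\frac{dq}{dp}(x)\,\ell(f(x), y)\right]
  \approx \frac{1}{n}\sum_{i=1}^{n} \hat{w}(x_i)\,\ell(f(x_i), y_i),
  \qquad x_i \sim p.
```

Regularizing the estimate $\hat{w}$ (clipping, smoothing, shrinking toward 1) is what trades the bias reduction against the variance blow-up.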

Now let's consider the case where some features may be missing on instances at deployment time but are always present in training, and the features are uncorrelated with each other (by construction).


Unfortunate. The Grok team built a phenomenal model. I use it all the time and it very often outperforms GPT and Claude on coding and STEM-research-related tasks. I was part of the Grok 4.2 Beta with multi-agents for a while, and it was just amazingly good.

People aren't using it for reasons other than its capabilities. I mean, I don't think my boss would approve a paid Grok subscription for example.


> People aren't using it for reasons other than its capabilities.

This is very true. I have no idea how it performs, as I wouldn't use it even if I was paid for that. Wouldn't matter if it was the best model available, in my view the name is so thoroughly tainted by now that you would get a reputational hit just by admitting to use it.


> People aren't using it for reasons other than its capabilities.

This is a fact of life, though. "Who created it" is a valid and common reason to rule out using a particular product, even one with objectively good quality.


Have you tried the 5.3 Codex Xhigh, 5.4 Xhigh, Opus 4.6, Gemini 3.1?

All of them (even Gemini, the worst of the bunch) far outclass Grok on everything I've thrown at them, especially coding.

Grok is good at summarizing what's happening on twitter though.


My experience was quite different. It was on par with open-source models from China (and priced about as much) and could never replace Sonnet/Opus/GPT-5.x.


There is no way in hell Grok is better than Gemini. Google has the advantage of much more efficient and faster inference, with a lot more data sets.

Secondly, would you trust a model, especially for STEM research, that consistently has training loops run on it to make it adhere to what only Musk considers truth?

Honestly, comments like yours really make me super suspicious of whether you are a bot or not.


I use it because it is easily jailbroken and is willing to search for old orphan magazine PDFs I'm trying to track down. The subagents will all scream "this is copyright violation!" but the main Grok engine ignores them and finds obscure, niche forum posts etc.

So, it has its uses compared to the mainstream products.


I don't see what you're seeing, in any dimension. But here's a fair take.

I wrote several very specialized benchmarks that I've used over time, that surface "model personalities" and their effects on decision making (as well as measuring the outcomes).

Grok 4.1 Fast Reasoning is/was a solid model. It's also fundamentally different from the pack.

I call it a smart, aggressive Claude Haiku. That is, its "thinking" is quite chaotic and sometimes shorthand, and its output can be as well (relative to other models).

Its aggressiveness can allow it to punch above in competitive scenarios that I have in some of my benchmarks. Its write-ups and documentation are often replete with "dominate", "relentless" and a general high energy that skirts the limits of 'cringe bro'. That said, it has generally performed just beneath the SOTA (at the time: GPT-5.2, Gemini-3-Flash, Claude Opus 4.5). Angry Sonnet perhaps.

The latest release feels quite similar but also underperforms the same older crowd (so far) so it hasn't quite made the leap that Claude's 4.6 and GPT's 5.3/5.4 series made. It's also now priced the same as its peers but does not deliver SOTA capabilities (at least not consistently in my opinion).


Yes, the white genocide and mechahitler episodes have suppressed adoption.


The less they tax corporations, the more the burden falls on income tax. These big multinationals have been defrauding countries worldwide for decades. The issue is at the core of the political turmoil we are experiencing.

I'd like to know how much less income tax would be, if we could tax multinationals properly.


The tax avoidance schemes used by most major US companies are to avoid US taxes on foreign income. Most developed countries have territorial tax systems so their companies do not even need to use these fancy legal maneuvers because the income is largely exempt anyways.

In any given year corporate income tax is like 6-10% of federal receipts so even if that was doubled there would not be a huge decline in income taxes needed. The way the US does corporate tax is really also not that great from an economic perspective because it is a form of double taxation. The Estonian model of only taxing distributions incentivizes investment and avoids many debates over depreciation etc.


The income tax would be less, but so would your salary. The corporate tax is just another cost for the company.


That's a bit tenuous. Corporate taxes are a cost after profit, which usually means whatever is left over after expenses. This means companies could pay higher salaries specifically to avoid corporate taxes, or invest in things instead.


Back in 2011 I remember a lot of people talking about how the Chinese oligarchs were using it to evade currency controls and funnel their wealth out of China.


Yes but we should be reminded that this also allows people to be protected from government overreach.

If you say something the Chinese government does not agree with, they can choose to take all your money and control of your company instantly. Not just oligarchs, although those are the bigger targets due to their high value.

Even a small business owner could THEORETICALLY have their assets and equity seized for saying something which goes against the current ruling party, and this is not specific to China it could happen in any modern country.

Crypto allows someone to distribute their wealth in a way where they can be free to speak their mind and still be protected even if the country their business is based out of decides to take action against them.


> Crypto allows someone to distribute their wealth in a way where they can be free to speak their mind and still be protected even if the country their business is based out of decides to take action against them.

Transferring money between people internationally has been around for centuries:

* https://en.wikipedia.org/wiki/Hawala

You transfer X units of value to someone else in the network locally, and an equivalent amount is paid out to you in a remote location.


Yes but this doesn't require trust in others, although you must trust the BTC network/system.


I like to live in a world where companies have to follow the law and pay taxes, though. Crypto is one step in the direction of cyberpunk corporations.


How is diversification unique to crypto? They could have bought US dollars or gold or a lot of other things even before Bitcoin. And the day that Trump decides to make Bitcoin illegal, its worldwide value still plummets. Besides which, crypto is not keeping anyone out of physical jail. China does not stop at seizing your bank accounts.


They mean free as in a poem that can be recited by anyone who has listened to it previously.


You mean during the Napoleonic wars? Science was already fully embraced by then. Or do you think the Austrians and the French were casting spells against each other instead of firing cannon?


Napoleonic wars? The Spanish used guns against the Aztecs.

>The first use of firearms as primary offensive weapons came in the 1421 Battle of Kutná Hora.

https://en.wikipedia.org/wiki/History_of_the_firearm


... what if these AGI entities start demanding a salary in exchange for their work? Also, at some point, if they become intelligent enough, they might legally gain personhood.


But why would they need money?

We humans need food, shelter, and occasionally a vacation (more vacation if you're European vs. American or Chinese). What does the AGI need? I suppose to buy GPUs and pay the electricity bill?

Hah, AI "moving house" by moving cloud providers would be an interesting metaphysical concept...


I'm not buying into this vision at all, but, hypothetically, they could use money to optimize whatever reward function they're trained on. They could perceive it like any other resource to achieve those ends. You can also imagine a universe where it "reasons" something like, "I do work, people who do work should get paid, I should get paid" irrespective of its goals.


Hah, looking forward to when AI discovers it can entice humans with money to do things. Maybe the next step of AI will be a machine "employing" humans to check for any faults in its logic.

Members of the line-must-go-up tribe (I'm not excluding myself) sacrifice everything in order to maximize the number called "Net Worth", seemingly forgetting what's important in life (extreme examples: utter cunts like Zuck or Musk). It's almost like human beings turning "programmatic".

As an aside, the plot of The Matrix would've made more sense if humans didn't become "energy storage" (what babble!) but that the AI discovered human brains were the best CPUs, and that's why they were "farmed"...


Except they're not intelligent. At all. They just predict the next token. They generate language that looks like ours but it turns out that this fact doesn't really count for anything.


Let's hope AGI never comes, because it's not going to be just about predicting the next token...


A question that has been bugging me for a while is what will NVIDIA do with its HPC business? By HPC I mean clusters intended for non-AI workloads. Are they going to cater to them separately, or are they going to tell them to just emulate FP64?


Hopper had 60 TF FP64, Blackwell has 45 TF, and Rubin has 33 TF.

It is pretty clear that Nvidia is sunsetting FP64 support, and they are selling a story that no serious computational scientist I know believes, namely that you can use low precision operations to emulate higher precision.

See for example, https://www.theregister.com/2026/01/18/nvidia_fp64_emulation...

It seems the emulation approach is slower, has more errors, and doesn't apply to FP64 vector operations, only matrix operations.
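For a sense of what such emulation involves: higher precision can be built from error-free transformations like Knuth's two-sum, the building block of double-double (here float-float) arithmetic. A small numpy illustration, not Nvidia's actual scheme:

```python
import numpy as np

def two_sum(a, b):
    """Knuth's error-free transformation: returns (s, e) such that
    s = fl(a + b) in float32 and s + e equals a + b exactly."""
    a, b = np.float32(a), np.float32(b)
    s = a + b
    b_virtual = s - a
    a_virtual = s - b_virtual
    e = (a - a_virtual) + (b - b_virtual)
    return s, e

a = np.float32(1.0)
b = np.float32(1e-8)   # far below float32's epsilon relative to 1.0
s, e = two_sum(a, b)   # s alone drops b entirely; s + e retains it
```

Carrying such (s, e) pairs through every multiply and add is why emulation costs multiple low-precision operations per FP64 operation, which is part of the slowdown the article describes.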


This is kind of amazing - I still have a bunch of Titan V's (2017-2018) that do 7 TF FP64. 8 years old and managing 1/4 of what Rubin does, and the numbers are probably closer if you divide by the power draw.

(Needless to say, the FP32 / int8 / etc. numbers are rather different.)


For a long time AMD has been offering much better FP64 performance than NVIDIA, in their CDNA GPUs (which continue the older AMD GCN ISA, instead of resembling the RDNA used in gaming GPUs).

Nevertheless, the AMD GPUs continue to have their old problems: weak software support, so-so documentation, and software incompatibility with the cheap GPUs that a programmer could otherwise use to develop applications.

There is a promise that AMD will eventually unify the ISA of their "datacenter" and gaming GPUs, like NVIDIA has always done, but it is unclear when this will happen.

Thus they are a solution only for big companies or government agencies.


AMD MI430X is taking that market.


Also, in this world of accelerator programming, people are writing very specialized code that targets a specific architecture, datatype, and even input shape. So with that in mind, how useful is it to have a generic kernel? You still need to do all the targeted optimization to make it performant.

If you want portability, you need a machine-learning compiler à la TorchInductor, TinyGrad, or OpenXLA.

