Hacker News | sumo43's comments


I made a 4B Qwen3 distill of this model (and a synthetic dataset created with it) a while back. Both can be found here: https://huggingface.co/flashresearch


Just tried this out with my web search mcp, extremely impressed with it. Never seen deep research this good from a model so small.


Can you please create a Hugging Face Space or something similar? I'm not sure about the current state of Hugging Face, but I would love to try this in a browser if possible. I'm really curious, and I just love Qwen3 4B: it was one of the few models that ran at an impressive rate even on my Intel integrated GPU, and it was really nice the last time I tried it. This looks even cooler and more practical.

I once had the idea of using something like Qwen or another small pre-trained model purely as a (to censor or not to censor) filter, after the MechaHitler incidents. I thought that if an extremely cheap model could detect harmful output that Grok's own models failed to recognize, it could have prevented the complete advertising disaster that happened.

What are your thoughts on it? I would love to see a Qwen3 4B (or any small LLM in general) tuned for something like this, if you or anyone else is up to the challenge. I just want to know whether the idea fundamentally makes sense.

Another idea was to use it for routing, similar to what ChatGPT does. I'm less sure about that one now, though I still think it may be worth it. I had the routing idea before ChatGPT implemented theirs, so now that it's live we should get more data and insight into whether it's actually good or worth doing, which is nice.


I use emotions-analyzer-bert to classify content in a similar way. It's very small and very fast, using under a gigabyte of VRAM.


> What are your thoughts on it?

You don't really need an entire LLM to do this - lightweight encoder models like BERT are great at sentiment analysis. You feed it an arbitrary string of text, and it just returns a confidence value from 0.0 to 1.0 that it matches the characteristics you're looking for.
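As a sketch of what that gating looks like in practice: in a real setup the score would come from a fine-tuned encoder (e.g. a BERT checkpoint loaded through a text-classification pipeline), but the decision logic itself is just a confidence threshold. The `score_harmful` function below is a stand-in stub (a trivial keyword heuristic, not a real model) so the logic is runnable:

```python
# Sketch: gate content on a classifier confidence score.
# score_harmful() is a PLACEHOLDER for a real encoder model;
# here it's stubbed with a keyword heuristic so the gating
# logic itself can run standalone.

HARMFUL_KEYWORDS = {"attack", "slur", "threat"}  # illustrative only


def score_harmful(text: str) -> float:
    """Return a confidence in [0.0, 1.0] that `text` is harmful (stub)."""
    words = set(text.lower().split())
    hits = len(words & HARMFUL_KEYWORDS)
    return min(1.0, hits / 2)


def should_block(text: str, threshold: float = 0.5) -> bool:
    """Block a generation when the classifier is confident enough."""
    return score_harmful(text) >= threshold


print(should_block("have a nice day"))          # False
print(should_block("this is a threat attack"))  # True
```

The same threshold pattern works for the routing idea upthread: instead of blocking, send low-scoring (easy) queries to a cheap model and high-scoring ones to a bigger one.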


I think the fine-tuned policies are still very brittle, but I agree that this is super promising. It's also one of the most open (the model is still closed) research blog posts we've seen from any private embodied-AI lab.


Seems like an improvement on the ALOHA approach? You still need to fine-tune it on roughly the same number of OOD examples. Contrast this with Google's approach during 2023, which was training large vision-language models with the goal of generalizing to OOD tasks.


Location: US

Remote: Yes

Willing to relocate: Yes (US)

Technologies: Python, PyTorch, HuggingFace, C++

Résumé/CV: https://drive.google.com/file/d/1qY-m1tKz4_QpHgxaGryC2vk-DGs...

Email: sumo43@proton.me

ML Engineer & Research Scientist. Previously at an AI-grant startup and a hedge fund. I've worked on inference for LLMs and vision models, and have experience with data curation and multi-node training. Looking for summer internships or part-time positions.


Maybe true for instruct, but pretraining datasets do not usually contain GPT-4 outputs. So the base model does not rely on GPT-4 in any way.


SEEKING VOLUNTEERS: open source self-play training for language models

We are a small team associated with EleutherAI, looking to push the frontier of open-source language models through self-play. So far we have implemented SPIN. Compute included.

email tyoma9k@gmail.com

edit: formatting


Cool service. It's worth noting that, with quantization/QLoRA, models as big as Llama 2 70B can run on consumer hardware (2x RTX 3090) at acceptable speeds (~20 t/s) using frameworks like llama.cpp. That avoids the significant latency of parallelism schemes spanning different servers.
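Back-of-the-envelope arithmetic for why 70B fits on two 3090s (the overhead figure is a rough assumption; exact numbers vary by quantization format and context length):

```python
# Rough VRAM estimate for a 70B-parameter model at 4-bit quantization.
params = 70e9
bytes_per_param = 0.5  # 4 bits per weight
weights_gb = params * bytes_per_param / 1e9  # weight memory in GB
overhead_gb = 8.0      # KV cache + activations (rough assumption)
total_gb = weights_gb + overhead_gb
budget_gb = 2 * 24     # 2x RTX 3090 at 24 GB each
print(f"~{total_gb:.0f} GB needed vs {budget_gb} GB available")
```

At ~35 GB for weights alone, the fit is comfortable at 4-bit but already impossible at 8-bit (~70 GB), which is why the quantization level matters so much here.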

p.s. From experience instruct-finetuning Falcon 180B: it's not worth using over Llama 2 70B, as it's significantly undertrained.


Hi, a Petals dev here. You're right, there's no point in using Petals if your machine has enough GPU memory to fit the model and you're okay with the quantization quality.

We developed Petals for people who have less GPU memory than needed. Also, there's still a chance of larger open models being released in the future.


AFAIK you cannot train a 70B model on 2x 3090, even with GPTQ/QLoRA.

And the inference is pretty inefficient. Pooling the hardware would achieve much better GPU utilization and (theoretically) faster responses for the host's requests.


For training you would need more memory. As for pooling: theoretically yes, but wouldn't latency play as much of a part, if not a greater one, in the response time here? Imagine a tensor-parallel gather where the other nodes are in different parts of the country.

Here I'm assuming that Petals uses a large number of small, heterogeneous nodes like consumer GPUs. It may well be something much simpler.


> Theoretically yes but wouldn't latency play as much, if not a greater part in the response time here?

For inference? Yeah, but it's still better than nothing if your hardware can't run the full model, or can only run it extremely slowly.

I think frameworks like MLC-LLM and llama.cpp kind of throw a wrench in this, though, as you can get very acceptable throughput on an iGPU or split across a CPU/dGPU, without that huge networking penalty. And pooling complete hosts (like AI Horde) is much cheaper.

I'm not sure what the training requirements are, but ultimately throughput is all that matters for training, especially if you can "buy" training time with otherwise idle GPU time.


Hello, I'm planning to participate in this challenge. I have experience training/prompting LLMs and building products from them; I've also participated in a few CTFs.

sumo43@proton.me

