> Spinning up a GPU seems like something to avoid.
You can run inference on GPUs as well, and for anything beyond very lightweight models (noise cancellation, or maybe speech recognition), it's probably worth the initial overhead.
I believe Core ML already splits workloads between CPU, NPU, and GPU as appropriate.
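
To make the "worth the initial overhead" point concrete, here is a rough break-even sketch. It assumes a very simple cost model (a fixed GPU startup cost plus work divided by throughput); the function names and all numbers are hypothetical, not measurements from any real device.

```python
def break_even_work(startup_s, cpu_rate, gpu_rate):
    """Amount of work (in ops) above which the GPU finishes first.

    Assumed cost model (hypothetical, for illustration only):
      CPU time = work / cpu_rate
      GPU time = startup_s + work / gpu_rate
    Setting the two equal and solving for work gives the break-even point.
    """
    if gpu_rate <= cpu_rate:
        # The GPU never catches up if it is not faster per op.
        return float("inf")
    return startup_s / (1.0 / cpu_rate - 1.0 / gpu_rate)


def faster_device(work, startup_s, cpu_rate, gpu_rate):
    """Return 'cpu' or 'gpu' depending on which finishes first under the model."""
    cpu_time = work / cpu_rate
    gpu_time = startup_s + work / gpu_rate
    return "gpu" if gpu_time < cpu_time else "cpu"


if __name__ == "__main__":
    # Made-up numbers: 50 ms GPU spin-up, CPU at 1e9 ops/s, GPU at 1e10 ops/s.
    startup, cpu, gpu = 0.05, 1e9, 1e10
    print("break-even ops:", break_even_work(startup, cpu, gpu))
    print("tiny model  ->", faster_device(1e6, startup, cpu, gpu))
    print("large model ->", faster_device(1e9, startup, cpu, gpu))
```

Under these made-up numbers a tiny model stays on the CPU and a large one goes to the GPU; real schedulers also weigh power draw and memory transfers, which this sketch ignores.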
Yeah, this is what I was getting at. In some sense, the list of “capabilities which don’t require spinning up the GPU” is expanded. Whether something could be done by spinning up the GPU is beside the point.