What are use-cases for this? I mean, obviously detecting the filetype is useful,...

nindalf · on Feb 16, 2024

> no way an "AI powered" tool can be more reliable

The article provides accuracy benchmarks.

> you would be better off just rejecting it completely

They mention using it in Gmail and Drive, neither of which has the luxury of rejecting files willy-nilly.

fuzztester · on Feb 16, 2024

I have not tried it recently, but IIRC, Gmail does reject attachments which are zip files, for security reasons.

wildrhythms · on Feb 16, 2024

Gmail nukes zips if they contain an executable or some other 'prohibited' file type. Most email providers block executable attachments.

n2d4 · on Feb 16, 2024

Virus detection is mentioned in the article. Code editors need to detect the programming language so they can syntax-highlight code before the file has been given a name. Your desktop OS needs to know which program to open files with. It also helps when recovering files from a corrupted drive. Etc.

It's easy to distinguish, say, a PNG from a JPG file (or anything else that has well-defined magic bytes). But some files look virtually identical (e.g. .jar files are really just .zip files). Also see polyglot files [1].
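
To make the magic-bytes point concrete, here is a minimal sketch in Python. The `MAGIC` table and the `sniff` function are hypothetical names made up for illustration, and the table lists only three well-known signatures; real detectors know far more.

```python
# Illustrative signature table: (leading bytes, label).
# Only three well-known signatures; a real tool has many more.
MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "png"),  # PNG file signature
    (b"\xff\xd8\xff", "jpeg"),      # JPEG start-of-image marker
    (b"PK\x03\x04", "zip"),         # ZIP local file header
]


def sniff(data: bytes) -> str:
    """Return a label based on leading bytes, or "unknown" if nothing matches."""
    for prefix, label in MAGIC:
        if data.startswith(prefix):
            return label
    return "unknown"
```

A .jar passed to `sniff` comes back as "zip", because it starts with the same ZIP header, and plain source code comes back as "unknown", because it has no signature at all. Those are the cases where prefix matching alone runs out.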

If you allow an `unknown` label or human intervention, then yes, magic bytes might be enough, but sometimes you'd rather have a 99% chance to be right about 95% of files vs. a 100% chance to be right about 50% of files.
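
The arithmetic behind that tradeoff, as a rough sketch. The percentages are the illustrative ones from this comment, not measured benchmarks, and `fraction_correct` is a made-up helper that assumes any file outside the covered share does not get a correct label.

```python
def fraction_correct(coverage: float, accuracy: float) -> float:
    """Share of all files that end up with a correct label.

    coverage: share of files the detector gives an answer for.
    accuracy: chance that a given answer is right.
    """
    return coverage * accuracy


# "99% chance to be right about 95% of files"
probabilistic = fraction_correct(coverage=0.95, accuracy=0.99)

# "100% chance to be right about 50% of files"
magic_only = fraction_correct(coverage=0.50, accuracy=1.00)
```

That works out to roughly 94% of files labelled correctly versus 50%, at the cost of a small share of confidently wrong answers in the first case.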

[1] https://en.wikipedia.org/wiki/Polyglot_(computing)