davidm1729's comments

davidm1729 · on Aug 3, 2023

Hi, I'm David and I've worked alongside other engineers at Brave on this! Thanks for your feedback. It would certainly be a nice addition, although we may want to focus a bit more on quality first. Thanks!

qingcharles · on Aug 3, 2023

Thank you and your team for the hard work on this. Breaking the small monopoly on image search is no easy thing, and I appreciate it.

It also explains why Brave Search is slow today I suspect... :)

teruakohatu · on Aug 3, 2023

Hi David, reverse image search is an easier problem than good search quality. I am happy to chat with you about it, email in HN profile.

davidm1729 · on Dec 16, 2021

Hi, I'm one of the authors of the post

Thanks for pointing us to CLMUL. I'm not familiar with this kind of multiplication, but converting the quote bitmask to a quoted bitmask (one with a bit set for every byte inside quotes) would certainly make it faster. With this new bitmask, we could negate it and AND it with the newline mask, generating a mask of newlines that are not inside quotes. Getting the last newline would then be a simple CLZ of that mask, with no need to resort to byte-by-byte processing.
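
A minimal sketch of that idea, for illustration only and not the code from the post. It assumes one 64-byte chunk, bit i of each mask corresponding to byte i, no quote state carried in from a previous chunk, and a GCC/Clang compiler for `__builtin_clzll`. The helper names (`byte_mask`, `prefix_xor`, `last_unquoted_newline`) are hypothetical, and the portable shift-and-XOR cascade stands in for the carry-less multiply by all ones:

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar stand-in for a SIMD compare + movemask:
   bit i is set when buf[i] == c (first 64 bytes only). */
static uint64_t byte_mask(const char *buf, size_t len, char c) {
  uint64_t m = 0;
  for (size_t i = 0; i < len && i < 64; i++)
    if (buf[i] == c) m |= 1ull << i;
  return m;
}

/* Prefix XOR: bit i of the result is the parity of bits 0..i of x.
   This matches the low 64 bits of a carry-less multiply by all ones. */
static uint64_t prefix_xor(uint64_t x) {
  x ^= x << 1;
  x ^= x << 2;
  x ^= x << 4;
  x ^= x << 8;
  x ^= x << 16;
  x ^= x << 32;
  return x;
}

/* Index of the last newline that is not inside quotes, or -1 if none.
   Assumes the chunk starts outside quotes. */
static int last_unquoted_newline(uint64_t quote_mask, uint64_t newline_mask) {
  uint64_t quoted = prefix_xor(quote_mask);    /* 1 for bytes inside quotes */
  uint64_t outside = newline_mask & ~quoted;   /* newlines outside quotes */
  if (outside == 0) return -1;
  return 63 - __builtin_clzll(outside);        /* highest set bit */
}
```

For the 15-byte input `a,"b\nc",d\ne,"f\n`, the newlines at offsets 4 and 14 fall inside quotes, so the function returns 9. A doubled quote (`""`) toggles the parity twice and so needs no special handling in this sketch.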

In our tests, going byte to byte for more iterations to keep the alignment when hitting the "else case" performed worse than making the unaligned loads, but as you say "just use CLMUL" (as all loads will be aligned) :D

jart · on Dec 16, 2021

PMOVMSKB/BSF/POPCNT takes serious wizardry, but instructions like PCLMULLQLQDQ make you feel like Gandalf. It's defined:

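    #include <stdint.h>

    typedef struct { uint64_t lo, hi; } pair;

    /* Index of the highest set bit; x must be nonzero (GCC/Clang builtin). */
    static int bsr(uint64_t x) { return 63 - __builtin_clzll(x); }

    /* Carry-less multiply of a and b; returns the 128-bit product as {lo, hi}. */
    pair clmul(uint64_t a, uint64_t b) {
      uint64_t t, x = 0, y = 0;
      if (a && b) {
        if (bsr(a) < bsr(b)) t = a, a = b, b = t; /* optional */
        /* t collects the bits of a that have been shifted out past bit 63 */
        for (t = 0; b; a <<= 1, b >>= 1) {
          if (b & 1) x ^= a, y ^= t;
          t = t << 1 | a >> 63;
        }
      }
      return (pair){x, y};
    }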

There's a famous paper on how it can perform polynomial division at 40 Gbps. It's really cool that it has practical applications in things like CSV too. https://www.intel.com/content/dam/www/public/us/en/documents...

zwegner · on Dec 16, 2021

CLMUL in general is a bit weird to wrap your head around, but a CLMUL with -1 isn't too tricky: it's like a running 1-bit sum, or in other words, each bit in the result is the parity of all the bits up to that point in the multiplier.
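
To make the "running 1-bit sum" concrete, here is a small portable sketch (no PCLMULQDQ intrinsic, hypothetical function names) that computes the low 64 bits of a carry-less product bit by bit and compares it with an explicit running parity:

```c
#include <stdint.h>

/* Low 64 bits of the carry-less product of a and b:
   XOR together a copy of a shifted left by i for every set bit i of b. */
static uint64_t clmul_lo(uint64_t a, uint64_t b) {
  uint64_t r = 0;
  for (int i = 0; i < 64; i++)
    if ((b >> i) & 1) r ^= a << i;
  return r;
}

/* Running 1-bit sum: bit i of the result is the parity of bits 0..i of x. */
static uint64_t running_parity(uint64_t x) {
  uint64_t r = 0, p = 0;
  for (int i = 0; i < 64; i++) {
    p ^= (x >> i) & 1;   /* parity of everything seen so far */
    r |= p << i;
  }
  return r;
}
```

With every bit of the multiplier set, bit k of the product is the XOR of bits 0..k of the other operand, so `clmul_lo(x, ~0ull)` and `running_parity(x)` should agree for any `x`. For example, both map `0x44` (bits 2 and 6) to `0x3C` (bits 2 through 5).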

> In our tests, going byte to byte for more iterations to keep the alignment when hitting the "else case" performed worse than making the unaligned loads, but as you say "just use CLMUL" (as all loads will be aligned) :D

I was talking about using bitwise operations with the quote/escape/newline masks already computed (like in the blog post I linked), rather than a byte-by-byte loop. But yeah, CLMUL is better anyways :)

gpderetta · on Dec 16, 2021

CLMUL is quite interesting. I learned about it when going in depth on how multiplications help with hashing.

A multiplication is in practice:

- a sum over
- a series (i.e. one for each bit set in the multiplier)
- of shifts (where the shift amount is the index of that bit in the multiplier)

The shifting and the combining are great for hashing as they "distribute" each bit around.

CLMUL simply replaces the addition in step one with XOR (which can also be thought of as single-bit carryless addition).
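
As a rough illustration of that description, the two hypothetical functions below are identical except for the combining step. Both keep only the low 64 bits of the result:

```c
#include <stdint.h>

/* Ordinary multiplication: add a shifted copy of a
   for every set bit of the multiplier b. */
static uint64_t mul_shift_add(uint64_t a, uint64_t b) {
  uint64_t r = 0;
  for (int i = 0; i < 64; i++)
    if ((b >> i) & 1) r += a << i;
  return r;
}

/* Carry-less multiplication: same shifts, but combined with XOR,
   so no carries propagate between bit positions. */
static uint64_t mul_shift_xor(uint64_t a, uint64_t b) {
  uint64_t r = 0;
  for (int i = 0; i < 64; i++)
    if ((b >> i) & 1) r ^= a << i;
  return r;
}
```

The results differ as soon as two shifted copies overlap: `mul_shift_add(3, 3)` is 9, while `mul_shift_xor(3, 3)` is 5, because the two middle bits cancel instead of carrying. With a single set bit in the multiplier, both reduce to the same plain shift.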

davidm1729 · on Jan 24, 2021

Well, that's the secret sauce ;)

davidm1729 · on Jan 19, 2021

Thank you so much!

What do you mean about having a key?

> a way to jump to a genre or theme

Yes! That's a good idea!

I'm also thinking about personalized visualizations based on your favorite books, maybe this could make it more shareable on social networks in order to get traction.

M1010101 · on Jan 19, 2021

> What do you mean about having a key?

When I first looked at the page I wondered if the colors represented anything specific (fiction, drama, money, etc)

> I'm also thinking about personalized visualizations based on your favorite books, maybe this could make it more shareable on social networks in order to get traction.

Yeah, it seems like you could generate quite a nice reading list. The first thing I did was search a few of the books on my shelf and see what was similar. It's a really nice way to explore books.

Apologies, as I don't understand the technicalities behind this, but I feel this system would apply well to a Spotify / YouTube browser. It would help you explore without their algorithms putting you in a filter bubble.

Best of luck!

jaxomago · on Jan 23, 2021

What was the size of the circles related to? Popularity? Or sales?

davidm1729 · on Jan 24, 2021

You could say it is "popularity", yes

davidm1729 · on Jan 18, 2021

I discovered a few books that I loved on Twitter a few months ago by pure luck, and I find it weird that in 2020 we didn't have more than word of mouth, bestseller lists, and "you may also like" to come upon some of these gems.

The site is quite self-explanatory, a large interactive map with more than 100,000 books placed and colored by their similarity. Right now, it's completely based on an algorithm, and things like being written by the same author do not directly affect position or color.

Probably the most interesting feature, apart from the map, is that clicking on a book highlights the books most similar to it. There are also other features, like the ability to create your own lists.

Any feedback will be appreciated. Thank you!

lmarcos · on Jan 18, 2021

Nice project.

What do you take into account when computing the similarity of two books? Do you use metadata (genre, author, etc.) or something else? Note that I'm more interested in the data that is used rather than in the methodology.

davidm1729 · on Jan 19, 2021

Thank you!

Some sites will show you "other users also saw..." information. This is a very raw form of similarity, but a good similarity score can be computed from that data. There is no metadata involved at all right now.
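
For readers wondering how a score can come out of "also saw" data at all, one generic option is the Jaccard index over the sets of users who viewed each book. This is only an illustrative assumption, not the method used on the site. The function name is hypothetical, each bit of a mask stands for one of up to 64 users, and `__builtin_popcountll` assumes GCC or Clang:

```c
#include <stdint.h>

/* Jaccard similarity of two books from co-view data.
   Bit u of each mask is set when user u viewed that book.
   Returns |A and B| / |A or B|, or 0 when nobody viewed either book. */
static double coview_jaccard(uint64_t viewers_a, uint64_t viewers_b) {
  int both = __builtin_popcountll(viewers_a & viewers_b);
  int either = __builtin_popcountll(viewers_a | viewers_b);
  return either ? (double)both / either : 0.0;
}
```

Two books viewed by users {0, 1, 2} and {1, 2, 3} share two of four distinct viewers, giving a score of 0.5. No genre or author metadata enters the calculation.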