Hacker News
I work on code search at GitHub – what needs to improve? (twitter.com/jnbrymn)
62 points by softwaredoug on Jan 31, 2021 | 33 comments


1. Exact match. Let me search for "foo.bar" and only find that exact sequence rather than tokenising it and finding all files with "foo" and "bar". This issue alone has led me to never use it in favour of clone + ripgrep

2. Inline expand in search results. Let me click a button to see all matches in a file when there's more than fits into your result preview, rather than having to click into the search result then repeat the same search with my browser search to find where in the file it is
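To illustrate the difference described in point 1, here is a minimal Python sketch. The tokenizer and function names are hypothetical and say nothing about how GitHub's index actually works; the sketch only contrasts token-based matching with a verbatim substring match.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens, dropping punctuation such as '.'."""
    return set(re.findall(r"[A-Za-z0-9_]+", text.lower()))

def tokenized_match(query, document):
    """Match if every query token occurs anywhere in the document."""
    return tokenize(query) <= tokenize(document)

def exact_match(query, document):
    """Match only if the query occurs verbatim as a contiguous substring."""
    return query in document

# Hypothetical file contents, for illustration only.
files = {
    "a.py": "result = foo.bar(1)",
    "b.py": "foo = 1\nbar = 2",
}

for name, body in files.items():
    print(name, tokenized_match("foo.bar", body), exact_match("foo.bar", body))
```

Both files pass the tokenized check, but only a.py contains the exact sequence "foo.bar".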


Definitely 'exact match search' (or 'verbatim search' in evil-ese)

Also bunching dupes and near-dupes. It stinks to wade through 100s of duplicate files.

For C and C++ files, it would be nice if you prioritized implementation files (and only header files with inline code etc) above boring header files in the search results.


I am pretty sure they fixed point 1 at some point, as I remember being frustrated by this, but no longer.


No they didn't, it frustrated me just a few days ago because I kept getting useless results and couldn't find what I was looking for.


Over a year ago http://grep.app showed up, which does basically all I want from GitHub search, and Nat (GitHub CEO) said let's talk, but nothing public happened. The biggest thing I use it for is looking for examples of how to use some API, so exact search of the function name and filter by language. The regex is also super nice to have when I need it.


Came to say the same thing. grep.app is nearly perfect for this use. If we had the speed and exact search across all of GitHub, that would be fantastic.


Exclude forks. It kills me to see pages and pages of the same code and not being able to find other repositories with code matching my query.


I just tested this with a private repo and confirmed: currently if you search for text that appears only in old commits and was removed, I see nothing show up in the search. I would like to search not only for text in the latest state of the repo, but the entire history. This would be a KILLER feature.


Ability to group or remove duplicate files from search.

E.g. "this appears X more times (in Y other repositories)". Particularly useful when the code is part of an SDK (like the AWS SDK), which almost all repos have a copy of, so the result appears 5k times with the same file showing over and over.

I use GitHub search a lot to find how people are using an API or some function but the main results get obscured because of the problem above.
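One way the grouping could work, sketched in Python under loud assumptions: the result tuples, repository names, and function names below are all made up for illustration, and hashing only catches byte-identical copies, not near-duplicates.

```python
import hashlib
from collections import defaultdict

def group_duplicates(results):
    """Group (repo, path, content) hits by a SHA-256 hash of their content."""
    groups = defaultdict(list)
    for repo, path, content in results:
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        groups[digest].append((repo, path))
    return groups

def summarize(groups):
    """Return one line per distinct file, noting how many extra copies exist."""
    lines = []
    for hits in groups.values():
        repo, path = hits[0]
        extra = len(hits) - 1
        other_repos = len({r for r, _ in hits[1:]} - {repo})
        line = f"{repo}/{path}"
        if extra:
            line += f" (appears {extra} more times in {other_repos} other repositories)"
        lines.append(line)
    return lines

# Hypothetical search hits: three vendored copies of one SDK file, one real usage.
results = [
    ("alice/app", "vendor/sdk/client.py", "def call(): pass\n"),
    ("bob/tool", "third_party/sdk/client.py", "def call(): pass\n"),
    ("carol/svc", "lib/sdk/client.py", "def call(): pass\n"),
    ("dave/demo", "main.py", "from sdk.client import call\ncall()\n"),
]

for line in summarize(group_duplicates(results)):
    print(line)
```

The three SDK copies collapse into a single line with a count, so the one real usage is no longer buried.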


Unless things have recently changed, I have never been able to search on a specific branch of a given repo. I want to search just in a feature branch, but the default search doesn't do that and I have not seen how to switch to a different branch in the UI. Additionally, if there are many pages returned for a given repo, it is very hard / slow to navigate through the results if you don't get a hit on the first page.


I'd like to be able to search for active and most active forks. I'd also like to search the code added by commits in forks.

If someone fixed my issue in a new commit in a fork somewhere, or already added the feature I need, I'd really like to know, so I don't reinvent the wheel!


It would be nice to click on a class name in Java and navigate to where it is defined in the project.


This is actually already a thing, but doesn't seem to work consistently. I've seen it happen on a few popular projects and it's magical, but I can't figure out why it doesn't work on some of my repos.

Edit: checked a little project I committed last weekend, and it works! I think the search indexing is done every once in a while, not incrementally.


Exactly, more language specific search functionality.

I want to be able to hover over a variable, function or class and see where it is defined and some places where it is used elsewhere.

Essentially the features of an advanced IDE, without the ability to edit and with a simple user interface. I don't want the website to break if JS isn't enabled or to use 1000% CPU.


This may be old, but I remember often looking for a function and having to paginate through the results to get to the definition.

So having the function/method/var/etc definition(s) at the top of the results.
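A rough sketch of that ordering in Python, assuming a simple regex heuristic for Python-style `def`/`class` lines. The hit list and function names are hypothetical; a real implementation would need per-language parsing rather than a regex.

```python
import re

def is_definition(line, name):
    """Heuristic: does this line look like a Python def/class of `name`?"""
    pattern = rf"^\s*(def|class)\s+{re.escape(name)}\b"
    return re.search(pattern, line) is not None

def rank_hits(hits, name):
    """Stable sort that puts definition-like lines before plain usages."""
    return sorted(hits, key=lambda hit: not is_definition(hit[1], name))

# Hypothetical (path, matching line) search hits.
hits = [
    ("tests/test_io.py", "    data = load_config(path)"),
    ("docs/usage.md", "Call load_config() at startup."),
    ("app/config.py", "def load_config(path):"),
]

for path, line in rank_hits(hits, "load_config"):
    print(path, "|", line.strip())
```

Because the sort is stable, usages keep their original relative order and only the definition moves to the top.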


Please index source code files that are "too large to render". It seems like the indexing misses these files, which are sometimes very long automatically generated stubs, but I still need to figure out where they are defined. For example, the python kubernetes modules have data models that are large, and sometimes tricky to find (at least until I learned the code, but the lack of search results was confusing).


Use a proper indexer and search engine. Like Xapian.

git grep works only for primitive results, but for all the metadata you need proper indices and a UI. E.g. exclude forks, rank headers higher than source. Rank titles or headers higher than bodies. Verbatim vs regex vs parsed search queries.

Xapian can do all that. And it is very efficient at that scale. Unlike Elasticsearch or Google Code Search.


Advanced search options, the more the better. Search by user, project, date, file type, language, etc., and allow multiple selections and negation. Search result ordering (by date, relevance, other facets). Deduplication in search: often you just get pages of results that are all forks.


They do have some here: https://github.com/search/advanced

I think they made it harder to find - I remember being able to get to it in just the search bar. Having the syntax autocompleted for you would be nice.


Quite often I use the search to find a string across projects (we have quite a lot of microservices).

It would be great if you could exact match and also if the results would be from master by default. Quite often it puts me in a commit and I have no idea why or if it's still current.


Remove specific folders from the results. I stopped using GitHub Code Search because when I do, I get dozens of pages of matches in unit tests and I can't find the single result in the real codebase.
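As a sketch of what such a filter might do, assuming results arrive as plain repository-relative paths (the paths and the `exclude_dirs` helper are hypothetical):

```python
from pathlib import PurePosixPath

def exclude_dirs(paths, excluded):
    """Drop any path that has an excluded directory name among its parent parts."""
    excluded = set(excluded)
    return [
        p for p in paths
        if not excluded & set(PurePosixPath(p).parts[:-1])
    ]

# Hypothetical result paths.
paths = [
    "src/billing/invoice.py",
    "tests/billing/test_invoice.py",
    "src/billing/tests/test_tax.py",
]

print(exclude_dirs(paths, {"tests"}))
```

Matching on directory names rather than path prefixes also drops nested test folders such as src/billing/tests.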


Most of the functionality requested in comments here duplicates what git offers on a local command line. One strategy:

$ git clone --depth 1 --bare <URL>

Followed by something like:

$ git grep <regex> HEAD

$ git log -G<regex> --stat -55


True, that works if you already know which project you want to search in, but doesn't help for searching across multiple projects / all of GitHub.


The ability to search a specific contributor's commits.


When I search for an identifier it comes up with usages, tests, docs, etc. I have to dig several pages to find the declaration/definition.


Search in repos that have X dependencies. For instance, I want to find a file named `_app.ts` in repos where next and typescript are installed.


Buy SourceGraph?


Please don't, it's way too nice to be bound to a single hosting provider.


Yeah, I don't see any single code host achieving near-100% market share (or even near-50%, for that matter). That means that multi-code-host ("universal") code search is going to address a much bigger market than code search that's tied to a single code host. Also, big companies (which is where most of the money comes from) all have their code scattered among a ton of different systems, and code search across all their code (not a semi-arbitrary subset) is way more valuable.

So, Sourcegraph will remain independent. (Sourcegraph CEO here.)


It's not trivial to use CLI tools to search only within comments in a clone. I could use that feature.


Sorry for hijacking, off topic: do you still remember the name of the TU Delft course about business process management? Is it this one: https://www.edx.org/professional-certificate/delftx-business...? Thank you.


I need an easy way to see how a single file has changed over time


Need to search across all private repos at once.



