Well, if your traffic is mainly Facebook domains, then you're chatting with friends. If you're visiting WikiLeaks and freedom.press, you're an armchair freedom fighter. And if all your traffic goes to random AWS IPs packed in encrypted VPN frames, then you're most definitely a terrorist.
Assuming I wasn't a state actor and just a lowly hacker on a wifi connection, here are some things I can tell about your VPN'd connection:
* The operating system used
* Application-specific traffic patterns
* Content-specific traffic patterns
* The VPN provider and type
First off, I know you're using a phone, because your traffic matches mobile-device TCP/IP fingerprints. Second, I can make a reasonable guess about which VPN you're using, based both on the service itself and on its traffic and connection patterns. Third, I can guess which applications you're using, because you're on a phone and traffic looks a certain way for certain network applications. Fourth, I can guess what kind of content you're looking at, since I have a good idea which browser and applications you're using. Fifth, if I can match up all those fingerprints each time, I can identify you as the sole user of that connection, meaning I can now track you whenever I see your traffic. Sixth, by manipulating your traffic in small ways, I can learn more about your host and applications from how they respond to network transmission problems.
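The OS-fingerprinting step can be sketched in a few lines of Python. This is a toy illustration only: the (initial TTL, window size) signatures below are made-up placeholders, whereas real tools such as p0f ship curated signature databases:

```python
# Toy passive OS fingerprinting from TCP/IP header fields.
# The signature values are illustrative placeholders, not real data.

FINGERPRINTS = {
    # (initial TTL, TCP window size) -> likely OS (hypothetical values)
    (64, 65535): "iOS/macOS",
    (64, 29200): "Linux",
    (128, 8192): "Windows",
}

def nearest_initial_ttl(observed_ttl):
    """Round an observed TTL up to the nearest common initial value,
    since each router hop decrements TTL by one."""
    for initial in (32, 64, 128, 255):
        if observed_ttl <= initial:
            return initial
    return 255

def guess_os(observed_ttl, window_size):
    """Look up the (initial TTL, window size) pair in the toy table."""
    key = (nearest_initial_ttl(observed_ttl), window_size)
    return FINGERPRINTS.get(key, "unknown")

# A packet arriving with TTL 52 most likely started at 64.
print(guess_os(52, 65535))
```

The same matching idea extends to TCP option ordering, MSS, and timestamp behaviour, which is where the application- and VPN-level fingerprints in the parent come from.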
Based on all that, I can send you a phishing e-mail that seeks to exploit any of the services or hosts or applications you're using. I don't even need to know who to e-mail; I can just spam tons of addresses and check for results that match the fingerprinted services I discovered earlier.
Another fun attack would be to actually kill every connection you tried to make over a VPN using a specific application and content provider; because it would never work over the VPN, you might eventually try it over your regular connection, giving me a new point of attack.
Digital Ocean has a great tutorial on setting up OpenVPN [1]. I've used this and gotten decent latency and good throughput over both broadband and LTE service using a small ($5/month) VPS.
I wrote a program to solve chess once. After I realized that it would take a massive amount of computing resources to finish in my lifetime, I abandoned the project.
Most interesting to me is that it really isn't that hard to create a program to solve chess (i.e. the logic behind it), it just would take too much time/money to actually do it.
It's much more difficult to create AIs and approximations like this.
Kinda weird once you realize that fact... approximating a solution to chess is much more difficult, logic-wise, than actually solving chess.
Though I wouldn't be surprised if chess is solved in the next couple decades or so.
Well, theoretically a brute force universally solves any set of constraints, just takes too long. Intelligence is really only about efficiency and timescales, i.e. the dumbest algorithm would look insanely smart to us if it ran fast enough.
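As a toy illustration of that point, here's a brute-force solver for a trivial take-away game (remove 1-3 stones per turn; whoever takes the last stone wins). The identical exhaustive search would, given absurd resources, solve chess:

```python
from functools import lru_cache

# Brute-force solver for a toy game: players alternately remove 1-3
# stones from a pile; whoever takes the last stone wins.

@lru_cache(maxsize=None)
def current_player_wins(stones):
    if stones == 0:
        return False  # the previous player took the last stone; we lost
    # We win if any legal move leaves the opponent in a losing position.
    return any(not current_player_wins(stones - take)
               for take in (1, 2, 3) if take <= stones)

# The "dumb" search rediscovers the known theory for this game:
# the player to move loses exactly when stones % 4 == 0.
assert all(current_player_wins(n) == (n % 4 != 0) for n in range(100))
```

The algorithm is trivial; only the size of chess's game tree prevents the same approach from working there.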
Actually, it just might be possible to do, at least in a probabilistic sense. The author of Rybka at least managed to 'prove' (in a weak sense) that a certain opening is unplayable. Quite fascinating: http://en.chessbase.com/post/rajlich-busting-the-king-s-gamb...
From a quick Google there appear to be about 10^120 possible chess games. So if you could store each game as 1 bit then:
Landauer's principle says we need at least 10^-21 Joules to change a single bit. That means we need 10^99 Joules to run through all our games. The mass-energy of the observable universe is 10^69 Joules.
Therefore, if we could convert one million trillion trillion observable universes entirely into energy, then we'd have enough to run through all our chess games on our theoretically perfect computing machine.
I'm not sure we'll get that done in the next couple of decades.
[of course, there may well be shortcuts that we can take to cut down that number a bit!]
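For anyone who wants to check the arithmetic, here's a sketch using the room-temperature Landauer bound and the parent's order-of-magnitude figures:

```python
import math

k_B = 1.380649e-23                 # Boltzmann constant (J/K)
T = 300.0                          # room temperature (K)
bit_cost = k_B * T * math.log(2)   # Landauer limit, roughly 3e-21 J per bit

games = 10.0 ** 120                # rough count of possible chess games
total_energy = games * bit_cost    # ~3e99 J to flip one bit per game

universe_energy = 1e69             # order-of-magnitude mass-energy of the
                                   # observable universe (J), as in the parent
universes_needed = total_energy / universe_energy
# log10(universes_needed) is about 30: a million trillion trillion universes
```

The exact constants don't matter much; the gap between 10^99 J required and 10^69 J available is the whole story.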
You don't need to know the best move in every position of every possible chess game in order to play perfectly.
E.g., playing as White, it is enough for you to always know your best move. And once you stick to it, 99.9999...% of those possible games will NEVER occur, because they would require White to make at least one imperfect move along the way.
You might be interested in learning about AIXI. It is in fact trivially easy to create a general intelligence effective in arbitrary domains ... if you assume infinite computational capability. It goes to show that the problem of artificial intelligence is really approximations and heuristics.
Actual implementations of AIXI are also quite easy. I've written one before. Not that it's very interesting to run -- it is basically indistinguishable from an infinite loop.
As for what the acronym stands for, heck if I know.
Yes, it would be interesting to see whether the solution is like that of checkers, where the solved game is always a draw, or whether it would be a win for White. (Or, much less likely given the evidence so far, a win for Black.)
I probably spend as much time, or more, sitting, thinking, and staring at partially written code as I do actually writing it. It's frustrating when you have superiors who don't understand that time spent thinking is just as productive as time spent typing.
That's just my personal style, although I've never worked in "large" teams on a single codebase so can't comment on what styles work best in those situations.
The best analogy I heard for communicating this to superiors is that programming is like doing a crossword puzzle. 95% of the time you're doing a crossword you're not writing but that doesn't mean you're not intensely working on solving the puzzle.
I've never been out of the country. Will I one day? Sure, but it's not like I have a burning desire to leave tomorrow and spend significant amounts of time and money at some tourist destination on the other side of the earth.
I don't really participate much in social media, I have my own interests and like to just do my own thing. I have friends who share similar interests and sometimes we'll do those things together.
I don't have to share with random facebook friends (who are really just acquaintances -- who really has 200+ real friends?) how many countries I've been to, how many miles I ran this morning, the wonderful healthy breakfast I ate, the celebrity I met on the street, how amazing my volunteer experience was, how wonderful my bf/gf/husband/wife is, how crazy it was at the night club, etc etc etc.
I have great memories of lots of things I've done, and I share those memories with the people I experienced them with. Sometimes I'll tell those experiences to close friends or family members, or share them at relevant moments in social situations, like when a related conversation is taking place and I can actually add value (as opposed to just hearing myself talk and looking for validation from those around me).
I do what I do and I'm perfectly content. Social media as a whole is an opportunity for people to sideline-brag and make themselves feel part of "the club." They achieve their validation by how many people "like" their photo or post.
Volunteering and traveling are two things that some (keyword: some) people use as a vehicle to make themselves seem more important or esteemed than others. If you haven't traveled, surely you're a naive American who doesn't know anything about the world outside of your own bubble -- BUT on the other hand, if you see the Eiffel Tower in person you somehow become wise and cultured by that experience. If you haven't volunteered recently, you certainly must be a selfish prick. And of course these people don't say these things directly to you; instead, they've become experts at implying them passively.
Meh.
There was recently a thread on the front page of HN about which social networks middle school and high school kids are using. It's mainly Instagram, and it's used as a vehicle for some kids to attain status as "popular". Everyone "follows" the popular kids, etc. I think the article in this thread highlights how that same popularity contest exists for older people too. Except it's on Facebook, and you become popular by posting travel photos, volunteer photos, your color-run photos, etc.
Also I never thought I'd be quoting scripture, but this is a fitting passage!
"Beware of practicing your righteousness before men to be noticed by them" ... "So when you give to the poor, do not sound a trumpet before you, as the hypocrites do in the synagogues and in the streets, so that they may be honored by men."
Another saying about players...(this one's not scripture)
"Real players don't say they're players; they just are"
Good job, you made a review site? Sorry, I know I sound negative, but you made a simple review site for domain registrars in an attempt to earn commissions from those registrars, and then tried to get people to refer others to those registrars through you for more commissions.
Unique, new, or innovative? No
Better than current solutions for finding reviews on domain registrars? No
A simple attempt to earn affiliate commissions with a review site? Yes -- this concept has been around forever in the affiliate world.
Not sure why this link is on hacker news...
My personal advice would be to spend your personal time on something actually productive.
Be respectful. Anyone sharing work is making a contribution, however modest.
Ask questions out of curiosity. Don't cross-examine.
Instead of "you're doing it wrong", suggest alternatives.
When someone is learning, help them learn more.
When something isn't good, you needn't pretend that it is.
But don't be gratuitously negative.
Relevant to who/what? Netflix didn't write the article.
It depends whose perspective you're looking from. The article could have been titled "Yahoo's new CIO accused of accepting kickbacks at former employer".
That was my attempt at a joke. But seriously, I love python when it comes to manipulating data and doing anything statistical. You've got numpy, scipy, scikit-learn, etc.
As a student of statistics, I'm kind of split on R. On one hand, it's just not a very well-designed language. The fact that it has three (!) independent object systems is a testament to this. On the other hand, as vegabook also mentions, working with vectors and matrices is just a lot more natural in R than in general-purpose languages like Python, because R's syntax has been built from the ground up to work with the kind of structures you usually work with in science.
I'm hoping Julia might become a good alternative to R and Python, but I can't see it catching on in the statistical community anytime soon given how many people are still using relics like SAS and Stata. The raw fact is that statisticians (considered as a group) just aren't very good at programming (and many older statisticians can't program at all), which means that a well-designed programming language may not necessarily be easy to use for a member of the statistical community used to point-and-click statistics suites.
I think R is a well-designed language. It definitely has its quirks (what language doesn't?), but by-and-large they are problems with the standard library, not the language. This is admittedly a subtle distinction, but it's much easier to fix problems with the standard library than it is with the language.
Three aspects of the language that make R particularly well suited for statistical programming are:
1) Missing values built in at a fundamental level.
2) Metaprogramming capabilities. The best way to solve many categories of data analysis problems is to design a domain specific language which allows you to easily combine independent pieces. R's incredible flexibility is great for this.
3) Fundamentally vectorised and functional. This allows you to elegantly express many common data analysis tasks.
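For comparison, points 1 and 3 have rough analogues in Python's NumPy, though the missing-value semantics differ in detail; a hedged sketch:

```python
import numpy as np

# Missing values (NaN) propagate through vectorized arithmetic,
# loosely analogous to R's NA semantics (point 1).
s = np.array([1.0, np.nan, 3.0])
doubled = s * 2          # elementwise; NaN stays NaN, no loop written

# Vectorized comparison plus boolean indexing (point 3), similar in
# spirit to R's logical indexing.  One semantic difference: comparisons
# against NaN are simply False here, so the NaN element drops out,
# whereas R would keep an NA in the result.
big = s[s > 1.5]         # -> array([3.0])
```

R's NA is a first-class concept across all its basic types, while NaN only exists for floats, which is part of why the grandparent counts it as a language-level advantage.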
How do you feel about reproducible computing in python? R is very well set up to A) get it running on any platform easily B) report the crucial parts of the environment. I know that if I grab someone else's (published) code written in R, I'm pretty confident I can make it work. Part of this is the great package management through CRAN or Bioconductor, and also because often important reference data for bioinformatics is actually available through the package manager.
I haven't done much with Python, but I don't quite get the same feeling (happy to be told that the reality is otherwise!). For example, the opening line of the installation guide for Pandas doesn't inspire great confidence in me: "The easiest way for the majority of users to install pandas is to install it as part of the Anaconda distribution, a cross platform distribution for data analysis and scientific computing."[1] Do I really need to install the HDF5 package so I can split a concatenated variable into two columns??
The thing w/ reproducible research (I was an early BioC core member and have worked directly w/ its RR advocates) is that it requires having an exact set of R and packages. I know that BioC tries to do this (I wrote the original BioC package download script) but weird things can still happen. A few years ago I was tracing down a bug in some computational biologist's code that really traced down to some wacky version of a particular package which might be downloaded in the right circumstances.
In a previous life what we did was for every project you'd download a snapshot of an R environment, including all packages. That, and only that, was used for all computation for everything involving that project from start to finish. If Docker was around at the time, that's what we'd have used.
Thanks for your work with BioC, it's fantastic.
I use it a lot in my cancer genomics research. Part of that involves providing a service to patients living with cancer, so your work is definitely out there having an impact!
Thanks but I haven't been a contributor for a decade, I just had a hand in the early days. I agree though that it's a phenomenal suite for the bioinformatics world and an exemplar of proper R techniques
Python's pip is pretty good, though not quite as polished as CRAN. I have had few problems running complex code from third-party sources, though one always has to be aware of the Python 2 vs. 3 "problem" (though it is diminishing now, with most things available on 3). If you get pip up and running on a new Python installation you can avoid Anaconda/Canopy if you want a clean installation, and I have installed fairly complex Python setups in multiple locations without too much trouble.

Let's be fair, R can also be tough if it calls a lot of third-party libraries. Just try to get rJava working properly if the local R and Java installations are not both 32- or 64-bit; it can be a complete mess to disentangle this sort of thing in R. Or, for example, running code that uses Cairo on a Mac. My experience is that Python's poor package-management reputation is no longer really deserved.

Python's virtualenv also allows you to hermetically seal away an entire Python environment, including its libraries, so that it will not conflict with other Python environments that might have different versions of the interpreter and/or libraries. I am not aware of anything this robust in R.
Reproducible computing? The ipython notebook is awesome, though I am not sure if there is anything as good as knitr if your workflow is LaTeX oriented.
R "hands" will usually find Python a backward step when it comes to vectorized data manipulation, but it's a forward leap if your data becomes too big or if you have to step out of the comfy environment of exploratory analysis into any form of (even trivial) production setting.
And no you definitely do not need HDF5 to effectively use Pandas.
The closest equivalent to virtualenv for R is packrat: http://rstudio.github.io/packrat/. It doesn't (yet) support different R versions for different projects, but that's on the roadmap.
Ok that's good to know. Sure, R breaks inexplicably sometimes due to dependencies, no doubt about that.
virtualenv sounds useful. Is it used much when python code is published in a paper?
About HDF5: I was just making the point that the Pandas docs recommend I install Anaconda to get Pandas, thus also installing HDF5. I am sure there are other ways, but the way the documentation is phrased suggests that these other ways are overly difficult.
You are strongly encouraged by the Python powers that be to move to 3, and I have only in the past few months begun to agree with them, because some serious standard libraries like asyncio are now only available on 3. It's (finally) the future. However, a big caveat is that if you're learning Python, most of the sample code you will find on the web will be 2-based and will not work well under 3. It's not so much the print statement: range() behaves subtly differently now (it returns a lazy range object rather than a list, which is too subtle for beginners to properly understand, in my view), and Unicode strings can break older code too. Just be aware of these things and move to 3 is my (51/49) advice, but this is a controversial point and others will have differing views.
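A few of those Python 3 behaviours in one place, for anyone porting 2-era sample code:

```python
# Python 3 behaviours that trip up readers of Python 2 sample code.

# range() is now a lazy sequence object -- not a list, and not technically
# a generator either: it supports len(), indexing, and fast membership tests.
r = range(10**12)
assert 999_999_999 in r                 # constant time; no huge list is built
assert list(range(3)) == [0, 1, 2]      # materialize explicitly when needed

# All string literals are Unicode text; raw bytes are a separate type.
assert "héllo".upper() == "HÉLLO"
assert b"bytes" + "text".encode() == b"bytestext"

# print is a function, not a statement.
print("works in Python 3")
```

Code that silently mixed bytes and text under Python 2 is exactly the code that raises TypeError under 3, which is where most of the porting pain comes from.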
I find knitr easy to use. The way it generates graphs and can output to PDF/HTML is really useful, reproducible, and easily shared. While it's essentially just markdown + R code, the code can point to data sets instead of having them embedded. It has a good set of graphing libraries (ggplot2, etc.) too. I can see how this could be the killer app that gets social science research papers written and produced in knitr. I always thought IPython would take this crown, but R/knitr is looking good. Have not used Shiny yet.
You don't have to install the entirety of anaconda. You can install miniconda (from here: http://conda.pydata.org/miniconda.html) and then do `conda install $package_name`
or, if like me you like to create separate environments for separate projects... `conda create --name $environment_name python; source activate $environment_name; conda install $package_name`
disclosure: I work on miniconda. I'm currently working on improving our developer experience. Complaints are welcome.
Yes, I have moved (back) to Python, mainly because R is too slow once we get beyond a certain data size, and the language is not powerful enough when data starts having to be moved around at scale. I get a 5-10x speed improvement in native Python and another 30x if I can vectorize things in NumPy. However, a huge caveat is that R is much more succinct during what I call the "data rotation" phase of exploratory analysis, because its vectorized nature is so much more efficient at selecting, reducing, cleaning, and rotating data than even Pandas can manage. It's irritating having to write list comprehensions constantly for what would often have been a ridiculously direct and efficient vectorized command in R. Moreover, R's graphics leave matplotlib in the dust, though this advantage is eroding as the JS libraries take over.
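The vectorization speed-up comes from pushing the per-element loop into compiled code; a minimal sketch (the exact factor depends on the workload):

```python
import numpy as np

data = list(range(1_000_000))

# Interpreted loop: Python executes bytecode for every element.
squares_py = [x * x for x in data]

# Vectorized: a single call into NumPy's compiled inner loop.  On
# typical hardware this is an order of magnitude (or more) faster,
# in the spirit of the 30x figure above.
arr = np.asarray(data)
squares_np = arr * arr

assert int(squares_np[999]) == squares_py[999] == 998001
```

This is also why constant list comprehensions feel like a step backward coming from R: each one is an interpreted loop that a vectorized expression would replace.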
The other area where Python crushes R is if your data is live streaming. Here you inevitably need a full fledged programming language with proper asynchronous io capabilities and multithreading / multiprocessing that is not batch oriented.
Totally agreed. I do model analysis on data sets with 200k-5m rows and anywhere from 500 to 20k columns. I originally started doing my work in R, but about two years ago, python started improving rapidly for heavy data analysis, and at the moment I'd say it's a clear winner.
For that kind of data or larger, I would avoid R and Python and move to writing my own algorithms, or try something built for heavier-duty analysis such as Mahout or Spark. R and Python are still single-box and memory-constrained.
I know we don't reward snarky humor 'round these parts, but I was about to say the same thing. Python seems to own this space and the ecosystem around Python and math/stats/analysis is exceptionally healthy. If there's a specific place where R kicks ass please speak up -- it's fallen off my radar.
I use both Python and R a fair bit. As a language, absolutely I prefer Python to R. However, I think there are two areas where R is better than Python and together, I think they add up to a durable advantage, at least for stats people.
1) Package support. Yes, Pandas and scikit-learn are good, but R still has a definite edge here. Here are three things I've needed lately where R has hands-down better code available: forecasting, frequent itemset mining, and network community detection.
2) Non-programming uses. There are a lot of tasks where you need a computer, but just to do one thing, a plot, calculate a statistic, ... stuff like that. R is better in that use case.
R is in some ways more forgiving to newcomers. Sure, there's all sorts of weirdness around how vectors and matrices work, and don't get me started on the cryptic function naming, but (1) almost all batteries are included -- hardly ever a need to hunt around for packages, (2) RStudio is really nice, with graphics, a shell, a text editor, documentation etc. all in one place, (3) it's mature and well-tested.
I prefer Python myself, but after spending a couple of months with R I do understand why people like it.
(OTOH I'll be a happy person if I never ever have to work with SAS ever again.)
Oops! Sorry, sorry... really sorry, apologies for snorting coffee over you, but given multiple years of experience TA'ing for machine learning / data mining courses, I couldn't disagree more. R had them in absolute knots, and yes, they were asked to use RStudio if that helped. They struggled with simple things such as writing a naive Bayes classifier. Most of their mistakes were because of R's weird and silent inconsistencies: scalar or vector, copy or reference.
It is possible that all of these 30-odd students, every year, were stupid, but the chances are fairly low.
EDIT:
The course has since switched to Java (Knime) and Python and that has gone a whole lot smoother.
Neither Java nor Python is among my favorite languages, but I have to concede that Python is massively more consistent than R, so a student has fewer special cases to remember, and the supposed dearth of packages seemed less of a problem, at least in the context of the course. At least in an academic setting, Enthought/Canopy/Anaconda does a marvelous job of it.
I said more forgiving. It's certainly not a forgiving language or ecosystem in absolute terms, you're right on the mark there. But ultimately you have to pick your poison. Do you want to struggle with all of the various quirks of R or do you want to struggle with all of the various quirks of (data analysis in) Python?
Would google publish data that shows how searches for porn spike during different times of the day or times of the year, as if it's some "cool and hip and edgy!" insight?
I don't think so.
And for the same reason they don't (whatever reason that is), it would probably also be wise for Uber not to post stuff like this.
I really don't care, nor am I offended. I'm just speculating that Uber doesn't have the brightest team of execs and still have a lot of "growing up" to do.
Google have been fighting a public relations war for a long time now to not appear creepy or stalkerish. I can think of few things they could blog about to make people consider not using Google more than "we know when you're looking for porn".
Uber have not (yet?) been widely called out as being creepy the way Google have. But Uber have data that can be every bit as personal as your search history, and posts like these make it obvious that people at Uber are thinking hard about putting those data to use.
There's a lot lurking under what at first glance appears to be merely a poorly-considered sophomorish post.
OKCupid is a dating website that deliberately branded itself as further toward the "edgy" and "hookup" side of dating websites. Then you have POF somewhere in the middle, with eHarmony way over on the other side, quite the opposite of OKCupid.
I'm not sure why Uber would want to put themselves anywhere on that scale (i.e. aligning their brand with notions of sex and one-night stands). There's a time and a place for everything, and for edgy data analysis like this, that "place" is edgy dating websites that want to be known for hooking up.
It's unprofessional and out of line with their brand image, which is obviously why the post got deleted. IMO this further validates all the bad press the media has been publishing about Uber.
Imagine if Uber's CEO made a comment like Elon Musk did about the "D" in P85D and having "velcro on the sides of his pants." The media would be ALL OVER that.
Based on what I've read, Uber's CEO sounds like a douchebag. But I'm not really sure; I've never met the guy or seen him in person, it's all from stuff read online.
So all you'd see from me is encrypted stuff being sent to a random IP address.