
This is good. I've had problems that were somewhat related to what the author talks about.

When I was learning C#, already quite fluent in C/C++, I had a big problem with the C# type system/string management. I'd been reading guides that were in the first category the author mentions, i.e. "not really a beginner, but new to this language".

I was trying to retrieve the bytes that a certain string represented. I looked for ages, and everywhere everyone said "this shouldn't be done", "just use the string", etc. A Stack Overflow answer mentioned a way to use an 'encoding' to get the bytes, and this seemed to be the only way.

How strange, I thought: I just want access to a pointer to that value, why do I have to jump through all these hoops? None of the guides I was reading provided an answer, until I found a _real_ beginners' book. This book, helpfully starting at the real beginning of the language (the type system), finally gave me the answer I was looking for:

.NET stores and handles all strings internally in a single encoding (UTF-16), so getting "the bytes" requires choosing an encoding to convert to. It turned out that the whole notion of 'strings are only bytes' that I carried over from C++ does not work in C#. All those other helpful guides gleefully glossed over this and started right in at lambdas and integration with various core libraries, instead of focusing on the basics first.
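The same distinction exists in Java, whose strings are likewise 16-bit-char sequences rather than raw bytes. A minimal sketch (not from the thread, just an illustration of the point) of why there is no single "the bytes" of a string:

```java
import java.nio.charset.StandardCharsets;

public class StringBytes {
    public static void main(String[] args) {
        String s = "héllo";
        // You must pick an encoding; different choices give different bytes.
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        byte[] utf16 = s.getBytes(StandardCharsets.UTF_16LE);
        System.out.println(utf8.length);   // 6: 'é' takes two bytes in UTF-8
        System.out.println(utf16.length);  // 10: every char takes two bytes here
        // Round-tripping only works if you name the same encoding again.
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(s)); // true
    }
}
```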



This has nothing to do with learning a programming language and everything to do with learning how to process text in a computer. Being a C programmer doesn't mean you have only a PDP-11's understanding of text ("it can be ASCII or EBCDIC, and I know how to convert between the two!").

When I learned C# (in 2003?), I learned that String is an array of Char and Char is 16 bit, and that .NET used the same encoding as Windows NT (UTF-16 in native endian).

I knew that both WinNT and Java made the mistake of being designed at a time when people assumed 16 bits are enough and consequently caused the surrogate pairs mess. I knew that Java assumes UTF-16BE and Windows assumes UTF-16LE. I knew what UTF-16 means in C/C++ and how to work with it or transform such data to and from UTF-8 and UCS-4.

When learning a new programming language, I know to look up whether strings are mutable and whether they're sequences of bytes, code units or code points. If they're immutable, I look up how to create a copy instead of a reference when using substring, and when they're not byte arrays I look up what real byte arrays are called in this language.
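The code-unit/code-point distinction described above is easy to see in Java, which shares the UTF-16 design (and the surrogate-pair mess). A sketch:

```java
public class Surrogates {
    public static void main(String[] args) {
        // 'a', U+1F600 (an emoji encoded as a surrogate pair), 'b'
        String s = "a\uD83D\uDE00b";
        System.out.println(s.length());                      // 4 UTF-16 code units
        System.out.println(s.codePointCount(0, s.length())); // 3 code points
        // charAt indexes code units, so it can land on half a character:
        System.out.println(Character.isHighSurrogate(s.charAt(1))); // true
    }
}
```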

Should early programmers be taught this? Absolutely. At what stage? I don't know. But they must be taught from the start that this has nothing to do with a programming language and everything to do with how data is represented in memory.


I'd actually consider that kind of knowledge pretty advanced. Beginners (and early up to even junior coders) usually don't know much about the internals of their environment; they just use stuff.

I'm always interested in the internals; but it's often surprisingly hard to find information on the internals. There are few books, and you'll often need to read lots of source code and specifications and reverse engineer things to find out how stuff works under the hood.


> I'd actually consider that kind of knowledge pretty advanced.

For someone from a C background, that's not advanced: it's simply what strings are. The whole idea that characters aren't bytes may be very strange to someone who's only ever done C and C++. It's probably just as strange to them as the idea that there's any relationship between bytes and "the characters that make up a piece of text" is to someone entirely new to programming.

In a related anecdote, I was once in a room with 4 C programmers and a Haskell programmer who was trying to write his first C program. The Haskell guy asked "hm ok so how can I compare two functions to see if they're the same?" and after a 20 minute discussion the C guys still couldn't understand why you would possibly want that and the Haskell guy still didn't know how to continue (I was one of the C guys). All had many years of programming experience, but the frames of reference were simply so different.

I think it's smart of Zed to confront people with multiple programming languages from the beginning, so that this kind of issue never really becomes a problem.


You are almost certainly misstating the Haskell programmer's question, because C makes it very easy to test if two function pointers are equal (intensional equality) whereas Haskell makes it very hard.


I think they might have meant "whether two functions are structurally identical"—i.e. whether their post-link-load-phase object-code hashes the same, presuming they're both position-independent.


I'd have to disagree, but maybe I'm not getting your point.

It's more work in C and C++ (and a lot of other languages) to treat strings correctly, but unless you're talking about a C programmer who's been living under a rock for 20 years, most of them are familiar with the issues around Unicode, UTF-*, etc., and they choose to ignore the issue when they can get away with it. When it's important, there are libraries like iconv and ICU for handling it. C++ even has some character conversion built into the locale system, but it's super ugly (which goes without saying, because almost everything in C++ is ugly ;-)

As far as your anecdote goes, I know both C and Haskell, and the question doesn't make any sense to me either. It's provably impossible to decide equality of arbitrary functions. Even in Haskell, function types aren't instances of the Eq type class, so comparing them wouldn't even compile.


In Haskell, the facetious answer is simple:

    No instance for (Eq (a0 -> a0)) arising from a use of `=='
That is, functions aren't comparable (for equality, anyway), so the type system won't allow you to compare them.

The better answer is either "Look at their type signatures" or "See if they evaluate to the same values when given the same input"; the first is trivial, the second won't, in general, terminate, so you need a more nuanced conception of "equality" to apply in this instance. This is non-trivial to come up with.
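The "same values on the same input" answer can at least be approximated by sampling a finite set of inputs. A hypothetical sketch in Java (the helper name and sample set are my own, purely illustrative):

```java
import java.util.function.IntUnaryOperator;

public class FnEq {
    // Checks extensional agreement of two functions on a finite sample of
    // inputs -- an approximation, not a proof of equality.
    static boolean agreeOn(IntUnaryOperator f, IntUnaryOperator g, int[] sample) {
        for (int x : sample) {
            if (f.applyAsInt(x) != g.applyAsInt(x)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        IntUnaryOperator dbl = x -> x * 2;
        IntUnaryOperator shl = x -> x << 1;
        System.out.println(agreeOn(dbl, shl, new int[]{-3, 0, 1, 7})); // true
        // Reference equality (the analogue of C's function-pointer
        // comparison) says nothing about behavior:
        System.out.println(dbl == shl); // false
    }
}
```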

Kent Pitman has an interesting essay on this problem from a Common Lisp perspective: "The Best of Intentions: EQUAL Rights—and Wrongs—in Lisp"

http://www.nhplace.com/kent/PS/EQUAL.html


Why the down votes?


And the name of the book???


That's only "basics" if you've got the wrong idea. There are millions of possible mistakes, no beginners' guide can explicitly address every one. People told you to just use the string - wasn't that a good enough answer?


> People told you to just use the string - wasn't that a good enough answer?

"Don't do that" isn't a sufficient answer without explaining exactly why, though. And if you aren't asking the right question, then the explanation might even seem obtuse.


I can relate; in this case the OP was actually hindered because he knew there are bytes behind the string. A key insight into why you shouldn't simply grab the bytes is that the .NET char size is actually 2 bytes, and the internal encoding is UTF-16. Thus encoding/decoding is required, which, if you haven't worked with encodings before, can be a bit confusing imo.


"Don't do that" is the right answer when you're asking the wrong question. It's an invitation to take a step back and ask how to do what you actually want to do, at a higher level.


There is nothing inviting about someone saying "don't do that" to you. If you actually want to understand a beginner's intentions, you're a lot better off asking them what their goal is.

In my experience mentoring new developers, it's much more helpful to ask "what is your goal?" instead of "don't do that."


No; the correct answer in that case is "Why are you trying to do that?"


That's what pedants do.

When I was a kid learning line number BASIC, adults answered my questions knowing that I'd figure out The Right Way before anyone hired me to write the code for radiation treatment devices.

The tech community's obsession with "The Five Whys" is toxic. When asking questions, you always have to first prove that you deserve an answer. It becomes a process of trying to anticipate any potential reason someone might have to argue "You're doing it wrong" - and preempt that. You can't just ask a question: you have to both ask and justify.

Mostly I just don't bother, and I suspect that I'm not alone. And I have a degree and industry experience. It must be incredibly frustrating and discouraging for beginners.


Dunno. If someone asked me how to get the bytes of a string, I would ask why. Not because there's The Right Way to do things, but because they might be doing things The Hard Way.

A why can reduce the amount of code written by 100%.


Sure, but when someone is first learning, frequently the true answer to "why are you doing that" is "to see what happens" (even if they have some flimsy justification within their pet-project at the time.) Giving them the answer lets them go back to experimenting so they can see, for themselves, why the path they're heading down might not be such a good idea. Formative experiences and such.


Yeah, that's a fine answer. But they might just not know of the other way to do things.

Like, in Java, for a long time I didn't know there was an output stream you could write a string to directly, so I was always getting the bytes to write it. I wouldn't call that a formative experience.
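For reference, the Java idiom being alluded to is wrapping the byte stream in a Writer, which applies the encoding for you instead of requiring manual getBytes() calls. A sketch (the helper name is my own):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class WriteString {
    // Writes a string to a byte stream via a Writer; the Writer performs
    // the UTF-8 encoding, so the caller never touches bytes directly.
    static byte[] encodeViaWriter(String s) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write(s);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(encodeViaWriter("héllo").length); // 6 bytes of UTF-8
    }
}
```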


No, because they haven't explained WHY.

This ties into another of my pet-peeves: people who answer my "how do I do this" question with "don't do that". Quite often I have very good reason for wanting to do that (getting around another bug, esoteric requirements, etc). Whenever I answer a question, I'll first tell them exactly how to do what they're asking for, and THEN explain why doing that is usually a bad idea, and THEN show some alternatives that will probably do what they want.


>People told you to just use the string - wasn't that a good enough answer?

http://4.bp.blogspot.com/-KzOtz-8coJU/Uga2fN7SNmI/AAAAAAAAKI...

> > I was trying to retrieve the bytes that a certain string represented.


The above link is to a Hyperbole and a Half cartoon image of a person and the words "No, see, that solution is for a different problem than the one I have."

Blind links aren't polite. :)



