Hacker News | eightnoteight's comments

> The problem is that sentinel errors, as typically and idiomatically used, in fact are special, and are more expensive to deal with than other values. My suggestion to use boolean values outperforms them by a lot, 30x in fairly common idiomatic usage.

while I agree to some degree, when performance comes into the picture, what really matters is normal path vs surprise path rather than happy path vs error path

it's hard to argue over what counts as a happy path in code, but it's not wrong to say that an io.EOF check is a normal path of the code, i.e. not a surprise in production. the bad performance of errors.Is is something to be improved upon, but it's not a surprise in production when there are a large number of `errors.Is` checks on the normal path of the code

now coming to the surprise path of the code: this is where performance gets really important. no one wants their code to suddenly hog 100% CPU because of some special error case - https://blog.cloudflare.com/cloudflare-outage . but such surprise paths often contain a large amount of business logic that weighs much more than how slow the errors.Is function is compared to a boolean check

it would be interesting to see where this line of reasoning is valid, but IMO performance isn't a good argument for treating errors as anything other than normal outcomes of operations in production

but thumbs up for the article; now I know what to reference for backing the pattern below that I often use. when I first saw errors.Is it was pretty obvious that it was going to be slow, but I just didn't have time to prove it, so I used the pattern below instead

```
if err != nil && errors.Is(err, x) {
    // handle known error x
} else if err != nil && errors.Is(err, y) {
    // handle known error y
} else if err != nil {
    // handle unknown error
}
```


people who need to monitor a lot of channels are usually in the senior/leadership layer, but one technique they usually follow is to focus on a specific problem (and consequently a specific set of channels) for a few weeks or a month, and shift focus as the project/task changes

how are you thinking about capturing such dynamic decisions to choose a focus area, which happen outside the communication tool - in zoom or meetings etc.? the algorithm can be real-time, but even with data points from meetings, can it really be made real-time?

the instagram feed algo is pretty real-time, but the number of unique behaviours, or the behaviours-to-people ratio, is quite low. I'm guessing that in a work environment that ratio, or the number of unique behaviours, would be too high for the algo to react quickly, right?


controlling the producer is such a hard problem; even with exponential back-off and backoff times in the response headers, you still get at minimum a 2x throughput increase from the producers during a retry storm

the problem is that the most common backpressure techniques, like exponential back-off and sending a retry-after time in the response header, have constraints on the maximum backoff time they can impose, and in some scenarios that maximum is much, much less than what the situation requires.

for example, imagine a scenario where a customer explores 10 items on Amazon and then finally places an order, so 10 rps for the product page and 1 rps for the order page. if the order service goes down, customers slowly get stuck on the order page, and even with backpressure your RPS keeps growing on the order page. exponential backoff doesn't help either

while dropping requests is a good idea, that behaviour isn't designed in by default everywhere; systems go into a metastable state, and you need the ability to control the throughput on the producer side

you could solve it by keeping a different layer in between, like a load balancer or some gateway layer, that is resilient against such throughput spikes and lets you control the throughput to your service, slowly scaling it back up as per your requirements (by user or at random)

for frontend devices, it gets exponentially harder to control the throughput. having an independent config API that can control the throughput is the best solution I've come across


I think of it as different companies helping at different stages of mainstream adoption

for any open source project, adoption in the initial stages is driven by the project itself; later on it will be someone with a financial incentive driving adoption, but that is still a niche market, and as soon as it starts to show potential, aws simply copies it and offers it as a service

all 3 help with adoption. whether we like it or not, aws has a much larger distribution channel, and people would rather just use one of their existing vendors than buy from a new Nth vendor


> Nowadays I open Aegis and I have > 20 services there, and trying to look for my code between all the running numbers is a pain.

exactly :(

I wish passkeys would get rolled out quickly across all sites; most people use just 2 or 3 trusted devices 99% of the time.

for those edge cases where you are working on an untrusted device, the passkey on your trusted mobile can help with authentication via Bluetooth or a QR code etc.


> “Oscar, do you mind sharing your screen so Deepak and Deanna can see the weird log messages too?”

it seems so obvious from an Incident Commander perspective but so much goes into this workflow during an incident

* what if the person is a fresher? you are asking them to share their screen, debug, and perform actions in front of 100 people on the incident call, with all the anxiety that comes with that

* While the IC gets continuous practice handling fires, a specific team does not: if there is a fire every week in a 50-team organisation, any given team would only see its first incident once a year

* Self-consciousness/awareness instantly triggers a fight-or-flight response in even the most experienced folks

I don't know how other industries handle such a thing. I'm pretty sure even in non-tech there is a hierarchy for anomaly response, and sometimes leaf-level teams get called to answer questions at the top level of the incident response (a forest fire response, say, might have a state-wide response team pulling in the local response team and making them answer questions), but they probably get much more time to prepare than in tech, where it's a matter of minutes


In a previous job, I had a critical incident crop up and we were dealing with the offshore parent company. All the senior management had been cc’ed into the emails about the problem.

Result: nobody was willing to say anything for fear of looking bad in front of those people. This was frustrating to say the least.

I solved this by replying all, but I took out all the senior people. I said something along the lines of “hey guys, I’m the guy who needs this fixed. I can see you are all working hard. I’m removing a number of people from the cc list and we will communicate with them in a separate email. Just keep me up to date with how it’s going and tell me what you need from my end.”

This worked wonders. They worked the issue, and though it took some time it was to be expected.

When it was solved, I found the original email, replied all (including management) and explained that the problem was solved, and made a point of highlighting the excellent work the team fixing the problem had done on resolving the issue.

I never had any issues with the parent company’s dev team after that :-) in fact, they went through our incident reports and fixed 80% of the longstanding issues within the next week! Which I wasn’t expecting…

Moral of the story - take as much pressure off the incident team as you can.


Thanks so much, that was good, practical, wise, in-real-life experience.


> 100 people in the incident call

Well, there's your first problem...


I picked a high number to showcase the problem; for a fresher it doesn't change much even if that number is as low as 15 or 20, or even 5 people they don't know or who are at higher levels

also, I feel like the number of people who hop on the incident call is almost always related to the category of the incident. sure, you can always break out into a separate room, but by then the person has often already realised the impact and the weight of the incident


And the point is that both of these are problems that an incident commander is there in part to solve, both in the sense of making sure that those investigating have what they need including the ability to focus, and in that of handling communications with stakeholders including leadership.

If whoever feels like it can "hop on" the incident call and stay on it, regardless of whether or not they can contribute to the investigation, then the IC needs to do a better job. Granted, usually this is for lack of institutional competence; I've been one place where the IC role was taken seriously, and incident response there ranged from solid to legendary, where most places never rise above "cautionary tale." But nonetheless.


In my experience, people get pulled in and then never let go for the rest of the incident. The coordinator needs to keep asking "do we still need XYZ? if not, they can go and we can call them back if needed". Not letting anyone go is how you end up with 30+ people on a call. Don't hold them hostage.


Can you comment on why you think it is an issue for anyone to hop on an incident call, whether or not they can contribute?

It is one thing if they are being disruptive, but I don't see a problem with observers.

For this thread: yes, some people may feel scared to share a screen or participate if the group is too large, but again, that is for the IC to control. I wouldn't kick anyone else just for lurking; there may be a good reason, and I'm not going to call out everyone on the call asking why they are there, as that is just as disruptive.

TIA


An ongoing major incident is already stressful enough for everyone involved, and looky-loos don't help that at all. Nobody does a better job of debugging for having to fight a helmet fire at the same time, and one of the IC role's responsibilities is to proactively minimize that risk as far as possible.

It does depend somewhat on the situation and the organization, and on the role; IC engineers observing for familiarization is fine, VPs joining never is. My approach is that the incident call is for those actively involved in the investigation or who have been invited to join by those who are, including engineering ICs who wish to observe for familiarization. Meanwhile, stakeholders not directly participating in response receive updates from the incident commander via a separate (usually Slack) channel. Managing that communication is also part of the IC role, whether directly or by delegation.


I've been on an incident call that Jeff Bezos hopped on to listen into. The "IC" (we had some different name like problem management engineer or something like that) did not ask him to get off it.


This makes sense. Amazon's corporate culture is famous for its deficits.


Surely you'd want to instead share a link to the logs being investigated so others can investigate concurrently, instead of having 2 backseat drivers observe someone observing logs.


Depends. In some situations it would in fact be better to have everyone discuss one person's shared screen, instead of having to constantly coordinate what they are talking about.


+1 Depending on how complex the system/tooling is, it is rarely just one log file to share in a text editor.

If you have logs, metrics, tracing, other dashboards for context you want to see how they are debugging.

Some of these tools are very complex and other eyes can help pinpoint inefficiencies.


Ideally, wouldn't it be the IC's (or group of ICs') responsibility to introduce a blameless culture before the incident?

I've worked in blameful places, always without ICs; just shouting HIPPOs.

I hope that an org evolved enough to create IC roles would back that up with culture, but I could be wrong.


Indeed - in that kind of environment an important role is "managing upwards", preventing the people who are actually doing the work from being overwhelmed by constant requests for status and explanations.


What is a fresher?


Recently graduated, just entered the workforce.


Fresher is not a good term for this example.

There are engineers who are great coders but bad in an incident environment. They may not be fresh, but they need the same help as a "fresher".


It's a very US centric term, in the UK we'd just call them graduates, for example.


Nope, not a US term. I've found it in a couple dictionaries as a UK term for "freshman", which is a similar idea but not quite the usage in OP.

The equivalent that I've usually heard in the US is "recent graduate", rather than just "graduate".

https://dictionary.cambridge.org/us/dictionary/english/fresh...


As a US developer for nearly 25 years, I've never heard this term used in business context. I'd call them a graduate as well.


Recent (this generation) Indian immigrants to the US use the term in my experience. I've never heard anyone else say it.


It's mostly a South Asian centric term.


> It's a very US centric term

You've never heard of "freshers week"? That being said, I've never heard the term used to refer to anything other than university students.


I live in the US and have never heard of it.


not a US term. SE Asian.

"Fresher" + "100 people on the call" immediately makes me think Tata or Cognizant.


websockets and sse are a big headache to manage at scale, especially on the backend: they require special observability, and if not implemented really carefully on mobile devices, they're a nightmare to debug on the frontend side

devices switch off the network or slow it down for battery conservation, or when you don't explicitly do the I/O using a dedicated API for it.

new connection setup is a costly operation: the server has to store the state somewhere, and when this stateful layer faces any issue, clients keep retrying and timing out, forever stuck performing this costly operation. there is no easy way to control the throughput and slowly put the load back on the database

reliability-wise, long polling is the best one IME. if an event-based flow is really important, it's still better to have a 2-layer backend, where the frontend does long polling against the 1st layer, which then subscribes via websockets to the 2nd-layer backend. much better control in terms of reliability


I cannot agree with you more. I have seen people shoot themselves in the foot with Websockets and SSE. Long polling, even though it is expensive, is the most explainable and scalable approach in my opinion.


SSE supports long polling. You can make the server close the connection whenever you want. SSE supports automatic reconnection, and will even include the last ID seen to let the server continue seamlessly.
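For reference, the resume mechanism is just two pieces of the wire format: the server tags each message with an `id:` field, and on reconnect the browser's EventSource sends the last one back in a `Last-Event-ID` request header. A tiny sketch of the event framing (the `formatSSE` helper is illustrative, not a real library API):

```go
package main

import "fmt"

// formatSSE frames one server-sent event. Each field is its own
// line and a blank line terminates the event; the id is what the
// client echoes back as Last-Event-ID after a reconnect so the
// server can continue seamlessly.
func formatSSE(id int, data string) string {
	return fmt.Sprintf("id: %d\ndata: %s\n\n", id, data)
}

func main() {
	fmt.Print(formatSSE(42, "order-status:shipped"))
}
```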


It's important to remember that SSE won't automatically reconnect for quite a few HTTP status codes (e.g., upstream proxy outages returning 50x error codes)


A lot of this was addressed in the linked article - rxdb has mechanisms to mitigate many of your concerns...


I never focused much on sleeping postures, but one day I read an article about how acid reflux goes away if you side-sleep on your left-hand side, i.e. the stomach sits at a lower height than when you sleep on your right-hand side

that really changed my life, it was like, how did I waste 28 years of my life without finding this trick :D


I actually noticed that when I sleep on my left side, I start to feel out of breath. Like, I can breathe normally, but something makes me feel I need to take deeper breaths, almost as if some blood flow is getting cut off.


> here they explain why they had to betray their core mission. But they don't refute that they did betray it.

you are assuming that their core mission is to "build an AGI that can help humanity, for free and as a non-profit"; the way their thinking seems to go, it is "build an AGI that can help humanity for free"

they figured it was impossible to achieve their core mission the non-profit way, so they went the for-profit route but stuck with the mission to offer it for free once AGI is achieved

several non-profits sell products to increase their non-profit's scale. would it be okay for the OpenAI non-profit to sell products that came out of the process of developing AGI, so that it can keep working on building AGI? museums sell stuff to continue to exist so that they can keep building on their mission, and the same goes for many other non-profits. the OpenAI structure just takes a rather new version of that approach by raising venture capital (due to their capital requirements)


The problem, of course, is that they frequently go back on their promises (see the changes in their usage guidelines regarding military projects), so excuse me if I don't believe them when they say they'll voluntarily give away their AGI tech for the greater good of humanity


Wholeheartedly agreed.

The easiest way to cut through corporate BS is to find distinguishing characteristics of the contrary motivation. In this case:

OpenAI says: To deliver AI for the good of all humanity, it needs the resources to compete with hyperscale competitors, so it needs to sell extremely profitable services.

Contrary motivation: OpenAI wants to sell extremely profitable services to make money, and it wants to control cutting edge AI to make even more money.

What distinguishing characteristics exist between the two motivations?

Because from where I'm sitting, it's a coin flip as to which one is more likely.

Add in the facts that (a) there's a lot of money on the table & (b) Sam Altman has a demonstrated propensity for throwing people under the bus when there's profit in it for himself, and I don't feel comfortable betting on OpenAI's altruism.

PS: Also, when did it become acceptable for a professional fucking company to publicly post emails in response to a lawsuit? That's trashy and smacks of response plan set up and ready to go.


There is no fixed point at which you can say it achieves AGI (artificial general intelligence); it's a spectrum. Who decides when they've reached that point, when they can always go further?

If this is the case, then they should be more open with their older models such as 3.5, I'm very sure industry insiders actually building these already know the fundamentals of how it works.


An interesting aspect of OpenAI's agreement with Microsoft is that, until the point of AGI, Microsoft has IP rights to the tech. I'm not sure exactly what's included in that agreement (model, weights, training data, dev tools?), but it's enough that Nadella at least made brave-sounding statements during OpenAI's near implosion that "they had everything" and would not be disrupted if OpenAI were to disappear overnight. I would guess they might face a major disruption in continuing development, but presumably they retain at least the right to carry on using what they've already got access to.

The interesting part of this is that whatever rights Microsoft has do not extend to any OpenAI model/software that is deemed to be AGI, and it seems they must therefore have agreed how this would be determined, which would be interesting to know!

There was a recent interview of Shane Legg (DeepMind co-founder) by Dwarkesh Patel where he gave his own very common sense definition of AGI as being specifically human-level AI, with the emphasis on general. His test for AGI would be to have a diverse suite of human level cognitive tasks (covering the spectrum of human ability), with any system that could pass these tests then being subject to ad hoc additional testing. Any system that not only passed the test suite but also performed at human level on any further challenge tasks might then reasonably be considered to have achieved AGI (per this definition).


> still stayed with the mission to offer it for free once the AGI is achieved

And based on how they have acted in the past, how much do you trust they will act as they now say when/if they achieve AGI?


once it converts into a profit-seeking venture, it won't get the tax benefits

one could argue that they did R&D as a non-profit and then converted to for-profit to avoid paying taxes, but until last year R&D already got tax benefits even in for-profit ventures

so there really is no tax advantage to converting a non-profit to a for-profit


But it keeps the intangible benefits it accrued by being ostensibly non-profit, and that can easily be worth the money paid in taxes.

Otherwise, why do you think OpenAI is doing it?


> it keeps the intangible benefits it accrued by being ostensibly non-profit

but there would be no difference from a for-profit entity, right? i.e. even for-profit entities get tax benefits if they convert their profits into intangibles

this is my thinking: the OpenAI non-profit gets donations, uses those donations to make a profit, converts this profit into intangibles to avoid paying taxes, and pumps these intangibles into the for-profit entity. based on your hypothesis, OpenAI avoided taxes

but the same thing in a for-profit entity also avoids taxes, i.e. a for-profit entity uses investment to make a profit, then converts this profit into intangibles to avoid paying taxes.

so I'm trying to understand the loophole OpenAI found, where if it had gone the for-profit route it wouldn't have gotten the tax advantages it got from the non-profit route


Maybe we're using different definitions of "intangible", but if you can "convert" them to/from profits, they're not intangible in my book. I'm thinking of donated effort, people they recruited who wouldn't have signed up if the company was for-profit, mainly goodwill-related stuff.


this long period of OAI's non-profit status, when they were making no money and spending tons on capital expenditures, would not have been taxable anyway.


What benefits? What taxes?

Honestly it does not sound like anyone here knows the first thing about non-profits.

OAI did it because they want to raise capital so they can put more funding towards building AGI.


The tax advantage still exists for the investors.


I don't believe non-profits can have investors, only donors; an investor by definition expects money out of their investment, which they can never get out of a non-profit

only the for-profit entity of OpenAI can have investors, and they don't get any tax advantage when they eventually want to cash out

