MyGet Blog

Package management made easier!

Feature highlight: MyGet Gallery

It's been less than two months since we introduced the public MyGet Gallery, and we are very happy to have some awesome projects on board, often shipping development builds or using MyGet as a staging environment before pushing packages upstream. These feeds deserve more attention, and people should know they exist.

Did you know you can get the latest development builds of SignalR and MVC Contrib delivered through MyGet?

If you think your project's development builds deserve a place in our Gallery, you can easily opt-in using your feed's Gallery Settings.

Gallery feed readme

Our latest v1.3 release allows feed owners to add a feed readme for publication in the MyGet Gallery. This way, feed owners can easily add instructions or disclaimers to the gallery feed details page using the well-known Markdown format. Don't forget to add a link to your project page and/or sources as well.

You might recognize the pagedown Markdown editor there, which is also used on the StackExchange websites, so you are probably already familiar with it. In return for this excellent control, we also added the StackOverflow identity provider to our login page. People who prefer to log in with their StackOverflow account can now do so, or you can link it to the other identities within your MyGet account as well.

Layout improvements

While we were at it, and based on the invaluable feedback of our users, we also improved the look and feel of the package and feed listings by removing clutter and putting the focus on content.

You can now not only see which packages are listed and what version they're at, but also when that version became available. A simple click on the icon next to a package triggers a download in your browser, whilst the icons on the top right bring you to the SymbolSource symbols repository of the feed (if available) or give access to the RSS feed associated with the package repository.

What's next?

The screenshot below (taken from our latest development builds) gives you a glimpse of what's coming next: user profiles and activity streams! That's right: you'll be able to get insights into the activity of a public feed or track the history of a given package. Coming soon...

We hope you like it! Keep those suggestions and feedback coming! Who knows, maybe one day you'll see your wish granted :)

Happy packaging!

MyGet Update: version 1.3.0

We are happy to announce a new version of the MyGet Web site containing a set of improvements, fixes and updates.

The following work items have been addressed in this release:

Product

  • the number of plans has been reduced: we now have Free, Starter (formerly the Small plan), Professional (formerly the Large plan) and a new Enterprise plan
  • introduction of the new Enterprise plan focused on companies that require more performance, integration and security

Usability

  • improved layout and readability on smaller screen resolutions
  • auto-lowercased routing for feed URLs (helps avoid issues caused by incorrect casing of the feed identifier)

Performance

  • upgraded to MVC4

Features

  • new identity provider: StackExchange
  • upgraded SymbolSource integration to benefit from their improved and freshly released API
  • Gallery feeds now support a README in Markdown format, available in the feed's Gallery Settings
  • Enterprise plan: upgraded administrative & quota management dashboard

More details about the new features will be covered in follow-up blog posts soon.
 
Happy packaging!

Planned maintenance July 26, 2012

We are planning a small window of maintenance on Thursday July 26, between 1:00pm and 2:00pm CET, during which we will roll out an update to the MyGet Web site.

During that timeframe, we will disable any synchronization we have in place with SymbolSource. Synchronization will pick up after our deployment.

We expect no other interruptions in service during this time frame, except for maybe a few seconds when switching VIPs.

Stay tuned!

Site issues on July 2nd, 2012

What a day! July 2nd seemed to go from sunshine to rain in a couple of hours. The good news is: we’re back. Let’s first go through what happened…

What happened?

This morning, you've probably received an e-mail stating that "Your Free subscription at MyGet has expired". Free? Expired? Even we were puzzled! We quickly sent out an e-mail stating that free subscriptions don't expire and that we were investigating the issue. In our rush to find out what happened, we even forgot to remove some template text from that e-mail. Professional of us, really.

A couple of hours later, the website started acting funny: one moment it was up, the next it appeared down. The strange thing was: we did not see this happening on our server monitoring page. The brand new Windows Azure portal was all shiny and bright, all systems running. A short glitch? Probably.

And then 4:00 PM (GMT+1) came around: we started to receive SMS messages from Pingdom. UP, DOWN, UP, DOWN. Some sweaty hours later, we found the root cause of all this mess and shortly thereafter deployed a hotfix to the cloud. What a day!

All in all, we spammed you in the morning, had some small glitches during the day and went berserk between 4:00 PM (GMT+1) and 7:00 PM (GMT+1).

What was wrong?

During the entire day, we suspected two of our latest changes. First of all, we had moved storage accounts in Windows Azure to make use of some of the new features there with regard to mirroring packages. The code isn't done yet, but we had already switched the storage accounts. Next to that, we had recently deployed our site using ASP.NET MVC 4 (RC). Was any of this the cause? We've been searching... but no.

Elmah revealed some interesting details: we were getting throttled on those new storage accounts. Was this because we had enabled storage analytics? We disabled them and found that we were back up in full glory! For a moment, that is, because this wasn't the real reason we were being throttled...

We process several operations asynchronously, such as updating our caches. It turned out there were 16,000 commands in the queue, all telling the system to update the cache for one specific feed. One of our newest MyGet members managed to break a couple of gears there! Not that we judge; on the contrary: there's no better stress test or user test than real users.

A rookie mistake

Now why were there 16,000 unprocessed messages, and why were some of them on the dead-letter queue? And why did our machines look fine (all green!) while the website did not respond? The answer lies in the following piece of code:

[DebuggerDisplay("{Title} Version={Version}")]
public class FeedPackage
    : TableServiceEntity
{
    public string Id { get; set; }
    public string Version { get; set; }

    // ...

    public override bool Equals(object obj)
    {
        if (ReferenceEquals(null, obj)) return false;
        if (ReferenceEquals(this, obj)) return true;
        if (obj.GetType() != typeof(FeedPackage)) return false;
        return GetHashCode() == obj.GetHashCode();
    }

    public override int GetHashCode()
    {
        int hashCode = 7;

        if (PartitionKey != null) hashCode ^= PartitionKey.GetHashCode();
        if (RowKey != null) hashCode ^= RowKey.GetHashCode();

        return hashCode;
    }
}

Did you spot it? We’ve gone with a complete rookie implementation of Equals and GetHashCode there…

The devil is in the fact that we take the hash code of package A and compare it with the hash code of package B. That's a serious problem, as those hash codes are based on the hash codes of the package identifier and package version strings. Guess what MSDN tells us:

If two string objects are equal, the GetHashCode method returns identical values. However, there is not a unique hash code value for each unique string value. Different strings can return the same hash code.
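
To see how little it takes to hit such a collision, here is a minimal console sketch (illustrative only, not taken from our code base) that brute-forces two distinct strings sharing a hash code:

using System;
using System.Collections.Generic;

class HashCollisionDemo
{
    static void Main()
    {
        // Remember each hash code we have seen and which string produced it.
        var seen = new Dictionary<int, string>();

        for (var i = 0; ; i++)
        {
            var candidate = "Package." + i + ".1.0.0";
            var hash = candidate.GetHashCode();

            string earlier;
            if (seen.TryGetValue(hash, out earlier))
            {
                // Two different strings, one hash code.
                Console.WriteLine("\"{0}\" and \"{1}\" share hash code {2}",
                    earlier, candidate, hash);
                return;
            }

            seen[hash] = candidate;
        }
    }
}

Thanks to the birthday paradox, a run like this typically finds a collision well before a hundred thousand candidates. Any Equals() built on comparing hash codes alone will happily consider those two strings "equal".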

Our newest member had uploaded a lot of packages to MyGet, and by coincidence two packages appeared to be "equal" according to our faulty code above.

Now why did the website choke on that?

We use Windows Azure table storage to store all your data. In the .NET space, the Windows Azure storage client uses WCF Data Services to fetch that data. Deep inside that framework, the Equals() method is called on every entity to check whether a specific entity is already being tracked by the data context in use. And since Equals() returned true for two distinct entities, an unhandled exception was thrown there.

Wait a minute: you guys don't catch unhandled exceptions? Yes, we do. And when such an exception occurs, we retry some operations up to five times. Retry operations? Well, rather than failing immediately, a good pattern is to retry a failing operation in case the error was just a minor glitch like a network fault. On Windows Azure, it is recommended to do this for calls to any external system, like a web server calling storage. Check David Aiken's blog for some info on this.

So, retries... Yes. Retry an I/O operation with minor latency five times. Have a user who's experiencing an issue with the website hit F5 a couple of times. And have that queue with 16,000 pending operations hammer the storage account again. What happens? Throttling. Windows Azure tells our system to retry after a second, and our retry logic does exactly that. The retry logic causes some threads to sleep, all the queue processing and F5-ing causes some more threads to sleep, and what happens? The server grinds to a halt while it still looks okay to our external monitoring.

Are we running just one server? No. We're running multiple, in a round-robin load-balanced farm. This means that all this refreshing impacted all of our servers, making the site as slow as a snail. Then it would recover, almost go down again... and come up again. Exactly what we saw today.

What are we doing to prevent this from happening in the future?

The truths of offering a cloud service are: hardware fails, software has bugs and people make mistakes. Hardware failures are mitigated by the Windows Azure system. Software bugs? It seems we’re proficient at that. And yes, that was a giant mistake of us. Our job is to mitigate all of these and provide a reliable, robust service to you. We’ll be using the lessons learned today to improve our service, your service.

First of all, we’ve fixed our rookie mistake. This should have never happened and we’ll make sure that our code base is checked and repaired for faulty Equals / GetHashCode implementations.
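
For the curious, a correct implementation compares the actual key values and only uses the hash code for bucketing. Here's a minimal sketch of what that looks like (simplified, not our exact production code):

[DebuggerDisplay("{Title} Version={Version}")]
public class FeedPackage
    : TableServiceEntity
{
    public string Id { get; set; }
    public string Version { get; set; }

    // ...

    public override bool Equals(object obj)
    {
        if (ReferenceEquals(null, obj)) return false;
        if (ReferenceEquals(this, obj)) return true;
        if (obj.GetType() != typeof(FeedPackage)) return false;

        // Compare the actual keys: equal hash codes do NOT imply equal entities.
        var other = (FeedPackage)obj;
        return string.Equals(PartitionKey, other.PartitionKey)
            && string.Equals(RowKey, other.RowKey);
    }

    public override int GetHashCode()
    {
        // Hash codes may collide; they are only a hint for hash-based collections.
        int hashCode = 7;

        if (PartitionKey != null) hashCode ^= PartitionKey.GetHashCode();
        if (RowKey != null) hashCode ^= RowKey.GetHashCode();

        return hashCode;
    }
}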

Next, we’ll be reviewing our retry logic. Blindly retrying on any exception type isn’t the smartest thing to do, so it seems. We’ll be making sure the retry logic only fires on transient exceptions and not blindly on any exception that occurs.
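
To give you an idea, a retry helper along these lines only retries errors that are known to be transient (a simplified sketch; the RetryHelper name, the transient check and the back-off values are illustrative, not our production code):

using System;
using System.IO;
using System.Threading;

public static class RetryHelper
{
    public static T ExecuteWithRetry<T>(Func<T> operation, int maxAttempts)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (Exception exception)
            {
                // Only transient errors (timeouts, throttling, ...) are worth
                // retrying; anything else should fail fast and be logged.
                if (!IsTransient(exception) || attempt == maxAttempts)
                    throw;

                // Exponential back-off instead of hammering the service again.
                Thread.Sleep(TimeSpan.FromSeconds(Math.Pow(2, attempt)));
            }
        }
    }

    private static bool IsTransient(Exception exception)
    {
        // Simplified: treat timeouts and I/O hiccups as transient.
        // Real code would also inspect HTTP status codes such as 503.
        return exception is TimeoutException || exception is IOException;
    }
}

The key difference with our old code: a genuine bug now fails fast on the first attempt instead of putting threads to sleep five times over.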

We’ll also be investigating additional monitoring. We’re not sure yet about possible tools or services there, we do want to know when something is wrong and our CPU is heating the entire datacenter at 100%.

We're sorry!

Happy packaging!

The MyGet team