Crowd modeling in computer vision assumes a single generic typology of crowd, which is overly simplistic. In this paper we adopt a widely accepted taxonomy for crowds, focusing on a particular category, the spectator crowd, which is formed by people "interested in watching something specific that they came to see" [6]. Such crowds can be found in stadiums, amphitheaters, cinemas, etc. In particular, we propose a novel dataset, Spectators Hockey (S-HOCK), which covers hockey matches during an international tournament. The dataset considers 4 hockey matches, in which hundreds of spectators are individually annotated, capturing fine-grained actions such as hands on hips, clapping hands, watching the cellphone, etc., for a total of more than 100 million annotations. Analyzing people at the stadium involves different computer vision tasks, some of which are classic (crowd counting), while others are brand new (such as spectator categorization). For this reason, S-HOCK also comes with a set of protocols for dealing with all of them, and a set of baselines and novel approaches that define the best scores on all the tasks. Nevertheless, performance is far from error-free, which attests to the difficulty of the problem and shows that much remains to be done in the future.