This article presents the challenges social scientists face in the quantitative analysis of social identities measured through open-ended questions in large surveys. The wide diversity of responses demonstrates the complexity of self-identification, but it does not undermine the relevance of quantifying a latent social category. We discuss our approach to building a caste nomenclature from open-ended questions in the Indian Human Development Survey (2011-2012), focusing on Hindu households in Uttar Pradesh. We start by setting out the issues raised by such quantification, highlighting the colonial history with which it is strongly associated. Contrary to common belief, caste is far from being an uncontested institutionalized category, and its statistical measurement is highly criticized. Nonetheless, several arguments support its quantification. We describe our classification algorithm, which combines network analysis, hierarchical clustering, and manual clustering. In this foundational work, we then propose assessing the relevance of our classification along three dimensions. First, indicators of homogeneity show that the resulting categories are homogeneous. Second, a comparison with a ‘gold standard’ evaluates the effectiveness of the nomenclature. Finally, criterion validity tests whether the caste categories reflect selected dimensions of socio-economic status and ritual status. In doing so, we show that our nomenclature of seven caste groups makes it possible to break with the one-dimensional hierarchical vision with which the caste social structure is often associated.