This study deals with a subset of Croatian idiomatic expressions, namely similes that follow the pattern adjective + kao/ko (‘as’) + noun (e.g. tvrd kao kamen, lit. hard as stone, ‘very hard’). The aim is to establish criteria and procedures that can be used to identify conventionalized similes in a large corpus. The set of similes thus obtained may be used in dictionary-making and/or to create a lexical database. In addition, a dictionary and a rule-based grammar of similes were created in NooJ on the basis of the results of CQL queries in hrWaC. The grammar may be used for the automatic detection of similes in a large corpus and as a starting point for identifying other structural types of similes.
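
The pattern adjective + kao/ko + noun can be illustrated with a minimal Python sketch. This is not the NooJ grammar or the CQL queries used in the study; it only shows the kind of three-token match the pattern implies. The part-of-speech labels (`ADJ`, `NOUN`, etc.), the function name `find_similes`, and the two sample sentences are assumptions made for illustration, and counting repeated adjective-noun pairs is only one possible proxy for conventionalization.

```python
from collections import Counter

# Comparators named in the pattern: "kao" and its colloquial variant "ko".
COMPARATORS = {"kao", "ko"}


def find_similes(tagged_tokens):
    """Return (adjective, comparator, noun) triples from a list of (word, tag) pairs.

    The tags "ADJ" and "NOUN" are hypothetical labels, not the hrWaC tagset.
    """
    matches = []
    # Slide a three-token window over the sequence.
    for i in range(len(tagged_tokens) - 2):
        (w1, t1), (w2, _), (w3, t3) = tagged_tokens[i:i + 3]
        if t1 == "ADJ" and w2.lower() in COMPARATORS and t3 == "NOUN":
            matches.append((w1.lower(), w2.lower(), w3.lower()))
    return matches


# Hand-tagged toy input (invented for this example).
sample = [
    ("Kruh", "NOUN"), ("je", "VERB"), ("tvrd", "ADJ"),
    ("kao", "CONJ"), ("kamen", "NOUN"), (".", "PUNCT"),
    ("Bio", "VERB"), ("je", "VERB"), ("tvrd", "ADJ"),
    ("ko", "CONJ"), ("kamen", "NOUN"), (".", "PUNCT"),
]

# Group the kao/ko variants together by counting adjective-noun pairs.
counts = Counter((adj, noun) for adj, _, noun in find_similes(sample))
print(counts.most_common())
```

On real corpus data the window would run over lemmatized, automatically tagged text, and the frequency counts would need to be combined with further criteria before a candidate is treated as a conventionalized simile.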