The demonstration is easy: just some excerpts from the markup.
From the Perseus LSJ entry on μέν:
<bibl n="Perseus:abo:tlg,0016,001:3:67" default="NO" valid="yes">
Question: who or what is Id.? In computing terms, it is a reference or pointer: the abbreviation (idem, "the same") points back to an author cited earlier in the entry. A reader of the LSJ will have noticed the term being referenced, or can find it in the text. A DOM (Document Object Model) parser cannot, and so neither can any program that relies on such a parser. The markup adds text that is superfluous for a person and useless for a computer.
<bibl n="Perseus:abo:tlg,0012,002:1:392" default="NO" valid="yes">
Question: was Odysseus author of the Odyssey? No, but the markup seems to have forced a spurious attribution. LSJ does not name Homer as author, on the grounds that the Iliad and Odyssey need no authorial attribution. The markup is therefore wrong for a human reader, though probably harmlessly so; for a computer program it produces errors.
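Both problems are easy to reproduce with a DOM parser. The sketch below is only an illustration: the opening `<bibl>` tags are copied from the excerpts above, but the `<author>` children, the trailing citation text, and the wrapping `<entry>` element are my assumptions about how the rest of each element might look, and `author_texts` is a hypothetical helper, not part of any Perseus tooling.

```python
import xml.dom.minidom

# Opening <bibl> tags are copied from the excerpts above.
# ASSUMPTION: the <author> children, the citation text after them, and the
# <entry> wrapper are invented here so that the fragment is well-formed.
FRAGMENT = """<entry>
<bibl n="Perseus:abo:tlg,0016,001:3:67" default="NO" valid="yes"><author>Id.</author> 3.67</bibl>
<bibl n="Perseus:abo:tlg,0012,002:1:392" default="NO" valid="yes"><author>Od.</author> 1.392</bibl>
</entry>"""


def author_texts(xml_text):
    """Return (n attribute, author text) for every <bibl> in the fragment."""
    dom = xml.dom.minidom.parseString(xml_text)
    pairs = []
    for bibl in dom.getElementsByTagName("bibl"):
        authors = bibl.getElementsByTagName("author")
        # The parser hands back the literal string; it has no notion that
        # "Id." refers to an earlier author, or that "Od." names a work.
        text = authors[0].firstChild.data if authors else None
        pairs.append((bibl.getAttribute("n"), text))
    return pairs


for n, author in author_texts(FRAGMENT):
    print(n, "->", author)
```

Under these assumptions the parser reports "Id." and "Od." as authors, exactly as written. Nothing in the tree resolves the first to a person or flags the second as the title of a work, so any program built on the parser inherits both defects.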
Correct coding will solve these problems, though the solution requires abandoning the markup and looking for a computationally richer representation of lexicon entries. That is what I am going to do in my next posts.