develooper Front page | perl.beginners | Postings from September 2021

Regex to detect natural language fragment

Julius Hamilton
September 13, 2021 15:32

I'm not sure if this is possible, and if it's not, I'll explore a better
way to do this.

I would like to write a script that determines whether a line of text is
(likely) a broken natural language sentence, i.e., a piece of a sentence
whose start or end is missing. The contrast is with a fully "complete"
linguistic entity such as a section header, which does not have a period
at the end and is not really a sentence, yet is in a complete and unbroken
form.
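
To make the goal concrete, here is a rough sketch of the kind of surface check I have in mind. The rules (lowercase first letter, trailing comma, trailing function word), the word list, and the helper name looks_like_fragment are all my own assumptions for illustration, not a real linguistic analysis, and they will misjudge plenty of lines:

```perl
use strict;
use warnings;

# Assumed list of words that rarely end a complete sentence or header.
my %dangling = map { $_ => 1 }
    qw(a an the of to in on for and or but with which that is was);

# Hypothetical helper: returns 1 if the line looks like a broken piece
# of a sentence, 0 otherwise. Purely heuristic.
sub looks_like_fragment {
    my ($line) = @_;
    $line =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
    return 0 if $line eq '';

    # A lowercase first letter suggests the start of the sentence is missing.
    return 1 if $line =~ /^[a-z]/;

    # A trailing comma suggests the end of the sentence is missing.
    return 1 if $line =~ /,$/;

    # So does a trailing function word such as "the" or "of".
    my ($last_word) = $line =~ /(\w+)\W*$/;
    return 1 if defined $last_word && $dangling{ lc $last_word };

    return 0;
}

for my $line (
    'Regex to detect natural language fragment',
    'sentence, even if the start or end is not present, rather than it being a',
    'This is a complete sentence.',
) {
    printf "%-8s %s\n", looks_like_fragment($line) ? 'fragment' : 'ok', $line;
}
```

A check like this only looks at the edges of the line, which is why I suspect something that understands sentence structure is needed for the harder cases.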

I'm pretty sure that in principle this will require some kind of syntax
parsing. I think I read somewhere that regular expressions, for some
mathematical reason, cannot parse tree-like or nested structures such as
HTML.
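
As far as I understand the argument, matching nested pairs means keeping track of how deep you are, and a classical regular expression has no way to count. A minimal sketch of that bookkeeping, using parentheses as a stand-in for nested tags (the helper name is_balanced is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if every "(" has a matching ")" in the
# right order, 0 otherwise. The depth counter is the part a classical
# regular expression cannot express for arbitrary nesting.
sub is_balanced {
    my ($text) = @_;
    my $depth = 0;
    for my $ch (split //, $text) {
        if ($ch eq '(') {
            $depth++;
        }
        elsif ($ch eq ')') {
            $depth--;
            return 0 if $depth < 0;    # closed more than was opened
        }
    }
    return $depth == 0 ? 1 : 0;
}

print is_balanced('(a (b) c)') ? "balanced\n" : "unbalanced\n";
print is_balanced('(a (b c)')  ? "balanced\n" : "unbalanced\n";
```

Perl's own regex engine goes beyond the textbook formalism (it supports recursive patterns such as (?R)), so the limitation seems to apply to regular expressions in the mathematical sense rather than strictly to Perl patterns.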

Does anyone know what the next most ubiquitous, standard tool is for
analyzing nested linguistic structures? Is that an XML parser?

Thanks very much,
