Fix duplicate segments caused by PSEUDO subdirectories in ISLEDAT4

Guillaume Wisniewski requested to merge isle into main

The corpus contains SESS* directories both at ISLEDAT*/SESS* and nested under ISLEDAT4/PSEUDO/…/SESS*, causing each utterance to be parsed multiple times. Restrict session discovery to directories whose direct parent is an ISLEDAT* folder to avoid the duplicates.
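The restriction described above could look roughly like the following. This is a minimal sketch, not the code from this merge request: the function name `find_session_dirs` and the `corpus_root` parameter are hypothetical, and it assumes that the nested copies sit under a parent directory whose name does not itself start with `ISLEDAT`.

```python
from fnmatch import fnmatchcase
from pathlib import Path


def find_session_dirs(corpus_root):
    """Return SESS* directories whose direct parent is an ISLEDAT* folder.

    Hypothetical helper for illustration; the real loader may differ.
    """
    sessions = []
    for path in Path(corpus_root).rglob("SESS*"):
        if not path.is_dir():
            continue
        # Keep only ISLEDAT*/SESS*. A SESS* directory nested deeper
        # (for example below ISLEDAT4/PSEUDO) is assumed to have a parent
        # that is not named ISLEDAT*, so it is skipped here.
        if fnmatchcase(path.parent.name, "ISLEDAT*"):
            sessions.append(path)
    return sorted(sessions)
```

Checking the parent's name, rather than only the session directory's own name, is what keeps a recursive search from picking up the same session twice.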

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
