Modern ABR, particularly when using fMP4 (what you call “chunked”, usually recommends (even requires) that you have separate segments for each stream (and therefore keep audio and video separate).
What is your use case for wanting to do this @vowave ?