Telnet host disconnect

sceedy

Member
The story so far :-

I am experimenting with cross-compiled programs for the Humax. I used telnet to have a “poke around”, saw there was a complex ecosystem, and so have stuck to static programs. I have uploaded to “My Photo” (where else do you put snapshots?) and had no problem with simple programs (e.g. deliberately producing segmentation faults, illegal instruction traps, bus errors etc. to confirm the errors did not interfere with any recording/playback occurring at the time).

When I started creating programs with a non-trivial execution time, I started to get disconnects. I read the telnet protocol and found it is the client that is responsible for keeping the link alive. HyperTerminal is bundled with XP, so I looked at its specs: it is incapable of sending any kind of keep-alive.

I searched the web for alternatives and putty was highly recommended for its keep-alive capabilities, as well as being listed as useful in the wiki. The web tutorials gave a tick box and a time (I set it to 60 seconds; the putty configuration is attached in case the solution is that I just need new glasses, i.e. I missed something).

I started using putty and an alert came up saying that the host had disconnected. Next step: uninstall the third-party firewall and disable the Windows firewall; no change. My trusty network analyser told me the disconnect was being sent to the PC. My network is done “the hard way”, i.e. every device has its IP address set manually and they are all connected together by a 128-port switch. I wondered if the switch had developed a fault, so I found a crossover Cat 5 cable and connected directly; no change.

The telnet prompt is not the standard linux prompt so I wondered if telnet itself had limit(s)?

I have tried looking in the standard linux places for the limits and have not been able to find them. Any reasonable suggestions on a way forward will be welcome as I am doing this out of intellectual curiosity.
 

Attachments

  • Putty.zip (2.3 KB)
When I started creating programs with a non-trivial execution time, I started to get disconnects.
As an alternative to trying to keep a Telnet session alive, you could:

1. Use webshell https://hummy.tv/forum/threads/webshell-command-line-access-from-web-browser.6907/

2. Execute long processes within an abduco session https://hummy.tv/forum/threads/yout...other-video-platforms.8462/page-3#post-120326
abduco

A problem with running a long process from a Telnet session or the webshell package (a command terminal available as a web page - access via WebIF >> Diagnostics >> Command Line) is that the session inconveniently drops out and terminates any active processes if you take the focus away to do something else. Then you have to reconnect and restart the youtube-dl command, which then has to do its initial thinking before the download resumes...

The command abduco is already available to support other long processes such as fix-disk, and can be used here to create a protected command session which carries on regardless of the terminal session dropping out. So, before launching the main download, use the following to create a protected session:
Code:
# abduco -A yt
This creates a protected session called "yt" (call it what you like), and the next command prompt is within that session. Then start your youtube-dl process as described above. You can now "do other stuff" and the session will carry on regardless - to break out of the session and do other things on the command line, use Ctrl+\.

To come back later, use the same abduco line again. This time, as there already exists a session called "yt", the command connects to it rather than starting a new one.

To close the session (from within the session), type "exit".

To inspect open abduco sessions, type "abduco" (with no other parameters).
 
I regularly run long processes (>1hr) using putty without problems

You can compile many programs directly on the Humax without needing to resort to the cross compiler.
But, apart from some utilities, most of the webif is written in Jim/Tcl rather than compiled.

Why are you uploading to My Photos? Most of the webif and packages live in the /mod directory, especially /mod/bin and /mod/webif, and you can create your own directories under /mod.
 
Status update :-

Black Hole’s suggestion of abduco allowed me to get to the root of my original problem. It let stuff run for longer, so I could determine that I was interfering with normal operation. Once I had the “doh” moment of “don’t think pc, think embedded”, I made my code “nice”r (as per the kernel call; the default priority is 10, I was thinking of 18 or 19, but something is limiting the priority to 17. Note for any non-technical reader: the lower the linux nice/priority value, the more resources the process gets, i.e. the other way round to intuition). Once I had done this I no longer got telnet disconnects. I could turn off putty’s keep-alive and even go back to HyperTerminal without any problems.
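
For anyone wanting to reproduce the “nice”r change, it was nothing cleverer than the following sketch using the setpriority/getpriority calls (the 17 is just the ceiling I observed on my box, not something from any documentation):
Code:
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <sys/time.h>
#include <sys/resource.h>

int main(void)
{
    /* Raise our own nice value (lower our scheduling priority) so the
       box's own processes keep getting the CPU time they need. */
    if (setpriority(PRIO_PROCESS, 0, 17) == -1)
        fprintf(stderr, "setpriority: %s\n", strerror(errno));

    /* Read it back to see what the system actually allowed. */
    errno = 0;
    int prio = getpriority(PRIO_PROCESS, 0);
    if (prio == -1 && errno != 0)
        fprintf(stderr, "getpriority: %s\n", strerror(errno));
    else
        printf("running at nice %d\n", prio);

    /* ... CPU-hungry work goes here ... */
    return 0;
}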

Extreme laziness on my part means that I have not confirmed this is also the case for webshell. Around 3 years ago I created 2 USB sticks, one labelled “custom firmware” and the other labelled “packages”. Every time I get a new box I fit a 4TB drive, load the firmware and packages, soak test for a week and then ship it to the next family member on the list (my experience is that approximately 19 out of 20 “spares or repair” purchases work fine with a new drive). When I tried to install webshell on a box with the old package set I got a missing dependency in the log (attached).

MartinLiddle’s question is probably relevant now. After 15 to 20 minutes of both cores at between 99.0% and 99.5% load, the front LED panel stops scrolling the channel left (twice) and then changes to “CRASH – Wait”, “Reboot in 10s” … I have attached a putty log in case I have not turned on crash dump. On reboot, the normal places only have logs that cover the time since the reboot. I am not getting a crash dump, so I believe I have upset the watchdog (given the LED behaviour)! This is strengthened by the program running to completion on QEMU running mipsel debian etch (also linux kernel version 2.6.18) and producing the correct results. I have tried the obvious fix of “yield”ing on every iteration of the CPU-hungry loops. The help I am after is the watchdog limits (I have probably exceeded the average cpu usage limit).
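
For completeness, the “yield”ing attempt was simply this shape (a sketch only; the summing loop is a stand-in for the real processing):
Code:
#include <stdio.h>
#include <sched.h>

int main(void)
{
    volatile unsigned long sum = 0;

    /* CPU-hungry loop: hand the CPU back to the scheduler on every
       iteration; the summing just stands in for the real work. */
    for (long i = 0; i < 100000000L; i++) {
        sum += (unsigned long)i;
        sched_yield();      /* voluntarily give up the rest of the timeslice */
    }
    printf("done: %lu\n", sum);
    return 0;
}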

I have not ignored MymsMan’s comment. There is another thread about cross-compiling, and I agree with its conclusion that you should compile on target if possible; if you need scripting tools not available on target, use the cross-compilation tools provided by Humax. Anything else is “interesting” (in the sense of the old Chinese curse “may you live in interesting times”). The code I wish to write needs fixes in gcc 9.2, and the 9.3 fixes would be nice to have. The Debian 11 cross compiler is “pencilled in” at gcc 9.3 or 10.1. I am experimenting on debian 10 (gcc 8.3.0) to try and identify the problems. First problem: the default compiler libraries are compiled for revision 2 of the mips32 architecture and the target processor is revision 1; rebuilding the compiler libraries is time consuming but not difficult. Second problem: later gcc requires a later glibc or uClibc. Debian 11 has “pencilled in” libc 2.32. Looking at the libc documentation, the minimum kernel version rationale seems reasonable for the 64-bit ABI but not the 32-bit ABI, so I have been re-instating compatibility code from previous libc versions to produce a deviant (technical term for an unauthorised and unsupported variant) that is good enough to build gdb 9.2 (I don’t want to use “dwarf 4” as an expletive) and my program. I believe I have dealt with all the kernel calls, but I may have made a mistake. Third problem: debian 10 has libc 2.28, so you need libm from 2.28 for binary compatibility with precompiled libraries. If requested, I can publish (I only did static) the recompiled compiler libraries, the libc deviant sources and compiled version, the recompiled libm, binaries for gdb 9.2 (stand-alone and server) and the most recent “build from source” document. However, I would prefer to publish after solving all the target-specific problems with my program.

In case the last paragraph confused things, I want information on the watchdog limits so I can avoid triggering the limits. I don’t want to just turn it off.
 

Attachments

  • opkg_install.log (305 bytes)
  • putty.txt (2.2 KB)
If you want to max out the box for multimedia processing, better use Maintenance mode. The settop program will need around 20% of the CPUs. Having said that, it's not an obvious target unless you can somehow interface to the built-in multimedia processor.

Code that relies on a specific compiler version sounds fragile to me.

Webshell wants the libutil package but that appears to replicate a library already in the CF for recent versions; perhaps the dependency is needed for older CF versions? Your missing dependency shouldn't matter.
 
I was looking at the post to apologise to BlackHole for confusing him. The watchdog is a generic linux problem, so it should not be on this forum as it is not custom firmware specific. Apparently I need to start by pulling config.gz (usually in /proc) …

The first Humax-related Broadcom document I stumbled across (Document 7405-3HDM00-R “Preliminary Hardware Data Module”) refers to 7405-5HDM “CPU Interface”, which should contain the mips co-processor 2 interface to the multimedia processor.

The code generation fixes in gcc 9.2 stop you having to compile separately, with no optimisation, the functions that would otherwise have their intended processing optimised out. The 9.3 fixes correctly optimise the problem constructs.

Thanks for identifying the dependency.

Edited 8 November 2020 in italics
 
Generic linux got me as far as: the kernel is compiled with “no way out” (as expected for an embedded system), and softdog (the default linux watchdog) is not running. Occam's razor left me with only things that the kernel detects itself as the reason for my crash.

I prototyped a load monitor for these things (attachment load.c.txt) and found 2 secondary mechanisms in glibc for enforcing the minimum kernel version, which I bypassed in the process of getting it going (involving reinstating approximately another 27%, by sloc, of the compatibility code found when bypassing the primary mechanism). This monitoring showed the load on the swap file reaching 100% at the 15 to 20 minute mark, i.e. coincident with “CRASH - Wait”…, i.e. it was a complete coincidence that the symptoms resembled softdog detecting the 15 minute load limit being exceeded. This also explains why maintenance mode made no difference to the crash.
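
Stripped of the glibc work-arounds, the core of the monitor is nothing more than polling sysinfo() once a second and working out how much of the swap space is in use (an illustration of the idea, not the attached load.c):
Code:
#include <stdio.h>
#include <unistd.h>
#include <sys/sysinfo.h>

int main(void)
{
    /* Print the percentage of swap space in use once a second. */
    for (;;) {
        struct sysinfo si;
        if (sysinfo(&si) == 0 && si.totalswap > 0) {
            unsigned long long total =
                (unsigned long long)si.totalswap * si.mem_unit;
            unsigned long long used = total -
                (unsigned long long)si.freeswap * si.mem_unit;
            printf("swap: %llu / %llu bytes (%.1f%%)\n",
                   used, total, 100.0 * used / total);
        }
        sleep(1);
    }
}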

Searching the forum shows that a larger swap is needed for other things that use ffmpeg as the main processing engine. A quick scan read of the forum (problems related to youtube download) suggested the advice is the same as the generic linux advice for a swap file. So I increased the size to accommodate the maximum I calculated the ffmpeg libraries need for transcoding, and the confidence check ran (attachment finally.txt). (I think it is a good confidence check because the dancers make jerky and smooth movements in multiple directions, which pushes the motion vector encoding for the video, and the sharp edges in the audio also push the audio encoding.)

However, the cure is currently worse than the disease (attachment newproblem.txt). When it powers on, the larger swap file is present but not active. My understanding is that, as long as the swap file name has not changed, enlarging the swap file should not stop it being used automatically on power up. However, timer recordings do not work correctly without swap space, so having to power on manually and turn swap on manually sort of defeats the object of having a box that timer records. I am hoping that somebody will point out what I need to do (because if you re-read something you have misunderstood then you will just misunderstand it again).
 

Attachments

  • newproblem.txt (2.2 KB)
  • load.c.txt (4.9 KB)
  • finally.txt (2.9 KB)
This is a machine with 128MB of RAM. There's really no point having a page (aka swap) file of 9GB.

Access to paged-out memory is typically 10000 times slower than to data in RAM.

A program that needs more memory than can fit in some reasonable subset of RAM is always going to struggle.

Conceivably, if the program sequentially writes and reads back a gigantic chunk of memory, an unusually large page file may be useful. Otherwise, a point is easily reached where the system spends almost all its time moving data between the page file and RAM (thrashing).
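
As a toy illustration of that access pattern (nothing to do with any real workload): each page is written once and read back once much later, so the page file is streamed rather than thrashed.
Code:
#include <stdlib.h>

int main(void)
{
    /* One sequential pass writing a buffer far larger than RAM, then one
       sequential pass reading it back much later: each page is touched
       only twice, so the page file is streamed rather than thrashed. */
    size_t big = (size_t)512 * 1024 * 1024;     /* pick something >> 128MB */
    unsigned char *buf = malloc(big);
    if (!buf)
        return 1;

    for (size_t i = 0; i < big; i++)            /* sequential write */
        buf[i] = (unsigned char)i;

    /* ... a long gap while other work happens ... */

    unsigned long sum = 0;
    for (size_t i = 0; i < big; i++)            /* sequential read-back */
        sum += buf[i];

    free(buf);
    return (int)(sum & 0x7f);                   /* keep the loops from being optimised away */
}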


If you want to have a bigger page file by default, edit the value assigned to swapsize in /mod/etc/init.d/S00swapper, then
Code:
# assuming HDR
/mod/etc/init.d/S00swapper stop
rm -f /mnt/hd3/.swap0
/mod/etc/init.d/S00swapper start
(Modifications to the swapper package for the HD have been discussed in its earlier threads.)
 
This reminds me of one of my earliest experiences with computers. My school had just started running an 'O' level in computer science, and we had an acoustically coupled, 110-baud teleprinter connection over a phone line to the local electricity board's mainframe running compiled BASIC. However, during the summer holiday, I hand-punched 80-column cards with a Fortran program to run a Fourier analysis on the sunspot data I had collected by observation, which I took to the technical college in town to run on their mini.

After a few iterations fixing syntax errors and resubmitting the stack of cards (handed over to the operators and told to come back in a few days), I was finally handed a printout which said my data had exceeded the available memory.
 
...The telnet prompt is not the standard linux prompt so I wondered if telnet itself had limit(s)?
...
Catching up, you see (if it hasn't been disabled) a menu with a CLI option that runs the system's POSIX shell /bin/sh, the ash from the CF build of busybox. All quite standard.
 
When you build ffmpeg, you also get ffmpeg_g, making it possible to look at the internals. In the initial development of my plugin, the memory usage seemed excessive i.e. libavcodec was creating memory as though it was going out of fashion. I set ffmpeg_g to do the same re-coding task and it also created a lot more memory than I would expect intuitively. In both cases memory was created and written to, and then between 8 seconds and several minutes later, it is read and freed. Therefore I have no concerns about threshing. Admittedly the checks were done on a pc but, with the default swap size, gdb gives an error message along the lines of out of memory on fork when the run command is executed on ffmpeg_g or my program compiled with -g.

I have two confidence checks, one for re-multiplexing and the other for re-coding. The re-multiplexing confidence check successfully processed a “rip” (actually the files created by the authoring process, but it is the same set) of the main title of a DL DVD with multiple audio languages with the default stack size. When I re-ran it with instrumentation it gave 99% swap load so I am not surprised that some of my release test cases for re-multiplexing also crashed my HDR.

The urgency has passed because I have just finished soak testing my latest re-conditioned box, so I have programmed it with my recording schedule and left my box to run the remainder of my release test cases.

Just to confirm I understand what /df has said, to recover my box I stop a process, delete the bloated swap file and restart a process.

(Not related to what /df said) After that, I need to work out a way of co-existing with the process, as my program will be useless if you need to cripple the box to run it, even though, so far, it has not interfered with timer recording when running (I have not watched anything live yet). My confidence checks take approximately 20 minutes on my laptop (which was mid-range 3 years ago). The re-multiplexing confidence check takes approximately 30 minutes running native and the re-coding check takes approximately 1 hour 25 minutes, which is not so excessive that a native version would not be useful.
 
...
Just to confirm I understand what /df has said, to recover my box I stop a process, delete the bloated swap file and restart a process.
...
The instructions I gave were for setting up a persistent page (aka swap) file of non-default size.

Certainly, if you can get your box to do anything via telnet once it gets into a thrash, the stop and start commands without the rm will clear the page file usage.
 
I do not wish to be deliberately annoying. I have checked: page file, thrashing and allocating memory are now considered more “politically correct” (technically correct) than swap file, threshing and creating memory.

/df’s instructions allowed me to recover my box. The size already in the script was what I wanted, so I omitted the edit step.

If you do stuff for yourself, especially against the recommendations of the consensus, and you cripple your box then that is a known risk of going against the recommendations of people who have found out the hard way! Under most circumstances a bloated page file is unsafe (see /df’s entry above) so you have to be <your choice of expletive here> sure that it is necessary and safe.

I repeated the gdb data watch on the box. The behaviour is the same as on a pc (linux and windows versions) but with the timings expanded in proportion to the execution times, so I am pretty sure it is necessary to have a bloated page file to run my program. I have also spent time watching live tv whilst timer recording with my program running.

I believe I understand why I crippled the box. The script identified by /df (correctly) tries to mitigate the concerns he raised, so it was (metaphorically speaking) pulling in one direction whilst I was manually pulling in the other, hence the stalemate. So my program co-existing is, metaphorically speaking, a standing-on-one-leg-whilst-rubbing-your-tummy-and-patting-your-head problem. I have attached swap.c.txt as this is the prototype for my page file solution. I considered including a function based on the myswap source, but I calculated it would take me about 2 weeks to build the extra libraries and it would require a lot of code to be included that will only be executed once irrespective of how many times my program is run, so I went for the hack. Intuitively there should be a swapoff at the end. With default parameters you always end up with code being executed in the page file, so you get a demonstration of one of /df’s concerns. If you set the priority so the existing default page file is used first, it still happens about two thirds of the time. I have restarted release testing with a new binary (I was about 37% through when the attached prototype did what I expected on 3 boxes), so I would appreciate any thoughts on my prototype.
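
For anyone who does not want to open the attachment, the shape of the hack is just swapon(2) on a pre-formatted file (a sketch with a made-up path, not the attached prototype; mkswap must already have been run on the file, and it includes the swapoff that intuition says belongs at the end):
Code:
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <sys/swap.h>

int main(void)
{
    /* Hypothetical path: a file that mkswap has already been run on. */
    const char *extra = "/mnt/hd3/.swap_big";

    /* Bring the extra page file into use for the duration of the
       memory-hungry processing (0 = no priority flags). */
    if (swapon(extra, 0) == -1) {
        fprintf(stderr, "swapon %s: %s\n", extra, strerror(errno));
        return 1;
    }

    /* ... run the memory-hungry processing here ... */

    /* The swapoff that intuition says belongs at the end; note that any
       of our pages sitting in that file have to be pulled back in first. */
    if (swapoff(extra) == -1)
        fprintf(stderr, "swapoff %s: %s\n", extra, strerror(errno));
    return 0;
}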
 

Attachments

  • page.c.txt (27.9 KB)
I do not wish to be deliberately annoying. I have checked: page file, thrashing and allocating memory are now considered more “politically correct” (technically correct) than swap file, threshing and creating memory.
We're not going to worry about that, are we? I definitely won't.

Swap file - what's non-woke about that? Page file - synonymous. Threshing - never heard that one anyway, are you sure you have it the right way around? Ditto, allocating is the norm (programs don't/can't "create" memory).
 
Swap file - what's non-woke about that? Page file - synonymous.
FWIW the VAX series of computers running VAX/VMS or now OpenVMS have a pagefile [sic] and a swapfile [sic]. I can't immediately find confirmation of this but I guess that the page file is the place for virtual memory - a process needing more memory swaps out ( :rolleyes: ) an old page of information into the page file and reads/creates another page for processing. The swap file on the VAX appears to contain a whole process that has been "swapped out" for some reason. Therefore, in OpenVMS terminology, swap file and page file are not synonymous - they perform different functions.
Excess paging in and out of the page file has always been known (to me) as thrashing.
Memory has always been allocated - obvious in C with functions such as "malloc" and "calloc" - not so obvious with "new" in C++ and Java.
 
See eg https://wiki.vmssoftware.com/Swap_File. One "reason" is that the process has been idle for too long. But I have trouble understanding why, except possibly for historical reasons perhaps related to storage device performance, this distinction is made. Dave Cutler didn't put it in Windows NT (aka V++:M++:S++).
 