[LITMUS^RT] Hit crash in C-EDF (configured as partitioned)
Björn Brandenburg
bbb at mpi-sws.org
Tue Jan 29 12:24:37 CET 2013
On Jan 29, 2013, at 12:17 PM, Björn Brandenburg <bbb at mpi-sws.org> wrote:
> On Jan 29, 2013, at 3:44 AM, Glenn Elliott <gelliott at cs.unc.edu> wrote:
>
>> With the ECRTS crunch, I don't expect anyone to take a look at this, but I think I may have hit a bug with C-EDF when cluster size = 1 (partitioned). It is a pretty strange crash. Waking from sys_wait_for_ts_release() seems to lead to a crash in unlink(). The code is trying to remove the bheap node of the waiting task from the ready queue. The check on is_queued() is successful, but the remove() call fails in a NULL pointer dereference.
>
> I've hit the same bug. It's triggered by arming the timer for a job twice. I have a fix, but it doesn't apply cleanly to the mainline plugin (I've ripped out the rt_domain code for my version). I'll see if I can quickly extract an experimental patch.
I pushed a patch with the gist of the fix to wip-cedf-timer-bug.
NB: completely untested, but seems to compile…
- Björn
commit 4b4959670aafe17685d484b65ab2dc683dfb4740
Author: Bjoern Brandenburg <bbb at mpi-sws.org>
Date: Tue Jan 29 12:21:22 2013 +0100
C-EDF: don't arm timers from requeue
to avoid races / double arming of timers
Conflicts:
litmus/sched_cedf.c
diff --git a/litmus/sched_cedf.c b/litmus/sched_cedf.c
index b45b46f..89996cd 100644
--- a/litmus/sched_cedf.c
+++ b/litmus/sched_cedf.c
@@ -256,10 +256,8 @@ static noinline void requeue(struct task_struct* task)
if (is_released(task, litmus_clock()))
__add_ready(&cluster->domain, task);
- else {
- /* it has got to wait */
- add_release(&cluster->domain, task);
- }
+ else
+ TRACE_TASK(task, "not requeueing not-yet-released job\n");
}
#ifdef CONFIG_SCHED_CPU_AFFINITY
@@ -343,6 +341,9 @@ static void cedf_release_jobs(rt_domain_t* rt, struct bheap* tasks)
/* caller holds cedf_lock */
static noinline void job_completion(struct task_struct *t, int forced)
{
+ cedf_domain_t *cluster = task_cpu_cluster(t);
+ lt_t now;
+
BUG_ON(!t);
sched_trace_task_completion(t, forced);
@@ -353,14 +354,20 @@ static noinline void job_completion(struct task_struct *t, int forced)
tsk_rt(t)->completed = 1;
/* prepare for next period */
prepare_for_next_period(t);
- if (is_released(t, litmus_clock()))
+ now = litmus_clock();
+ if (is_released(t, now))
sched_trace_task_release(t);
/* unlink */
unlink(t);
/* requeue
* But don't requeue a blocking task. */
- if (is_running(t))
- cedf_job_arrival(t);
+ tsk_rt(t)->completed = 0;
+ if (is_running(t)) {
+ if (!is_released(t, now))
+ add_release(&cluster->domain, t);
+ else
+ cedf_job_arrival(t);
+ }
}
/* cedf_tick - this function is called for every local timer
More information about the litmus-dev
mailing list